Crossover of Audit and Evaluation Practices: Challenges and Opportunities
ISBN 1003021026, 9781003021025

"Crossover of Audit and Evaluation Practices brings together academic analysis with insights from practitioners to

517 83 3MB

English Pages 249 Year 2020

Report DMCA / Copyright

DOWNLOAD FILE

Polecaj historie

Crossover of Audit and Evaluation Practices: Challenges and Opportunities
 1003021026, 9781003021025

Table of contents:
Cover
Half Title
Series Page
Title Page
Copyright Page
Table of Contents
List of Illustrations
List of Boxes
Preface
Foreword
Chapter 1: Introduction
Audit and Evaluation
Definitions
Performance Audit and Program Evaluation
National Audit Offices as Leaders in Practice
Internal Audit and Program Evaluation
Performance Information – A Common Preoccupation
Themes and Structure of the Book
References
Part I: The Basics
Chapter 2: Performance Audit Defined
Introduction
How the Practice Is Defined
INTOSAI Standards Framework for Performance Audit
The Performance Audit Process
The Management of Audit Risk
Concluding Remarks
References
Chapter 3: Internal Audit Defined
Introduction
The Internal Audit Profession
International Standards of Practice for Internal Audit
Internal Auditor Skills and Experience
Differences Between Internal and External Audit
Conclusion
References
Chapter 4: Defining Evaluation
A Brief Note on the History of Evaluation
Evaluation as a Function and an Institutionalized Practice
Evaluation Purpose, Scope, and Approach
Quality in Evaluation
Concluding Remarks
References
Chapter 5: The Practices of Audit and Evaluation
Introduction
What is Defining and Unique?
What is Shared?
Conclusion
Note
References
Appendix A
Part II: Addressing Challenges in Practice
Chapter 6: Ethics in Audits and Evaluation: In Search of Virtuous Practices
Introduction
The Sources of Ethical Dilemmas in Practice
Resolving Ethical Dilemmas: Insights from Philosophical Perspectives and Research
Key Questions and Answers for Ethical Practice
Creating the Predispositions for Ethical Behavior (Prevention)
Conclusion
References
Chapter 7: Managing Reputational Risk
Introduction
An Analytic Framework
Conclusions
References
Chapter 8: Framing Recommendations
Introduction
The What and Why of Recommendations
Factors Affecting Effectiveness
Framework for Recommendations
Australian National Audit Office Case Study
Case Study Assessment of the Framework
Relevance for Internal Audit and Evaluation
Conclusion
References
Chapter 9: Auditing in Changing Times: The UK National Audit Office’s Response to a Turbulent Environment
State Audit Institutions and Change
The National Audit Office and its Environment
Changing Environment in the United Kingdom
What are the Implications for the NAO and its Performance Audit?
Responses to the Changing Environment
Lessons for Evaluation and Internal Audit
References
Chapter 10: Understanding the Practice of Embedded Evaluation: Opportunities and Challenges
Embedded Evaluation Defined
Challenges and Opportunities
Two Short Case Studies
Conclusion
References
Part III: Practices Working Together
Chapter 11: Conducting Evaluation in an Audit Agency
Differences Between Performance Audits and Program Evaluations
History of the GAO’s Adoption of Evaluation
GAO Policy and Procedures Support Evaluations
Conclusion
References
Chapter 12: Two Sides of the Same Coin: The UNESCO Example
Different Models for Location of Oversight Functions Exist
Oversight Within the UNESCO Context
Overall Lessons
Two Coins or Two-Sides-Of-One-Coin?
Notes
References
Chapter 13: Lessons Learned from the Assessment of UNDP’s Institutional Effectiveness Jointly Conducted by the Independent Evaluation Office and the Office of Audit and Investigation of UNDP
Introduction
The Joint Undertaking
Naming the “Beast,” Decision Rights, Communication Channels, and Protocols
Developing Common Understandings to Assess Evaluability and Define Key Questions
Data Collection Methods, Triangulation, and Agreeing on What Constitutes Credible Evidence
Report Writing and Dealing with Sensitive Language to Ensure Independence, Credibility, and Utility
The Fear Factor Versus the Enhanced Credibility Ripples
Concluding Remarks for Further Consideration
References
Chapter 14: Reflections on Opportunities and Challenges in Evaluation in the Development Banks
Introduction
The Role of MDBs
Audit Versus Evaluation
Audit, Evaluation, Lesson Learning, and Results Reporting at the IDB
Personal Reflections on the Work of the Operations Evaluation Office (OEO)
Final Thoughts
Notes
References
Chapter 15: Conclusions
Influencing the Scope for Crossover
Common Considerations
References
List of Contributors
Index


Crossover of Audit and Evaluation Practices

Crossover of Audit and Evaluation Practices brings together academic analysis with insights from practitioners to discuss the potential for collaboration in audit and evaluation practices between three professional disciplines. Clearly written and thoughtfully organized, this volume is structured in three parts to deal with theory, practice issues, and how the practices have worked together.

•	Part I provides definitions of performance audit, internal audit, and program evaluation.
•	Part II addresses several challenges that professionals face in applying these standards and principles.
•	Part III contains examples of organizational collaboration between the practices, how they have worked together, and the lessons that were learned from that experience. Specific cases from the Government Accountability Office, UNESCO, UNDP, and the Inter-American Development Bank illustrate what has worked or not and suggest reasons why.

Crossover of Audit and Evaluation Practices offers even the most skilled and experienced professional insight on how to bridge some of the divides. It will help generate a better understanding of the activities and services that are either imposed on practitioners or freely available, and help to stimulate their optimal use.

Maria Barrados is currently Executive-in-Residence at the Sprott School of Business, Carleton University. Her Canadian Government career included working as a government program evaluator, as a performance auditor and audit executive at the Office of the Auditor General, and as President of the Public Service Commission, a role that included responsibility for internal audit and program evaluation.

Jeremy Lonsdale is Director of Defence Value for Money Audit at the UK's National Audit Office (NAO). He has held a number of senior posts at the NAO. Between 2014 and 2016 he was a Senior Research Leader at RAND Europe in Cambridge. His publications include "Performance Auditing: Contributing to Accountability in Democratic Government" (2011) and "Making Accountability Work: Dilemmas for Evaluation and for Audit" (2007).

Comparative Policy Evaluation
Edited by Ray C. Rist

The Comparative Policy Evaluation series is an interdisciplinary and internationally focused set of books that embodies within it a strong emphasis on comparative analyses of governance issues – drawing from all continents and many different nation states. The lens through which these policy initiatives are viewed and reviewed is that of evaluation. These evaluation assessments are done mainly from the perspectives of sociology, anthropology, economics, policy science, auditing, law, and human rights. The books also provide a strong longitudinal perspective on the evolution of the policy issues being analyzed.

Crossover of Audit and Evaluation Practices Challenges and Opportunities

Edited by Maria Barrados and Jeremy Lonsdale

First published 2020 by Routledge, 52 Vanderbilt Avenue, New York, NY 10017, and by Routledge, 2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN.

Routledge is an imprint of the Taylor & Francis Group, an informa business.

© 2020 Taylor & Francis

The right of Maria Barrados and Jeremy Lonsdale to be identified as the authors of the editorial material, and of the authors for their individual chapters, has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilized in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data: A catalog record for this title has been requested.

ISBN: 978-0-367-89770-3 (hbk)
ISBN: 978-1-003-02102-5 (ebk)

Typeset in Times New Roman by Wearset Ltd, Boldon, Tyne and Wear

Contents

List of Illustrations  x
List of Boxes  xi
Preface  xii
Foreword  xiv
  SYLVAIN RICARD

1 Introduction  1
  MARIA BARRADOS AND JEREMY LONSDALE
  Audit and Evaluation  1
  Definitions  3
  Performance Audit and Program Evaluation  5
  National Audit Offices as Leaders in Practice  7
  Internal Audit and Program Evaluation  9
  Performance Information – A Common Preoccupation  9
  Themes and Structure of the Book  10
  References  11

PART I: The Basics  13

2 Performance Audit Defined  15
  BASIA G. RUTA
  Introduction  15
  How the Practice Is Defined  16
  INTOSAI Standards Framework for Performance Audit  20
  The Performance Audit Process  23
  The Management of Audit Risk  26
  Concluding Remarks  27
  References  28

3 Internal Audit Defined  31
  DAVID RATTRAY
  Introduction  31
  The Internal Audit Profession  32
  International Standards of Practice for Internal Audit  33
  Internal Auditor Skills and Experience  36
  Differences Between Internal and External Audit  38
  Conclusion  40
  References  40

4 Defining Evaluation  41
  JOS VAESSEN AND MAURYA WEST MEIERS
  A Brief Note on the History of Evaluation  41
  Evaluation as a Function and an Institutionalized Practice  45
  Evaluation Purpose, Scope, and Approach  47
  Quality in Evaluation  48
  Concluding Remarks  51
  References  51

5 The Practices of Audit and Evaluation: Similarities and Differences  54
  MARIA BARRADOS
  Introduction  54
  What is Defining and Unique?  55
  What is Shared?  59
  Conclusion  64
  Notes  65
  References  65
  Appendix A  66

PART II: Addressing Challenges in Practice  69

6 Ethics in Audits and Evaluation: In Search of Virtuous Practices  71
  LISA BIRCH, STEVE JACOB, AND ALEX MILLER-PELLETIER
  Introduction  71
  The Sources of Ethical Dilemmas in Practice  71
  Resolving Ethical Dilemmas: Insights from a Philosophical Perspective and Research  74
  Key Questions and Answers for Ethical Practice  76
  Creating the Predispositions for Ethical Behavior (Prevention)  80
  Conclusion  82
  References  83

7 Managing Reputational Risk  87
  RICHARD BOYLE AND PETER WILKINS
  Introduction  87
  An Analytic Framework  88
  Conclusions  98
  References  99

8 Framing Recommendations  101
  PETER WILKINS
  Introduction  101
  The What and Why of Recommendations  102
  Factors Affecting Effectiveness  103
  Framework for Recommendations  104
  Australian National Audit Office Case Study  110
  Case Study Assessment of the Framework  110
  Relevance for Internal Audit and Evaluation  113
  Conclusion  115
  References  116

9 Auditing in Changing Times: The UK National Audit Office's Response to a Turbulent Environment  118
  JEREMY LONSDALE
  State Audit Institutions and Change  118
  The National Audit Office and its Environment  119
  Changing Environment in the United Kingdom  120
  What are the Implications for the NAO and its Performance Audit?  121
  Responses to the Changing Environment  123
  Lessons for Evaluation and Internal Audit  128
  References  130

10 Understanding the Practice of Embedded Evaluation: Opportunities and Challenges  132
  CHRISTIAN VAN STOLK AND TOM LING
  Embedded Evaluation Defined  133
  Challenges and Opportunities  137
  Two Short Case Studies  140
  Conclusion  146
  References  147

PART III: Practices Working Together  149

11 Conducting Evaluation in an Audit Agency  151
  STEPHANIE SHIPMAN
  Differences Between Performance Audits and Program Evaluations  151
  History of the GAO's Adoption of Evaluation  153
  GAO Policy and Procedures Support Evaluations  154
  Conclusion  160
  References  160

12 Two Sides of the Same Coin: The UNESCO Example  161
  SUSANNE FRUEH
  Different Models for Location of Oversight Functions Exist  162
  Oversight Within the UNESCO Context  164
  Overall Lessons  178
  Two Coins or Two-Sides-of-One-Coin?  180
  Notes  181
  References  181

13 Lessons Learned from the Assessment of UNDP's Institutional Effectiveness Jointly Conducted by the Independent Evaluation Office and the Office of Audit and Investigation of UNDP  182
  INDRAN NAIDOO AND ANA SOARES
  Introduction  182
  The Joint Undertaking  183
  Naming the "Beast," Decision Rights, Communication Channels, and Protocols  184
  Developing Common Understandings to Assess Evaluability and Define Key Questions  185
  Data Collection Methods, Triangulation, and Agreeing on What Constitutes Credible Evidence  186
  Report Writing and Dealing with Sensitive Language to Ensure Independence, Credibility, and Utility  188
  The Fear Factor Versus the Enhanced Credibility Ripples  190
  Concluding Remarks for Further Consideration  191
  References  192

14 Reflections on Opportunities and Challenges in Evaluation in the Development Banks  194
  ARNE PAULSON
  Introduction  194
  The Role of MDBs  194
  Audit Versus Evaluation  196
  Audit, Evaluation, Lesson Learning, and Results Reporting at the IDB  199
  Personal Reflections on the Work of the Operations Evaluation Office (OEO)  202
  Final Thoughts  205
  Notes  206
  References  206

15 Conclusion  208
  JEREMY LONSDALE AND MARIA BARRADOS
  Influencing the Scope for Crossover  209
  Common Considerations  214
  References  219

List of Contributors  220
Index  225

Illustrations

Figures
5.1  Practice Traditions  56
5.2  Professionalization  57
5.3  Reporting Relationships  59
5.4  Interrelationships Between Performance Audit and Program Evaluation  61
12.1  The Two Functions in UNESCO's Context  164
12.2  Evaluation and Audit – Combined Oversight Functions in UNESCO  167

Tables
6.1  Three Ethical Perspectives  74
6.2  A Continuum of Ethical Dilemmas and Perspectives for Solutions  75
7.1  Analytical Framework  89
8.1  Summary of Some Practical Guidance on Recommendations Offered by Three Organizations  104
8.2  Components of the Three Domains of Content, Process, and Context  105
12.1  The Positioning of Evaluation Offices in the UN System  163
12.2  Types of Collaboration in Four Case Studies  169
12.3  Case Study 1 – Joint Study Methodology  170
12.4  Case Study 2 – Parallel Study Methodology  172
12.5  Evaluations and Audit Standard-Setting Work of the Culture Sector – Scope and Coverage  173
12.6  Case Study 4 – Integrated Study Methodology  176

Boxes

2.1  International Standard Setting Organizations  19
2.2  INTOSAI's Framework of Professional Standards  20
2.3  INTOSAI's Code of Ethics  22
3.1  Internal Audit Standards 2017  34
3.2  Performance Audit Standards 2017  34
3.3  Proficiency and Due Professional Care  36
3.4  Internal Audit at National Defence and the Canadian Armed Forces in Canada  37
9.1  Significant Product Developments  126
12.1  Professional Practice Areas  166

Preface

One of the most striking features of the International Evaluation Research Group (Inteval), now well into its fourth decade, is the way in which it has successfully brought together a rich and diverse range of people to share their distinctive perspectives on many forms of evaluation activity. This book is a further product of this eclectic mixture of practitioners and academics, and another illustration of the value of collaboration across the disciplines.

The purpose of this volume is to examine the crossover between the approaches used in evaluation, performance audit and internal audit, three scrutiny activities which have expanded in scale and complexity in the last few decades. Our perspective is not that they are the same, or even necessarily similar. Indeed, our findings make it very clear that they have many distinctive characteristics that mark them apart. Evaluation in all its rich diversity is carried out in a wide range of ways by units inside and outside of public bodies, within consultancies and research institutes, and academia. External performance audit is often undertaken by independent audit institutions under statutory powers, with a parliamentary audience for its outputs. Internal audit, as its name suggests, is carried out from within private and public bodies, with boards and management as the recipients of its findings and recommendations. Nevertheless, the starting point for our book is that there is sufficient common ground between these three activities – in their basic purposes, their search for solutions to problems, and their interest in gathering and analyzing evidence to reach conclusions – for there to be considerable value in practitioners understanding more about each other's work.

Another contention which helps structure the book is our belief that there are a series of considerations which each of the forms of scrutiny activity must take into account – such as the ethical aspects of the work, the need to consider and manage risks, the framing of recommendations or the need to take account of the changing external environment. Thus, our argument is that there is much that each can learn from the others. To substantiate this assertion, in Part II the book explores these challenges as faced in different settings. Part III is then constructed around a series of case studies which illustrate how crossover – activities straddling or integrating the different disciplines – has been developed to meet particular needs. Such efforts are not without their challenges – cultural, organizational, and methodological to name but three – but together we believe our evidence gives us grounds for confidence that such collaboration does, and has, made sense.

In introducing the book we want to take the opportunity to thank our authors, who have contributed their experience and expertise to the book. Some authors are long-time members of Inteval and contributors to many past volumes in the Comparative Policy Evaluation series. Others are making their first, very welcome, contributions to the group's canon, and in doing so, further extending the breadth of knowledge upon which the series has drawn. We would also like to thank other colleagues of the group, Jan-Eric Furubo in particular, who have provided comments and suggestions as the volume has progressed. All have shared their own experiences and knowledge, and worked with us to develop the themes and build the links between the different contributions. Many, including the editors, have experience of "crossing over" into other territories, where they have had to learn new ways of working, but have also had the opportunity to make the connections across the disciplines which have informed our thinking. Finally, the editors would also like to thank Ray Rist as editor of the series who has, as ever, guided, encouraged and supported us. Without his wise counsel and endless enthusiasm, the experience of preparing this volume would have been harder and much less enjoyable.

Jeremy Lonsdale and Maria Barrados

Foreword

Arguably, audit and evaluation serve different purposes: they target different audiences and stakeholders, and have different objectives. Whereas audit (whether internal or external) aims to provide assurance that activities undertaken by the auditee are meeting established criteria or expectations, evaluation aims to inform stakeholders on whether programs are achieving the policy objectives they were created to achieve. That said, as this book intends to highlight, there are overlaps between these two activities, notably in the skillsets of the evaluators and auditors, as well as in methodologies. In some cases, audit/evaluation objectives and findings may also coincide.

In the final chapters of the book, examples of collaboration and co-location are discussed, highlighting the opportunities and challenges that can be created by these relationships between audit and evaluation. While evaluators may benefit from the stricter independence rules that apply to auditors, auditors may gain further insight through the evaluators' broader stakeholder engagement methods. Further, collaborative audit and evaluation may contribute to strengthening professional skills and methodological toolboxes. There is also something to be said for eliminating duplication, and for increasing the effectiveness and efficiency of these exercises by leveraging the collective knowledge and tapping into different perspectives, rather than working separately without benefiting from these additional insights. And ultimately, the resulting cohesiveness could contribute to further enhancing the credibility and reliability of the findings in the eyes of the stakeholders.

There exists great complementarity between these two practices, but there also exist gaps that would need to be bridged, notably in establishing common expectations as to what constitutes quality evidence and adequate data-gathering methodologies, as well as in report writing – where differences can occur. Collaboration and consultation could contribute to working through these gaps and establishing common practices. Collaboration between auditors and evaluators not only has the potential of enhancing the results of their work, but could also contribute to increasing the efficiency and cost-effectiveness of these activities.

This book is an interesting exposé of the opportunities and challenges that emerge when exploring increased collaboration between auditors and evaluators.

Sylvain Ricard, CPA, CA
Interim Auditor General
Office of the Auditor General of Canada

1 Introduction
Maria Barrados and Jeremy Lonsdale

Audit and Evaluation

Audit and evaluation are essential instruments for accountability. They help explain what organizations and individuals have done, how well they have done it, and what has been achieved as a result. Both support oversight by managers, boards, and legislatures so that these parties can be provided with information on whether or not actions have, for example, been carried out correctly and with the desired results, and whether value for money has been achieved. There are numerous other mechanisms available for rendering similar accounts, including inquiries, reviews, investigations, monitoring systems, and supervision associated with a broad array of performance management and measurement techniques (Lonsdale, Wilkins, & Ling, 2011). However, audit and evaluation are of particular interest because of the professionalization that has taken place in recent decades, the widely noticed expansion of such practice (Power, 1997), and the institutionalized approaches that have been applied to examining questions of performance.

Performance audit, internal audit and evaluation are distinct areas of practice most often located in separate units or organizations within governments around the world. Unlike traditional financial audit, which has its own specific focus on the accuracy of financial statements, there can be a crossover between internal audit, performance audit and evaluation in the programs and topics that are examined and the methodologies used. There can also be transfers of professional practitioners between the different activities, most often from evaluation into audit, but also in other directions. This book examines the potential for crossover between the three professional groups and their work, as well as related incentives and potential barriers. By "crossover" we mean the sharing or exchange of approaches between the different practices which help in creating synergies or common understanding between practitioners from different backgrounds.

The focus is on the public sector, but many of the observations, particularly in relation to internal audit, apply more generally, including to the private sector. In addition, in many countries there is an increasing blurring of the boundary between public and private sector provision, with many public services now delivered by private companies. As a result, many such companies are finding they are increasingly subject to the scrutiny expectations traditionally associated with public bodies. This makes the subject matter and concepts discussed in this book of relevance to people in a wide range of settings.

Audit and evaluation are distinct activities that operate under their own statutes, policies, conventions, and professional standards. However, they do not operate in silos from each other. They have much in common, including principles of practice, methodologies, areas of work, and challenges. At the same time, however, there are significant differences. These include levels of formality and professionalization, rights of access to information, the extent to which their work is designed to be publicized and becomes public, and the extent to which those carrying out the work can expect to be able to secure a response to their recommendations.

The visibility of the different types of work varies. There is often a greater public awareness of performance audit results than of other, equally important, results from evaluation or internal audit. This is partly a reflection of the purpose of performance audit, which is concerned with public accountability and transparency, which is often far less the case for internal audit or evaluation. When the head of a national audit office – the statutory body that undertakes much of this work – issues a performance audit report, there is often media attention and public parliamentary or legislative scrutiny. In comparison, internal audit reports and evaluations are more often internal tools instigated by heads of organizations or by their program managers to provide advice and guidance on program performance. What these reports provide, and how they are produced, are important for recipients, who have them in their management tool kit to improve performance.

Understanding the differences and similarities between the work and the reports they generate matters to the managers and users of the reports, as well as to audit and evaluation practitioners. These practitioners may be looking to achieve efficiencies and improvements in the ways they do their work by relying on, or collaborating with, one another, which in turn has the potential to make for more insightful and less burdensome analysis. With a better understanding of what others are doing and what methods have worked elsewhere, opportunities exist for practitioners of all kinds to learn from each other and to strengthen each practice. Improved understanding of the different practices also provides opportunities for professionals to move between them more easily.

Against this background, the purpose of this book is four-fold. First, it is to examine the differences and similarities of current practices with a view to gaining a better understanding of each of them. Second, to examine the practices and the opportunities they present for crossover and collaboration. Third, to examine the challenges to practices in the current environment of financially constrained public services. Fourth, to examine organizational initiatives that have been taken to enhance the opportunity for crossover or greater collaboration.

For practitioners, the book is designed to provide insight on how to bridge some of the divides, while for users of the products, it will help generate a better understanding of the activities and services that are either imposed on them or are freely available, and also help to stimulate their optimal use. Specific cases will be included to illustrate what has worked or not and suggest reasons why. This introductory chapter provides the context for the remaining chapters in the volume by laying out some definitions and concepts, some of the major players, and the themes and structure of the book.

Definitions

The Oxford dictionary defines audit as "an official inspection of an organization's accounts, typically by an independent body" and, in a secondary sense, "a systematic review or assessment of something." The broad definition includes the audit of financial statements, performance audit (sometimes called value for money audit), and internal audit. In this book, we include in our analysis performance audit and internal audit, the two forms of audit that have most in common with evaluation in that they are designed to focus on performance gaps, and to identify ways of rectifying or closing those gaps by analyzing available evidence and making recommendations for improvement. Authoritative definitions of each activity are as follows:

Performance audit refers to an independent, objective, and reliable examination of whether government undertakings, systems, operations, programs, activities or organizations are operating in accordance with the principles of economy, efficiency, and/or effectiveness and whether there is room for improvement (International Organization of Supreme Audit Institutions (INTOSAI), 2016b, p. 2).

Internal audit is an independent, objective assurance and consulting activity designed to add value and improve an organization's operations. It helps an organization accomplish its objectives by bringing a systematic, disciplined approach to evaluate and improve the effectiveness of risk management, control, and governance processes (Institute of Internal Auditors Research Foundation, 2017).

Program evaluation refers to the process of determining the worth or significance of an activity, policy or program. [It is] an assessment, as systematic and objective as possible, of a planned, ongoing, or completed intervention (Organisation for Economic Co-Operation and Development (OECD), 2010).

These are generally accepted definitions, each drawn from well-informed practitioner sources that carry weight in their fields. Those for performance audit and internal audit are from international institutions, reflecting a broad consensus built over recent decades, often through formal consultation and research processes. They may be subject to local interpretation but are widely recognized. Both types of audit also have international, national, and regional professional institutes, which represent and inform practitioners, and often further elaborate on the standards and practice by producing guidance and operating protocols.

The program evaluation definition does not have the same broad consensus. As Morra Imas and Rist (2009, p. 8) point out, there are many different definitions that reflect different purposes or contexts, with the result that the practice of evaluation is not as prescribed or limited as audit. The absence of a consensus by an international body of evaluation practitioners is a significant difference between audit and evaluation.

A further distinction between the three practices relates to the defined standards of practice and how they are regulated. All three have such standards but, as with the definition of the activities, there is much more consensus for audit than for evaluation. For performance audit, there is an agreed framework of international standards within which regional and individual organizations, mandated to carry out performance audits, have further developed standards and guidance. Internal audit has a single international set of standards and guidance that also serves as a national set of standards and guidance. Evaluation, on the other hand, has multiple standards that include those drawn from different disciplines. This is further elaborated in Chapters 2 to 5 in Part I.

There is no certification for program evaluation or program evaluators, while audit has different forms of certification. Auditors who are qualified accountants (certified, chartered or management accountants, for example, depending on which institute's qualification has been followed) are required to meet institutional standards, as well as audit standards, and are subject to discipline or loss of their designation if they fail to comply. Performance audit organizations that adhere to international and regional standards do not provide certification specifically for individual performance auditors, some of whom may be designated accountants but others could be evaluators or from other disciplines. Most will instead require completion of specified training or evidence of specialist knowledge, as well as on-the-job experience. Further, external audit organizations commit to adhering to agreed performance audit standards and require adherence by each of their performance audit practitioners. Internal audit has a voluntary certification program that internal auditors are encouraged to obtain, in addition to further staff training. Both forms of audit are subject to forms of peer review or external scrutiny to help identify areas for improving their practice.

A challenge for the practice of performance audit is to ensure that the multidisciplinary evaluators who participate in it understand and can apply the performance audit standards. These standards are more procedural and prescriptive about the components of a performance audit than the requirements for systematic inquiry in evaluation, and are supplemented with much more practitioner guidance and process, developed to meet the needs of the regime within which the audit office operates. Some of this may seem alien to those not used to it, as well as overly bureaucratic and procedural, but it is justified by audit bodies by the very public nature of their work and the expectations of transparency about how they conduct their business.

Performance audit, internal audit and evaluation all, in their different ways, support accountability, with challenges for each in answering the increasingly complex questions about the value of public policies and programs in democratic countries (Bemelmans-Videc, Lonsdale, & Perrin, 2007). There are important distinctions to be made, however, in how they are positioned with respect to accountability relationships. Performance auditors in national audit institutions serve an existing accountability relationship between government and the legislature, which is often long-established, highly formalized and viewed by some as unnecessarily punitive (LeClerc et al., 1996, p. 224). This is less the case for internal audit and evaluation in government since they work inside public institutions, usually reporting to a board, a committee or the head of the agency or department. Internal auditors in the private sector serve the accountability relationship between management and the governing board. Not all of this work is publicly available, and as a result the consequences of the findings – for individuals and organizations – are likely to be worked out within institutions, rather than being visible, for example, via public hearings or press briefings as in the case of performance audit.

Even within the definition and structural differences, there are commonalities between all or some of these activities. Performance audit and internal audit, for example, have a common root in accounting and in the formal rendering of an account for the allocation and utilization of public resources. Internal audit and program evaluation are at times co-managed or co-located in government organizations since they both support the head of the organization, unlike performance audit. Performance audit and program evaluation both deal with issues of program or policy effectiveness. In general, all three activities aim to translate their scrutiny and analysis of evidence presented on performance into practical recommendations for change – and hopefully improvement – within the activities or bodies examined, and possibly within wider categories of organizations, where the identified lessons may have similar resonance.

Performance Audit and Program Evaluation

Over the past 30 years there has been a marked influence between the practices of evaluation and performance audit. Performance audit has developed in the bodies responsible for financial accounting and audit to answer the questions of economy, efficiency, and effectiveness of government expenditure. Performance audit's development has been influenced by accounting and evaluation, and to a lesser extent by other forms of inquiry such as operational research. Evaluation, on the other hand, has its roots in applied social science and research. There has, at times, been disagreement on how close they are or could potentially be (Wisler, 1996).

The debate in Wisler (1996) on new directions for evaluation raised the question of a convergence of the two practices. Eleanor Chelimsky concluded, having examined the evidence at the time, that there was a continuing move to closeness "with regard to understanding and to methodological approach." She also acknowledged that there was disagreement on the degrees of closeness (Chelimsky, 1996, p. 61). Practitioners on either side of the debate offered varying analyses of how the practices were or were not heading toward common positions (Leeuw, 1996). John Mayne (2006) in his analysis concluded that there were areas of practice that were unique to each and that should not be addressed by the other. He argued that impact/effectiveness issues should be the preferred area of practice of evaluation and not of internal or performance audit. Organizational systems and procedures should be the preferred area of practice of internal audit and not of performance audit or evaluation.

Now, over 22 years after the Wisler (1996) publication, the two practices remain distinct but the discussion has shifted from the issue of convergence to crossover, co-operation, and reliance. As the chapters in the book illustrate, practice lines are much more porous than Mayne advocated in 2006. In a survey of member organizations of the International Organization of Supreme Audit Institutions (INTOSAI) (which includes most national audit offices across the world), a Working Group on Program Evaluation found that a portion of the members were undertaking performance audits that focused on effectiveness as well as economy and efficiency. Some went beyond performance audits "towards assessing public policy and evaluating public programs (France and the United States)" (INTOSAI, 2010a, p. 2). The analysis further points out that all possible uses of program evaluation are provided for in Performance Auditing Standards. The INTOSAI Working Group concluded that no special mandates were required for performance audit to conduct "program evaluation"-like examinations since they could be embedded within the performance audit mandate. It also noted that many performance audits could only be achieved by evaluative means.

Evaluators and evaluation continue to be a source for recruitment, inspiration, and development for performance audit. And there is evidence that performance auditors are: more familiar with methods employed by evaluators than in the past; more willing to draw on the work undertaken and published by evaluators in their own reports; and more open to outside influences (Lonsdale et al., 2011). Early practitioners such as Eleanor Chelimsky at the Government Accountability Office (GAO) and Joe Hudson at the Office of the Auditor General of Canada were not trained as accountants and came from program evaluation backgrounds to develop their offices' performance audit practice. More recently, performance audit in national audit offices has been significantly taken forward by practitioners from a wide range of backgrounds, and by external specialists who have been brought in to assist in the design and execution of their work. The flexible nature of the legislation for performance audit allows practitioners from different disciplines to apply their methodologies to tackle and hopefully answer questions in varying ways.

Consequently, both audit and evaluation are continuing to develop. Performance audit has grown in scale and intensity in national audit offices in many countries over a period going back three or four decades. It has broadened its methods and approaches and has been required to deal with increasingly complex issues (Lonsdale et al., 2011). Performance audits have also come under greater scrutiny and challenge as in many cases they have tackled more contentious issues and sought to report at earlier stages in policy development and implementation, when they are more politically sensitive, as illustrated in Chapter 7.

Performance auditors, like evaluators, seek to determine whether desired results are achieved through examining inputs, outputs, and outcomes, not unlike the logic model or the management of change model (Morra Imas & Rist, 2009). A positivist approach, familiar to many evaluators, is taken to assessing program functioning. The implicit assumption is that there is a logical application of process and leadership to achieve the desired results. Auditors will emphasize management processes and compliance with rules, procedures, and directives more than evaluators. There will also routinely be more of a resource focus on the subject matter studied.

National Audit Offices as Leaders in Practice

A defining characteristic of performance audit is that it is primarily conducted by independent offices that are not part of an organization responsible for the delivery of a policy or program. Leading performance audit practice is found in many national audit offices as part of their oversight of government. National audit offices tend to have strong statutory authority. This usually provides for independence from the direction of government ministers and direct accountability to the legislatures/parliaments that receive their reports. These powers and their implications for public reporting have often been at the heart of the contested nature of performance audit.

Specific mandates of national audit offices vary but all have a common interest in the performance of governments and their handling of public expenditures and receipts. Statutory authorities can include a provision to compel access to premises and documents. Considerable efforts have been made, as the nature of public service delivery mechanisms has changed (for example, the introduction of greater private sector delivery and contracting out of services), to ensure that these rights of access have been protected or enhanced, not without some degree of dispute in some cases between auditors, their parliamentary allies and the executive function (Sharman, 2001).

The "clients" for the national audit offices are bodies of democratically elected representatives, often a specific committee such as the Public Accounts Committee in the Westminster systems, and indirectly the media, interest groups and the citizenry. Such offices usually cover much of the universe of government expenditures. This allows them to often take a cross-government or thematic perspective in their work, a significant aspect of the program of the UK NAO, for example. In other systems, the recipients may be subject-specific committees, which receive reports on particular fields of government. The reports from national audit offices, and the subsequent parliamentary inquiries making use of the evidence in these reports, provide a unique public profile for the offices and their heads. They are looked at as an important source of information to legislatures for independent oversight and for citizens on the workings of their governments, and on what has been achieved with the expenditure of their tax money.

Because of the public nature of their work and the fact that it is used by politicians – even in a context which is often designed to be non-partisan – performance auditors, in particular, take care that they are not subject to political influence in the selection of topics or when they are reported. Audit plans can be tabled with the legislature and be part of the public record. In some cases, the time and method of delivery of reports are set by statute, providing a fixed framework, but in many others the timing of publications is deliberately at the discretion of the national audit office to avoid any suggestion of manipulation.

Often using a risk-based approach, national audit offices develop a program that can be delivered on time and on budget. They can only cover a limited number of topics a year and often have planning processes for prioritizing from among a wide range of possibilities. Formally, many audit offices discuss their audit plans with their legislatures. Some will respond to specific legislative requests, taking care to avoid being seen to be captured by external interests. Evidence from some jurisdictions, for example the United Kingdom, is that programs have increasingly become more fluid as parliamentary expectations of topicality demand fast-turnaround reports and briefings. The Government Accountability Office (GAO) in the United States does much of its performance work in response to congressional questions, while applying protocols to ensure that its independence is protected. The UK and Canadian national audit offices have an approach of integrating specific ideas raised by their parliamentary Public Accounts Committees and members of parliament into their work programs. The Canadian office will respond to specific parliamentary requests to carry out an audit if the request comes from a motion of a committee, rather than from individual members or political parties. Despite this desire to be responsive, the ultimate decision on the programs rests with the Comptroller and Auditor General in the UK and the Auditor General in Canada.

Performance auditors will gather evidence to assess what has been achieved. They will first seek to obtain performance information that is already available and check on its reliability. They may gather the information themselves or request that the government do the analysis. They also often review program evaluations, either the function or individual evaluations, although more so and with a longer history in some countries (e.g., Netherlands and Canada) than others (e.g., UK) (INTOSAI, 2010a). Whatever the purpose, strong methodological and evaluation skills are required.

National audit offices are not the only bodies that carry out performance audits. Audit offices at state or provincial levels, some municipal offices, and internal auditors also do performance audits. There are also special offices that have been created to do performance audits, such as the Office of the Auditor General for Local Government in British Columbia, Canada. Other bodies do work which has many of the characteristics of performance audits. In the UK, for example, the Independent Commission for Aid Impact scrutinizes UK aid spending, looking to see that "it is spent effectively, for those who need it most, and delivers value for UK taxpayers" (ICAI, 2018).

Internal Audit and Program Evaluation

The legal basis for internal audit and program evaluation in government can be statutory or through choices that are often part of management policies. Both statutes and policies tend to be further elaborated in guidelines and training material. Internal audit and performance audit share the same accounting professional roots and the same general approach to their work as set out in their standards.

Internal audit is widespread in OECD countries (OECD, 2004). It works within organizations, in government, and in the private and not-for-profit sectors. It is independent of management responsibilities but located within organizations to support risk management, control, and governance, and it provides independent, objective assurance of their effectiveness. Internal audit aims to improve an organization's operations through its work. Internationally accepted standards are set by the Institute of Internal Auditors (IIA), which also certifies internal auditors through written examinations.

In the public sector, internal audit and evaluation are both located within organizations to serve management and boards. In some organizations they are co-located. They can examine the same program or work area, sometimes simultaneously. While they can both examine aspects of operations, internal audit tends to focus more on risk management, management control, and governance processes, while the evaluator is more focused on results. Internal audit reports, like evaluation reports, are internal to an organization. They do not have the same public profile as performance audits and hence tend to be audits that are less resource-intensive (because they do not need to be subject to such extensive evidential standards), more narrowly focused, and without the same level of review and scrutiny as performance audits.

The internal audit policies and standards are sufficiently broad that internal audit can deal with operational effectiveness issues and report on them in a similar way to national audit offices. For example, the Public Service Commission of Canada has a statutory obligation under the Public Service Employment Act to report to parliament on the discharge of its own authorities to staff the Canadian public service. These authorities are delegated but remain the responsibility of the independent Public Service Commission. It conducts audits of its delegated responsibilities, which are performance audits of its own internal responsibilities, hence internal audits that are reported to parliament. These audits are managed and reported separately from other internal audits of the management control systems of the organization.

Performance Information – A Common Preoccupation

Performance information, and the assessment of aspects of program performance, is a preoccupation common to evaluation, performance audit and internal audit. All three address issues related to aspects of empirically assessing performance and the reliance on or use of appropriate qualitative and quantitative data collection and techniques. Audits have traditionally focused on operational issues, whereas evaluators have become more interested in management issues (Leeuw in Wisler, 1996), strengthening the common interest in performance information and measurement.

Performance information can cover a range of sources, from that which tracks expenditure and revenue to traditional monitoring and evaluation systems that focus on inputs, activities, and outputs, as described by Morra Imas and Rist (2009, p. 108). More recently, evaluators have placed greater emphasis on results information and results-based measurement and evaluation. The development of "big data" and the increased opportunities to analyze data on a scale that has hitherto been impossible have also expanded the focus of the work of performance auditors and evaluators.

The systems that collect and monitor performance information are essential to the work of both auditors and evaluators. But what sets the practices apart is the starting question they are asked to answer – what are the results or outcomes of a program or initiative? Or were the results achieved by a program or initiative worth the cost? Evaluation tends to emphasize the former, and certainly more so than performance audit, which has a particular interest in the latter. The setting of the question – or who asks the question that defines the demand for performance information – can also take a number of forms. The practice for audit and evaluation is defined by how these demands are met. It is these practices that become professionalized, and this is one of the factors that sets audit and evaluation apart.

Themes and Structure of the Book

The relationship between theory and practice is a challenge. To add insight to this challenge, the book is structured in three parts to deal with theory, with practice issues, and with how the practices have worked together. The first part, Chapters 2 to 5, provides definitions of performance audit, internal audit, and program evaluation. Each chapter is written by specialists/practitioners in their areas. They provide background on the development of the practices, the standards and principles that are used, as well as defining features of the practices. The last chapter in the section compares and contrasts the three practices.

The second part of the book addresses a number of the challenges that professionals face in applying these standards and principles. Chapter 6 provides an overview of ethical concerns applicable to all three practices. Chapter 7 compares and analyzes cases that have posed reputational risks. Chapter 8 examines the issues of framing conclusions and recommendations. Chapter 9 examines how a rapidly changing external environment can affect the approach to planning and conducting performance audit, using the case of the UK National Audit Office. Chapter 10 describes the innovation of embedded evaluation and how to deal with issues of maintaining an independent perspective.

Performance audit, internal audit, and evaluation continue to be distinct practices. Part III contains examples of organizational crossover between the practices, how they have worked together and the lessons that were learned from that experience. Chapter 11 provides the case example of the GAO in the US, where evaluation specialists are an integral part of the organization, working to standards that are consistent with international performance audit standards. Chapters 12 and 13 provide an analysis of different approaches to having internal audit and evaluation work more closely together in UNESCO and UNDP. Chapter 14 describes the experiences in a development bank where internal audit and evaluation have remained separate throughout the reforms of the internal oversight and review functions.

The book has drawn together international chapter authors whose experience and insight have provided a robust analysis of the achievements and challenges of the three practices and how they can complement and support one another.

References

Bemelmans-Videc, M.L., Lonsdale, J., & Perrin, B. (2007). Making Accountability Work: Dilemmas for Evaluation and Audit. New Brunswick, NJ: Transaction Publishers.
Chelimsky, E. (1996). Auditing and Evaluation: Whither the Relationship? In Carl Wisler (Ed.), Evaluation and Auditing: Prospects for Convergence (pp. 61–67). San Francisco, CA: Jossey-Bass Publishers.
Independent Commission for Aid Impact (ICAI) (2018). ICAI's Remit. Retrieved from icai.independent.gov.uk.
Institute of Internal Auditors Research Foundation (2017). International Professional Practices Framework (IPPF). FL, US. Retrieved from na.theiia.org.
International Organization of Supreme Audit Institutions (INTOSAI) (2010a). Working Group on Program Evaluation. Program Evaluation for SAIs: A Primer. Retrieved from eurosai.org.
International Organization of Supreme Audit Institutions (INTOSAI) (2010b). International Standards of Supreme Audit Institutions (ISSAI 40) – Quality Control for SAIs.
International Organization of Supreme Audit Institutions (INTOSAI) (2013). International Standards of Supreme Audit Institutions 300 (ISSAI 300). Fundamental Principles of Performance Auditing. Retrieved from ISSAI-300-English.pdf.
LeClerc, G., Moynagh, W. D., Boisclair, J.-P., & Hanson, H. R. (1996). Accountability, Performance Reporting, Comprehensive Audit – an Integrated Perspective. Ottawa, Canada: CCAF-FCVI Inc.
Leeuw, F.L. (1996). Auditing and Evaluation: Bridging a Gap, Worlds to Meet? In Carl Wisler (Ed.), Evaluation and Auditing: Prospects for Convergence (pp. 51–60). San Francisco, CA: Jossey-Bass Publishers.
Lonsdale, J., Wilkins, P., & Ling, T. (Eds.) (2011). Performance Auditing: Contributing to Accountability in Democratic Government. Cheltenham, UK: Edward Elgar.
Mayne, J. (2006). Audit and Evaluation in Public Management: Challenges, Reforms and Different Roles. The Canadian Journal of Program Evaluation, 21: 11–45.
Morra Imas, L.G., & Rist, R.C. (2009). The Road to Results: Designing and Conducting Effective Development Evaluations. Washington, DC: The World Bank.
OECD (2004). Public Sector Modernisation: Modernising Accountability and Control. Public Governance and Territorial Development Directorate, Public Governance Committee, Paris. Retrieved from oecd.org.
OECD (2010). Glossary of Key Terms in Evaluation and Results-Based Management. Development Assistance Committee, Paris. Retrieved from oecd.org.
Power, M. (1997). The Audit Society: Rituals of Verification. New York, NY: Oxford University Press.
Sharman, Lord (2001). Holding to Account: The Review of Audit and Accountability for Central Government. Report by Lord Sharman of Redlynch, HM Treasury. Retrieved from http://webarchive.nationalarchives.gov.uk/20081023195427/www.hm-treasury.gov.uk/d/38.pdf.
Wisler, C. (Ed.) (1996). Evaluation and Auditing: Prospects for Convergence. San Francisco, CA: Jossey-Bass Publishers.

Part I

The Basics

2 Performance Audit Defined Basia G. Ruta

Introduction
Defining what an audit or, more specifically, a performance audit consists of is reasonably straightforward but affords little understanding of its practices and how it is professionalized. For both users and managers, an understanding of the practice provides insights into how best to use the products that arise from this type of work, as well as their advantages and limitations. This chapter describes how the practice of performance audit is defined and provides an overview of performance audit standards and process. The analysis is drawn primarily from international standards and guides that are framed in ways which provide for national variation. International standards are summarized extensively throughout the chapter, with further detail available from the references. The chapter concludes with some thoughts on the challenges facing practitioners and new professionals entering the practice.

The practice of performance audit developed in the 1970s, drawing its inspiration from the financial audit and accounting professions. It has been professionalized through the development and application of national and international standards. The development of the practice, with accompanying standards and guides, has been led by national audit offices within their own institutional settings, and through national and international organizations. National audit offices are not part of the organizational structure of the entities to be audited. Their audit practices are external to government departments and agencies, and hence, as we saw in Chapter 1, are differentiated from internal audit (described in Chapter 3), which is situated within government bodies. Since most – but not all – national audit offices also conduct financial audits, their methodological work on performance audit has maintained linkages with financial audit. Many have also developed close connections with the evaluation profession – described in Chapter 4 – and have been influenced by some of the developments in that sphere.


How the Practice Is Defined

Performance Audit Definitions
The dictionary describes "audit" as a formal methodical examination and review (of something), or official inspection (of something), typically carried out by an independent body (Merriam-Webster Dictionary, 2018; Oxford English Dictionary, 2018). On its own, the term is often associated with financial statements, financial accounts or assertions. An audit is thus carried out to obtain reasonable assurance for the purpose of enhancing the confidence that a stakeholder can attach to an organization's financial statements or other (financial) accounts. The work involves an auditor formally verifying or inspecting accounts (typically on a test basis). The auditor "performs procedures to reduce or manage risk (to an acceptably low level) of reaching inappropriate conclusions, recognizing that the limitations inherent to all audits mean that an audit can never provide absolute certainty …" (INTOSAI, 2013b, para. 40).
A performance audit is also a formal examination, and involves the examination of the performance of a public organization or program by an independent auditor on behalf of a client. The client is usually a parliament, legislature or congress, and ultimately citizens. A performance audit builds on traditional financial statement audit concepts by expanding the focus beyond financials to programs and processes. It also emphasizes accountability for outputs and outcomes with due regard to economy, efficiency, and effectiveness. Performance audit has evolved over the last 30 years in many jurisdictions, moving beyond financial audit examinations of the control of resources or inputs toward accountability for the results achieved (Daujotaite & Macerinskiene, 2008, pp. 178, 184). This can be seen in the more comprehensive definition of performance audits provided by the international standard-setting body for public-sector auditing, which states that:

[Performance auditing] is an independent, objective and reliable examination of whether government undertakings, systems, operations, programmes, activities or organizations are operating in accordance with the principles of economy, efficiency and/or effectiveness, and whether there is room for improvement. Performance auditing seeks to provide new information, analysis or insights and, where appropriate, recommendations for improvement. The overarching objective is to be constructive in promoting economical, efficient and effective governance.
(INTOSAI, 2013c, para. 9–10, 12)

Thus, in simple terms, a performance audit involves assessing whether government policies, programs and institutions are well managed and being run economically, efficiently and effectively. It is a task of potentially great significance, at a practical level for citizens, and at a more abstract level for the health and vitality of democratic governance (Lonsdale, Wilkins, & Ling, 2011, p. 1).

It supports accountability, transparency and informed decision-making. National Audit Offices or Supreme Audit Institutions (SAIs), as they refer to themselves, are typically the most visible entities that carry out external performance audits. Their work attracts much attention due to the size of their budgets, the breadth of the topics addressed (such as national security, taxation, education and health), and the interest of outside stakeholders. Provincial, state, and local government (external) audit offices are also important players. They similarly address significant areas of responsibility for their jurisdiction. As with SAIs, their reports are broadly disseminated, often get attention from media, and spark discussions in legislatures and local councils. It is through these proceedings that government officials are held to account.
The heads of SAIs come from a variety of disciplines. In some countries, the law requires heads of SAIs to be accredited or licensed professional accountants. In others, professional and licensed accountants perform financial statement audits (or public accounts audits) and express opinions on the fairness and reliability of those statements. Licensed or accredited professional accountants must comply with the standards of the profession required in their jurisdiction to perform audit-level assurance work.

Performance Audit Characteristics
Some people use the terms "value for money audits" or "operational audits" to mean much the same kind of activity, but performance audit is the term most often used since it places the emphasis on the 3E's – Economy, Efficiency, and Effectiveness. "Value for money" can be taken (mistakenly as it turns out) to mean a greater focus on price (economy considerations) or as leading to absolute judgments on the utility of a policy or the merits of a program.
Performance audits are not typically repeated on an annual basis. In this way they differ from financial statement audits, which are regular annual year-on-year engagements focused on whether financial statements are prepared, in all material respects, in accordance with an applicable financial reporting framework. Performance audits can also address a wide variety of topics, although usually with a resource focus or from a perspective which concentrates on the risks to effective use of resources. For example, they may target expenditure, revenue streams, programs or activities, and physical assets. One or several operations can be examined in part or in full, either within one government department or across any number of government operations or entities. The possibilities are numerous and have the potential to attract interest from a broad range of stakeholders within government, legislatures, the media, sector specialists, and the public.
There are parallels with other forms of audit. Many organizations use internal audit as a management tool to improve programs and operations, inform decision-making internally and enhance accountability internally. Such audits may also address economy, efficiency, and effectiveness considerations. However, the professional and evidential requirements for internal audit practices are different from those for external audit work since the work is meant for internal purposes (rather than for external consumption), as a management tool (rather than for the purposes of external public accountability), and is not specifically designed for reliance by external stakeholders.

Evaluations also provide information on performance, particularly on program impacts and outcomes. In addition, consultants can play a role in assessing performance attributes and results. The work of such outside evaluators, advisors, and sector specialists (as well as internal auditors) can potentially provide sources of evidence for an external performance audit if the performance audit standards are met. These include specified conditions of objectivity, competence, and capability, as well as the adequacy and appropriateness of the work as evidence. For internal audit, there are added requirements for a systematic and disciplined approach, including quality control in the work undertaken, backed by organizational status and policies that support the objectivity of the unit (International Auditing and Assurance Standards Board (IAASB), 2013, para. 52–55, A120–A135). Such assessments are required since the work of these outside players is typically not produced in the same systematic way, or to the same professional standards, as that of national audit offices, whose performance audit work is generally considered within routine statutory processes and whose conclusions are often addressed in formal public hearings where responsible parties are held to account.

Source of Performance Audit Standards
Performance audit work is shaped and influenced by performance audit standards. A common source for the international standards and guidance for performance audit in the public sector is INTOSAI – the International Organization of Supreme Audit Institutions. Founded in 1953, it is an umbrella organization for SAIs choosing to participate. As part of its mission, INTOSAI develops professional standards and practice guidance to help ensure "credibility, quality and professionalism" (INTOSAI, 2013a, para. 1) in carrying out audits in the public sector. These standards and guides are advisory, "do not override national laws, regulations or mandates," and are used by many SAIs to develop their own approaches (INTOSAI, 2013a, para. 7).
The international standards for professional accountants issued by the International Auditing and Assurance Standards Board (IAASB) of the International Federation of Accountants (IFAC) underpin the principles and standards promoted by INTOSAI for public-sector auditing. There are many parallels in the standards, as can be seen in the approaches to assurance, ethics, and quality control. The international standards for professional accountants for assurance engagements fall into three categories:

1  Standards on quality control (in conjunction with relevant ethical requirements), which are applicable to all assurance engagements including performance audits;
2  Standards on auditing (and standards on review engagements), which apply to the audits (reviews) of financial statements and other historical financial information. They do not apply to performance audits; and
3  Standards on assurance engagements (also referred to as other assurance engagements). This category applies to performance audits. The relevant standards are considered the engagement standards for this line of work.
(IAASB, 2016–2017a, pp. 6–11, para. 8; IAASB, 2004, p. 1; IAASB, 2013)

INTOSAI's Framework of Professional Standards covers similar broad areas but is organized differently. SAIs can be subject to country-specific national standards and often set their own office standards and procedures, building from the standards developed for financial audits and from international guidance. This approach in turn has influenced the evolution of the practice of performance auditing in many jurisdictions, informing country-specific national as well as international standards and building from the conventions, norms, and practices entrenched in the public accountancy profession. Within the context of these standards, many SAIs have made performance audit a key line of business. They deliver unique, identifiable performance audit reports each year in addition to financial statement (public accounts) and compliance audits (INTOSAI, 2013b, p. 5). On a global scale, the performance audit regimes of SAIs stand at various levels of maturity. The span is vast, ranging from SAIs just starting to add the practice to their work to those in which the practice operates at seasoned levels, meets all recommended practice, delivers value-added reports, and adheres to the principles of continuous improvement (INTOSAI, 2016e, p. 34).

Box 2.1  International Standard Setting Organizations

For Professional Accountants
IAASB – International Auditing and Assurance Standards Board (of the International Federation of Accountants), e.g.
•  ISAE 3000 (Revised), International Standard on Assurance Engagements – Assurance Engagements Other than Audits or Reviews of Historical Financial Information

For Supreme Audit Institutions
ISSAI – International Standards of Supreme Audit Institutions (developed by the International Organization of Supreme Audit Institutions), e.g.
•  ISSAI 3000 – Standard for Performance Auditing

Canada and Australia serve as examples of jurisdictions where national and provincial audit offices very explicitly comply with country-specific standards aligned to ISAE 3000 (Revised), the recently revised international standard for professional accountants on assurance engagements applicable to performance audit work. Canada is also among the first nations to have issued an additional standard, CSAE 3001 (AASB, 2015), also aligned to the international standard, to address in more depth a specific type of assurance engagement – a direct engagement – referenced in the international standard (IAASB, 2013, pp. 5, 8) and highly prevalent in Canada for performance audits carried out by national and provincial audit offices. In Canada, national and provincial audit offices very explicitly comply with CSAE 3001. Similarly, Australia also has a country-specific standard addressing direct engagements – ASAE 3500, Performance Engagements (AUASB, 2017b). Revised in 2017, the standard applies to assurance work on performance mandates carried out as direct engagements. More broadly, for SAIs generally, INTOSAI notes that "performance audits are normally direct reporting engagements" (INTOSAI, 2013b, para. 30). Direct engagements are also reflected in INTOSAI's standards and guidance for performance audit, which cover all aspects of the operations of an SAI, from the position and functioning of the organization and the management of human resources to standards of conduct and the carrying out of an audit.

INTOSAI Standards Framework for Performance Audit
As can be seen in Box 2.2, INTOSAI's Framework of Professional Standards is set out in four levels.

Box 2.2  INTOSAI's Framework of Professional Standards
Level 1 – Founding Principles
Level 2 – Organisational Prerequisites:
  –  independence, transparency and accountability
  –  ethics and quality control
Levels 3 and 4 – Principles, Standards and Guidelines for Individual Audits
(Source: INTOSAI, 2013b, paras. 3, 4, 5)

Organizational Prerequisite: Quality Control and Ethical Requirements
INTOSAI standards require that an appropriate system of quality control over audit activities and reporting is in place for all audits. A broad-based organizational system is set out in the requirements for a quality control framework consistent with the International Standard on Quality Control (ISQC-1) for professional accountants and adapted to a SAI environment (INTOSAI, 2010b, p. 2).

The elements addressed include leadership responsibility for quality, relevant ethical requirements, considerations of the acceptance and continuance of client relationships and specific engagements, human resources, engagement performance, and monitoring (INTOSAI, 2010b, pp. 2–14; ISSAI 40). This system of quality control is meant to fulfill the same purpose as ISQC-1, that is, "to provide reasonable assurance that the firm and its personnel comply with professional standards and applicable legal and regulatory requirements; and that reports issued by the firm are appropriate in the circumstances" (IFAC, 2009, para. 11; INTOSAI, 2010b, pp. 3, 4; ISSAI 40). Those obligations extend to everyone on the team or within the office on performance audit engagements.
The levels of review required during an external audit are significant, necessitating, for example, at least one level of review of all audit documentation, with higher risk areas likely requiring at least two levels of review (OAG, 2018b). The institutional and reputational risks to manage can be substantial for a SAI since the audits are specifically designed for external consumption. These levels of review are justified on the grounds that the results of the audit will generally be made public, will be used within accountability settings, and may result in criticism of individual organizations and their senior management, or be the basis for decisions to be made about resource allocation in government. Further, performance audits undertaken by SAIs can also have an impact on private sector entities providing services to government, and thus may, for example, affect their reputations or share prices. There are thus many reasons why reviews of quality are important.
Reviews are also a key means for a SAI to demonstrate compliance with professional standards, including exercising professional judgment with due care and an objective state of mind, and ensuring collective professional competence and proper supervision over the audit. Substantial and systematic reviews take place at each phase of an audit regardless of team members' credentials, expertise, and specialization. A quality review partner may be appointed independently of the team performing the work where the nature of the engagement, including the extent to which it involves a matter of public interest, is assessed as significant, or where there may be unusual risk. The reviewer is required to have the experience and authority to objectively evaluate the significant judgments made and the conclusions reached in the performance audit report (IFAC, 2009, para. 36b; Chartered Professional Accountants of Canada (CPA Canada), 2009, para. 35b, A41; INTOSAI, 2010b, p. 12). See, for example, the use of an independent quality review in Chapter 7. The overall quality control system is subject to periodic independent assessment (INTOSAI, 2010b, pp. 12–14).
INTOSAI's Code of Ethics elaborates in more detail on the ethical requirements (INTOSAI, 2016a). Engagement team members are required to adhere to the five ethical principles listed in Box 2.3.

Box 2.3  INTOSAI's Code of Ethics
1  Demonstrate integrity;
2  Respect independence and objectivity;
3  Maintain professional competence (and due care);
4  Sustain a reputation for excellence in professional behavior; and
5  Ensure confidentiality and transparency.
(Source: INTOSAI, 2016a, para. 9)

Organizational Prerequisite: Independence of SAIs
A pre-condition for ensuring high-quality audits is the requirement for an SAI to be independent organizationally, in both fact and perception. Securing the requirement for independence in legislation guards against potential interference and allows full autonomy to carry out the work. Independence also requires an SAI to have sufficient resources, both financial and personnel, and freedom in using those resources to carry out work based on its best judgment in accordance with professional standards. Where funding is at issue, SAIs are expected to make their needs public and to appeal directly to funding authorities to increase their budgets (INTOSAI, 1977, Sections 5–8).

Organizational Prerequisite: Transparency and Accountability for SAIs
Another organizational prerequisite is for SAIs to demonstrate transparency in, and accountability for, their operations as a measure of good governance. SAIs are expected to lead by example in managing their own operations; serve as model public-sector organizations by demonstrating transparency in meeting obligations for audits and reports under their mandates; periodically evaluate their performance; follow up on audits; assess the impact of their work; and report publicly on their use of funds, activities, and cost of audits. For an SAI, transparency includes the added dimension of being open with outside stakeholders through effective and timely reporting on the status of activities under its mandate, strategies, financial management, and overall performance. It makes its audit findings and conclusions public and is answerable to questions on its operations (INTOSAI, 2013a, pp. 4–11).

Performance Audit Assurance Standards for the Conduct of Individual Audits
There are two types of performance audits: those carried out as direct engagements and those carried out as attestation engagements. A direct engagement occurs when the auditor does the actual measurement or evaluation of what is being audited (the underlying subject matter) against the applicable criteria (or benchmarks) and reports the findings and conclusions to outside users in a comprehensive report (INTOSAI, 2013b, para. 26–30, 50, 51; INTOSAI, 2013c, para. 39; IAASB, 2013, p. 8).

In an attestation engagement, the responsible party (entity), not the auditor, measures or evaluates the underlying subject matter (what is being audited) against the applicable criteria and presents findings and conclusions in a report or statement. The auditor then gathers sufficient and appropriate evidence to provide a reasonable basis to render a conclusion on whether the responsible party's findings and conclusions are, in all material respects, free from misstatement (IAASB, 2013, pp. 7–8, 62, 63; INTOSAI, 2013b, para. 29). As mentioned before, government performance audits are normally direct reporting engagements, and this is reflected in INTOSAI's standards for performance audits (INTOSAI, 2016b, para. 30).
Assurance engagements can also vary by the level of assurance they provide and are referred to as either an audit or a review. Audits must always be designed to provide a reasonable (or high) level of assurance on the conclusions reached in a report, whereas a review provides a limited (or moderate) level of assurance. Choosing one over the other has implications for the sufficiency and appropriateness of the evidence that needs to be collected (INTOSAI, 2013b, para. 31–33), and for the form of the conclusion that can be presented. Performance work undertaken by SAIs is usually done to an audit level requiring a reasonable (or high) level of assurance. This calls for substantial evidence and corroboration throughout each stage of the audit process, from planning through reporting, in order to reduce the risk of an inappropriate conclusion to an acceptably low level.
INTOSAI standards and related guidance material cover the requirements for the performance audit process from organizational strategic planning and specific audit planning through examination to reporting. An SAI will typically institute internal policies and practices to support the approach and requirements for an audit engagement, including, for example, pre-audit launch work, the conduct of audit surveys, and the preparation of audit plans and selection of criteria. The office will use these tools to clarify expectations, meet required technical professional standards, ensure consistent and systematic application by staff, and help with the quality control review requirements for assurance engagements. Such policies and practices would be in place for all phases of the performance audit process and in keeping with documentation standards (INTOSAI, 2010b, pp. 11–12; INTOSAI, 2013c, pp. 11–14).
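The distinctions above between direct and attestation engagements, and between reasonable and limited assurance, can be illustrated with a minimal sketch. The following Python snippet is purely illustrative and is not drawn from INTOSAI or IAASB material; the class names, example subject matter, and wording of the evidence expectations are hypothetical simplifications.

```python
from dataclasses import dataclass
from enum import Enum


class EngagementType(Enum):
    DIRECT = "direct"            # the auditor measures the subject matter against the criteria
    ATTESTATION = "attestation"  # the responsible party measures; the auditor concludes on that report


class AssuranceLevel(Enum):
    REASONABLE = "reasonable (high)"  # audit-level assurance
    LIMITED = "limited (moderate)"    # review-level assurance


@dataclass
class Engagement:
    subject_matter: str
    engagement_type: EngagementType
    assurance_level: AssuranceLevel

    def evidence_expectation(self) -> str:
        # Reasonable assurance calls for substantially more evidence and corroboration
        # than limited assurance; this wording is a reminder of the idea, not a rule.
        if self.assurance_level is AssuranceLevel.REASONABLE:
            return "substantial corroborated evidence at every phase, reducing audit risk to an acceptably low level"
        return "less extensive evidence, supporting a limited (negatively framed) conclusion"


# Example: a typical SAI performance audit run as a direct engagement at audit level.
audit = Engagement("hospital wait-time program", EngagementType.DIRECT, AssuranceLevel.REASONABLE)
print(audit.evidence_expectation())
```

The sketch simply records which party measures the subject matter and what level of assurance is sought, since those two choices drive how much evidence the team plans to gather.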

The Performance Audit Process
Planning typically begins with a strategic plan for the SAI to act as the framework for identifying priorities and timing for upcoming individual performance audits. The plan involves identifying and deciding on audit topics related to the principles of economy, efficiency, and effectiveness (INTOSAI, 2016d, pp. 3–5).

A SAI will generally undertake risk-based planning on a cyclical basis to serve as a strategic plan. Implicit in the exercise is the requirement for SAIs to be independent in determining the topics they will pursue (INTOSAI, 2016c, para. 10–12). This requirement does not preclude SAIs from responding to parliamentary, legislative or congressional requests. In some jurisdictions, for example the Government Accountability Office in the US, as described in Chapter 11, responding to congressional or subcommittee requests is part of the mandate of that office. SAIs such as the national audit offices in the United Kingdom and Canada are also expected to take account of ideas proposed by the parliamentary committees with which they are closely associated.
The priorities flowing from a risk-based audit plan form the basis upon which individual audit topics are selected. These priorities are typically made public covering a few years at a time, on a rolling basis. Audit topics may be strategic or operational in nature. They may also focus on what works well and what does not work as well. In preparing the risk-based audit plan, practitioners gather information through consultation, surveys, research, and comparison to work done in other jurisdictions or by other types of review organizations such as think tanks or inspectorates. They also rely on their knowledge of the business of the organizations subject to audit (INTOSAI, 2016d, pp. 3, 11–14).
The strategic plan is also used to identify early on the potential scope and lines of inquiry for future audits. These elements in turn are screened for audit worthiness and auditability. Audit worthiness means assessing the audit topics on their significance, based on scale of spend (materiality) and/or qualitative considerations such as sensitivity, associated risks, causes, and consequences or impacts on stakeholders. Auditability means assessing whether an audit topic can be examined. It considers the availability of information to gather sufficient and appropriate audit evidence, the complexity of the subject, the audit methodology and skills needed, and the availability of resources. Ultimately, the final selection and timing of individual audits represented in the risk-based audit plan are matters of professional judgment, which includes consultation with legislative authority (INTOSAI, 2016b, pp. 13–15; 2016d, pp. 3–5).

Individual Performance Audit Planning
Planning at the individual audit level consists of building on the work done in the strategic risk-based audit plan and determining whether it is reasonable to proceed to a full audit. Decisions on choices are important because audits use up scarce public resources and also impose a burden on those organizations subject to the scrutiny work. Planning includes ensuring the availability of suitable criteria and audit evidence and determining the actual audit objectives, scope, criteria, and approach to the audit. At this point (or at the latest when conducting the audit), the auditor is required to discuss the criteria with the audit entity to ensure a common understanding of the benchmarks against which the audit entity's performance will be assessed (INTOSAI, 2016c, pp. 15–16).
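The screening for audit worthiness and auditability described above, which also feeds the decision on whether to proceed to a full audit, can be sketched in code. The snippet below is a hypothetical illustration only: the fields, the spend threshold, the risk scale, and the example topics are invented for the sketch and are not taken from INTOSAI guidance.

```python
from dataclasses import dataclass


@dataclass
class CandidateTopic:
    name: str
    materiality: float             # scale of spend, e.g. in millions
    qualitative_risk: int          # 1 (low) to 5 (high): sensitivity, consequences for stakeholders
    information_available: bool    # can sufficient, appropriate evidence be gathered?
    skills_and_resources_available: bool


def audit_worthy(topic: CandidateTopic, spend_threshold: float = 100.0) -> bool:
    # Significant either by scale of spend (materiality) or by qualitative considerations.
    return topic.materiality >= spend_threshold or topic.qualitative_risk >= 4


def auditable(topic: CandidateTopic) -> bool:
    # Auditability reflects access to information plus the methodology, skills, and resources needed.
    return topic.information_available and topic.skills_and_resources_available


def screen(candidates: list[CandidateTopic]) -> list[CandidateTopic]:
    # Topics passing both screens are carried into the risk-based audit plan;
    # final selection and timing remain matters of professional judgment.
    return [t for t in candidates if audit_worthy(t) and auditable(t)]


candidates = [
    CandidateTopic("IT modernization program", 450.0, 3, True, True),
    CandidateTopic("Small grants scheme", 12.0, 5, True, False),
]
print([t.name for t in screen(candidates)])
```

In practice no mechanical score replaces judgment; a sketch like this only makes explicit which considerations are being weighed and why a topic falls out of the plan.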

From a practice perspective, some form of acknowledgment that the work will take place is obtained from the audit entity, along with an understanding of the objectives, scope, criteria, and methodological approach for the audit (INTOSAI, 2016d, pp. 10, 15, 16). As a key part of its independence, an SAI has the right to undertake its chosen audit work, but there may nevertheless be some form of discussion with the audited entity as to the timing and approach to be taken, if only to allow practicalities such as the availability of staff to be agreed.

Conducting the Examination
The examination phase involves taking steps to gather sufficient and appropriate audit evidence. Evidence is normally obtained by such means as inspection, observation, inquiry, confirmation, recalculation, re-performance, and analytical procedures. Auditors will usually seek corroborating evidence of a different nature and gather it from various sources to reach conclusions concerning a subject matter (INTOSAI, 2016d, para. 44–55). Determining whether sufficient, appropriate audit evidence has been gathered involves asking whether the collective weight of the evidence gathered in the audit is enough to persuade a reasonable person that the observations and conclusions are valid, and whether the recommendations that flow from the conclusions about perceived gaps in performance are appropriate. Auditors use their professional judgment and exercise professional skepticism to determine the quantity and quality of audit evidence required to finalize their conclusions. Assessing audit risk and the potential for inaccurate conclusions are also factors considered in determining the quantity and quality of audit evidence required (INTOSAI, 2016b, para. 68–72; 2016d, pp. 16–18). At the end of the examination phase, auditors often confirm all key facts and findings with the audit entity. They also review additional sources of evidence presented and fully consider all perspectives relative to audit findings prior to reaching final conclusions and preparing the final report (INTOSAI, 2016b, para. 55, pp. 129–132).

Preparing the Assurance Performance Report
INTOSAI notes that "Reports should be comprehensive, convincing, timely, reader-friendly and balanced" (INTOSAI, 2016b, para. 116). To be meaningful, performance audits also need to be properly communicated (INTOSAI, 2013c, p. 16). On a practical level, the reporting phase involves writing a fair and evidence-based report in a structured manner while following specific conventions. INTOSAI standards prescribe the minimum content of the report. Among other requirements, the report includes:

•  A description of the objective(s) of the audit, the entity, the subject matter and the time period covered (INTOSAI, 2016b, para. 117);
•  The applicable standards used (INTOSAI, 2013b, para. 8);
•  The criteria used and their sources (INTOSAI, 2016b, para. 117, 122);
•  The level of assurance provided (INTOSAI, 2013c, para. 22; 2016b, para. 32);
•  The conclusion(s) against the audit objective(s) (INTOSAI, 2016b, para. 117); and
•  Recommendations as appropriate (INTOSAI, 2016b, para. 117, 124–125).
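The minimum content elements listed above lend themselves to a simple completeness check during report finalization. The sketch below is a hypothetical illustration: the element labels and the example draft are invented for the snippet and do not reproduce any SAI's actual drafting protocol.

```python
# The keys mirror the minimum content elements listed above; a drafting protocol
# could flag any element that is still missing before quality control sign-off.
REQUIRED_ELEMENTS = (
    "audit objectives",
    "entity, subject matter and period covered",
    "applicable standards",
    "criteria and their sources",
    "level of assurance",
    "conclusions against the objectives",
    "recommendations (where appropriate)",
)


def missing_elements(report_sections: dict[str, str]) -> list[str]:
    """Return the required elements that are absent or empty in a draft report."""
    return [e for e in REQUIRED_ELEMENTS if not report_sections.get(e, "").strip()]


draft = {
    "audit objectives": "Assess economy and efficiency of fleet management.",
    "applicable standards": "ISSAI 300 / ISSAI 3000",
    "level of assurance": "reasonable",
}
print(missing_elements(draft))
```

A checklist of this kind only confirms that each element is present; judging whether the content is fair, balanced, and supported by the evidence remains a matter for review and professional judgment.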

SAIs will typically have established protocols to guide the preparation and finalization of the audit report. These protocols also address required communications with the audit entity and relevant third parties. These will involve the sharing of draft reports with the entity, which is usually provided with the opportunity to respond on factual accuracy, tone, and balance. Finalization activities are all subject to quality control procedures and sign-offs to ensure compliance with standards. Attention will also be given to the practicality of recommendations to ensure they are relevant, constructive and add value (INTOSAI, 2016d, para. 125–128). Responses by the audit entity to the recommendations may be included, depending on the protocols established. In some jurisdictions, disagreements with findings by the audit entity may be included in the final report itself, while in others, comments and additional evidence are integrated into the report. In all cases the aim is that responses are addressed in a fair and balanced way.
SAIs have a role in monitoring action pursuant to periodic performance audits. Follow-up audits – where they take place – focus on whether the audited entity has adequately addressed the matters raised, which, depending on the level of action, could result in a further report in due time (INTOSAI, 2016b, pp. 19–20; 2013c, para. 42).

The Management of Audit Risk
Exercising professional judgment with due care and an objective state of mind is critical throughout the performance audit process and in the application of professional standards for the performance audit. Such judgment reflects robust training, knowledge and expertise in the circumstances of the assurance engagement. Auditors are also required to exercise professional skepticism throughout the audit process. It is considered "vital" in assessing the sufficiency and appropriateness of audit evidence throughout the audit (INTOSAI, 2016c, para. 88). Skepticism can be described as maintaining an attitude that reflects a questioning mind, being alert to conditions that may indicate possible deviations from audit criteria, and assessing critically the audit evidence obtained.
Great effort is taken to prevent audit failures. The standards over quality control and ethical requirements help to mitigate such risk substantially. In addition to the specific standards of quality for each performance audit engagement, the international standards call for SAIs to establish a formal monitoring process on a cyclical basis to provide reasonable assurance that the office's overall system of quality control is relevant, adequate, and operating effectively on a continuous basis.

The monitoring also includes independent inspections of completed performance audit engagements against the required policies and practices, among other checks. SAIs are required to evaluate the effect of any deficiencies identified (INTOSAI, 2010a, pp. 6, 8; 2010b, pp. 12–15).
An audit failure would occur when a performance audit report is issued with inappropriate conclusions based on the evidence collected, or with insufficient appropriate audit evidence to support the audit conclusions made (Beasley, Carcello, & Hermanson, 2001). This can occur when the assurance standards, quality control, and ethical requirements for the performance audit are not properly addressed. In other words, another practitioner, audit office or SAI undertaking the same engagement but adhering fully to the professional standards and ethical and quality control requirements would arrive at different conclusions or would require substantially more robust evidence to meet the level of assurance being provided. The standards are meant to promote the same conclusions and level of rigor regardless of the audit office, qualified practitioner or SAI that performed the work.
In a direct performance audit engagement, an audit failure would occur, for example, when incorrectly concluding that what is being audited (the underlying subject matter) complies, in all significant respects, with the applicable criteria – when in fact it does not. Alternatively, an audit failure would also happen when incorrectly concluding that what is being audited (the underlying subject matter) does not comply, in all significant respects, with the applicable criteria – when, in fact, it does. Typically, such instances would be further investigated to inform the quality control process within audit offices or SAIs, including training and learning as well as disciplinary measures where appropriate.
The consequences of failing to comply with applicable professional standards, including the standards over quality control (and/or ISQC-1) and ethical requirements, can be severe. Such situations could, for example, have a significant impact on the reputation of the SAI, might trigger legal actions or, depending on statutory provisions, could lead to removal from office, disciplinary action, or loss of license for heads of SAI who are professional accountants (INTOSAI, 2010b, pp. 12–14; 1977, Section 6; IFAC, 2009, para. 51).

Concluding Remarks
This chapter provides information about the basics of performance audit in the public sector and its practice. A number of emerging challenges are likely to have an impact on how performance audits are conducted in the future. There is always pressure to do more at less cost in an environment of scarce resources, to reduce the time to reporting in an environment of instant messaging and 24-hour news services, and to increase responsiveness to disruptive events. Technology is increasingly used to achieve continued efficiencies in evidence collection and greater coverage in the evidence base, and to innovate, where appropriate, through the use of artificial intelligence.

The advances in the use of artificial intelligence will almost certainly have an impact in the future on the practice expectations surrounding the careful exercise of professional judgment, due care, and an objective state of mind. In addition, the increasing use of Big Data methods in audit offices and across audit entities promises more robust coverage by auditors in their work and potentially greater quality in audit execution. Big Data also provides audit entities with more effective tools to challenge and review the audit conclusions reached. Demand to reduce costs also remains strong.
Overall, the formality of the practice and the demands of the standards for assurance engagements, together with the potential consequences of non-compliance, are key elements that distinguish external performance audits from internal audit and evaluations. In the author's view, some practitioners entering the field of performance audit from the internal audit or evaluation streams have, over the years, faced challenges in understanding the similarities and differences among the external audit, internal audit, and evaluation disciplines. The journey can be difficult for individual practitioners and team leaders, and the differences can take time to discern. It is particularly challenging if a practitioner is required to harness new ways of doing similar work in line with the expectations of external performance audit practice. Indeed, it usually takes a practitioner at least a full audit cycle to gain a thorough understanding of the context of an external performance audit.
At an organizational level, there can be over-reliance by SAIs on internal quality assurance practices to provide comfort that internal standards are being met consistently across the oversight organization. This can at times be viewed as an inefficient use of scarce resources to address any inadvertent non-compliance with standards. However, with a greater understanding of the similarities and differences, the potential exists for increased dialog and sharing about the respective practices, facilitating a faster transition to an external audit environment, along with other co-benefits. Enhanced understanding can help create strong synergy in multidisciplinary teams, build on the knowledge gained from previous work in other disciplines, add to the performance audit toolkit to make it more effective, and help to resolve complex issues as they arise.

References
Abd Manaf, N.A. (2010). The Impact of Performance Audit: The New Zealand Experience. Retrieved from http://hdl.handle.net/10063/1376.
Auditing and Assurance Standards Board (AASB) (2015). Canadian Standard on Assurance Engagements 3001 (CSAE 3001) – Direct Engagements. Retrieved from www.frascanada.ca/en/other/documents/csae-3001-ed.
Auditing and Assurance Standards Board (AUASB) (2017b). Australian Standard on Assurance Engagements 3500 (ASAE 3500) – Performance Engagements.
Beasley, M.S., Carcello, J.V., & Hermanson, D.R. (2001). Top 10 Audit Deficiencies. Journal of Accountancy. Retrieved from www.journalofaccountancy.com/issues/2001/apr/top10auditdeficiencies.html [Accessed 14 August 2018].
Chartered Professional Accountants of Canada (CPA Canada) (2009). Canadian Standard on Quality Control 1 (CSQC-1).
Daujotaite, D., & Macerinskiene, I. (2008). Development of Performance Audit in Public Sector. 5th International Scientific Conference Business and Management [16–17 May 2008]. Vilnius, Lithuania.
International Auditing and Assurance Standards Board (IAASB) (2004). International Framework for Assurance Engagements. International Federation of Accountants.
International Auditing and Assurance Standards Board (IAASB) (2013). International Standard on Assurance Engagements 3000 (Revised): Assurance Engagements Other than Audits or Reviews of Historical Financial Information.
International Auditing and Assurance Standards Board (IAASB) (2016–2017a). Handbook of International Quality Control, Auditing, Review, Other Assurance and Related Services, Volume I, 2016–2017. International Federation of Accountants.
International Federation of Accountants (IFAC) (2009). International Standard on Quality Control 1 (ISQC-1): Quality Control for Firms That Perform Audits and Reviews of Financial Statements, and Other Assurance and Related Services Engagements.
International Organization of Supreme Audit Institutions (INTOSAI) (1977). International Standards of Supreme Audit Institutions 1 (ISSAI 1) – The Lima Declaration.
International Organization of Supreme Audit Institutions (INTOSAI) (2007). International Standards of Supreme Audit Institutions 10 (ISSAI 10) – Mexico Declaration on SAI Independence.
International Organization of Supreme Audit Institutions (INTOSAI) (2010a). International Standards of Supreme Audit Institutions 20 (ISSAI 20) – Principles of Transparency and Accountability.
International Organization of Supreme Audit Institutions (INTOSAI) (2010b). International Standards of Supreme Audit Institutions 40 (ISSAI 40) – Quality Control for SAIs.
International Organization of Supreme Audit Institutions (INTOSAI) (2013a). International Standards of Supreme Audit Institutions 12 (ISSAI 12) – The Value and Benefits of Supreme Audit Institutions – Making a Difference to the Lives of Citizens.
International Organization of Supreme Audit Institutions (INTOSAI) (2013b). International Standards of Supreme Audit Institutions 100 (ISSAI 100) – Fundamental Principles of Public Sector Auditing.
International Organization of Supreme Audit Institutions (INTOSAI) (2013c). International Standards of Supreme Audit Institutions 300 (ISSAI 300) – Fundamental Principles of Performance Auditing.
International Organization of Supreme Audit Institutions (INTOSAI) (2016a). International Standards of Supreme Audit Institutions 30 (ISSAI 30) – Code of Ethics.
International Organization of Supreme Audit Institutions (INTOSAI) (2016b). International Standards of Supreme Audit Institutions 3000 (ISSAI 3000) – Standards and Guidelines for Performance Auditing Based on INTOSAI's Auditing Standards and Practical Experience.
International Organization of Supreme Audit Institutions (INTOSAI) (2016c). International Standards of Supreme Audit Institutions 3100 (ISSAI 3100) – Performance Audit Guidelines: Key Principles.
International Organization of Supreme Audit Institutions (INTOSAI) (2016d). International Standards of Supreme Audit Institutions 3200 (ISSAI 3200) – Guidelines for the Performance Auditing Process.
International Organization of Supreme Audit Institutions (INTOSAI) (2016e). Supreme Audit Institutions Performance Measurement Framework (SAI PMF). Retrieved from www.idi.no/en/idi-cpd/sai-pmf [Accessed 12 November 2018].
Lonsdale, J., Wilkins, P., & Ling, T. (Eds.) (2011). Performance Auditing: Contributing to Accountability in Democratic Government. Cheltenham, UK: Edward Elgar.
Merriam-Webster Dictionary (2018). Audit. In Merriam-Webster Online. Retrieved from www.merriam-webster.com/dictionary/audit [Accessed 2 May 2018].
Oxford English Dictionary (2018). Audit. In Oxford Dictionaries Online. Retrieved from https://en.oxforddictionaries.com/definition/audit [Accessed 2 May 2018].

3 Internal Audit Defined David Rattray

Introduction
Internal audit is an established tool to assist managers and governors of organizations to assess and improve governance, risk management, and internal controls, which are all essential elements of organizational success. It can address questions such as whether an organization is operating as management expects, how well its operations are doing, and how things can be improved. The Institute of Internal Auditors (IIA) defines the fundamental nature and scope of internal auditing as:

An independent, objective assurance and consulting activity designed to add value and improve an organization's operations. It helps an organization accomplish its objectives by bringing a systematic, disciplined approach to evaluate and improve the effectiveness of risk management, control, and governance processes.
(The 2017 Edition of the Institute of Internal Auditors' "International Professional Practices Framework" (IPPF or "Red Book"))

The IIA has over 190,000 members worldwide (IIA Global Fact Sheet, 2016). The majority of internal auditors are employed in publicly traded and privately held corporations, and a significant number are employed by local, state, and national governments. In addition, many individuals who are certified as internal auditors work for professional accounting and consulting firms that are contracted periodically to organizations and government bodies as external resources. A very large percentage of those who are part of such contractual agreements also possess other professional accounting or business designations.
Internal audit organizations are commonly located within national and international publicly traded companies, medium to large national government departments, agencies, boards, and government business enterprises, as well as some "not-for-profit" organizations. The focus of this chapter is on internal audit in government organizations. However, the IIA and its international practice standards apply to the private, public and not-for-profit sectors. This chapter describes the internal audit profession, the guidance and standards for internal audit, and skills and experience requirements. It also compares internal audit with external audit, which includes performance audits and financial statement audits.


The Internal Audit Profession
The Institute of Internal Auditors (IIA) plays the lead role internationally for the practice of internal audit, with its headquarters based in the State of Florida and IIA affiliates worldwide. Where an internal audit organization has been formed to conduct work for management and a governing board, all internal auditors are expected to follow IIA Standards, irrespective of the size and nature of the organizations in which they are employed.
The IIA has promulgated a framework called the International Professional Practices Framework (IPPF), or "Red Book," to capture and articulate the essence of what the IIA is responsible for. This framework drives the work of IIA executives and its membership body in order to advance the practice of internal audit worldwide. The International Professional Practices Framework (IPPF) is the conceptual framework that provides authoritative mandatory and recommended guidance promulgated by the IIA. This guidance is organized as follows (2017 Edition of the IIA International Professional Practices Framework):

Mandatory Guidance:

•  Core Principles for the Professional Practice of Internal Auditing.
•  Definition of Internal Auditing.
•  Code of Ethics.
•  International Standards for the Professional Practice of Internal Auditing (Standards).

Recommended Guidance:

•  Implementation Guidance.
•  Supplemental Guidance.

The Institute of Internal Auditors' IPPF states that conformance with the IIA's "International Standards for the Professional Practice of Internal Auditing (Standards)" is mandatory in meeting the responsibilities of internal auditors and the internal audit activity. Conformance with the principles set forth in the mandatory guidance is essential for the professional practice of internal auditing unless there are overriding circumstances such as jurisdictional legislation, in which case a departure from the guidance should be highlighted in the report to alert the reader that an exception has been made to the standards.
Internal Audit Standard 1300 of the IPPF requires the chief audit executive of an organization to establish a Quality Assurance and Improvement Program (QAIP) to enable an evaluation of the internal audit activity's conformance with the Definition of Internal Auditing and the International Standards for the Professional Practice of Internal Auditing (Standards), as well as an evaluation of whether internal auditors apply the Code of Ethics. The program also assesses the efficiency and effectiveness of the internal audit activity and identifies opportunities for improvement.

Internal Audit Defined   33 All internal audit activities, regardless of industry, sector, or size of audit staff  – even those outsourced or co-sourced – must maintain a QAIP that contains both internal and external assessments. External assessments enhance value as they enable the internal audit activity to evaluate conformance with the standards; internal audit and audit committee charters; the organization’s risk and control assessment; the effective use of resources; and the use of successful practices. An internal audit activity must obtain an external assessment at least every five years by an independent reviewer or review team to maintain conformance with the standards (IIA Standard 1312). It is reasonable for a reader to assume that all of the mandatory standards have been followed in planning, conducting and reporting an audit assignment, unless otherwise stated. However, the IIA continues to have serious challenges with validating and enforcing mandatory conformance in all instances. This is despite mandatory guidance being developed following an established due diligence process, which includes a period of public exposure for stakeholder input. Details of each of these components can be found on the IIA website (www.IIA.org).

International Standards of Practice for Internal Audit Internal auditing is conducted in diverse legal and cultural environments; for organizations that vary in purpose, size, complexity, and structure; and by persons within or outside the organization. The IIA Standards apply to broad scopes of practice including topics such as an organization’s governance, risk management, and management controls over the economy, efficiency, and effectiveness of operations, including safeguarding of assets, the reliability of financial and management reporting, and compliance with laws and regulations. Internal auditing may also involve conducting proactive fraud audits to identify potentially fraudulent acts. The IIA IPPF or “Red Book” articulate standards for internal audit as a set of principles-based, mandatory requirements consisting of: • •

Statements of core requirements for the professional practice of internal auditing and for evaluating the effectiveness of performance that is internationally applicable at organizational and individual levels. Interpretations clarifying terms or concepts within the Standard.

The IIA Standards are divided into two categories: attribute and performance standards. Attribute standards address the prescribed attributes of internal audit organizations and individuals performing internal auditing (see Box 3.1 for the topics covered in the attribute standards). Performance Standards describe the nature of internal auditing and provide quality criteria against which the performance of these services can be measured (see Box 3.2 for the topics covered in the performance).

34   D. Rattray Box 3.1  Internal Audit Standards 2017 Attribute Standards 1000 – Purpose, Authority, and Responsibility The purpose, authority, and responsibility of the internal audit activity must be ­formally defined in an internal audit charter, 1100 – Independence and Objectivity The internal audit activity must be independent, and internal auditors must be objective in performing their work. 1200 – Proficiency and Due Professional Care Engagements must be performed with proficiency and due professional care. 1300 – Quality Assurance and Improvement Program The chief audit executive must develop and maintain a quality assurance and improvement program that covers all aspects of the internal audit activity. (Source: Institute for Internal Auditors, 2017)

Box 3.2  Performance Audit Standards 2017 Performance Standards 2000 – Managing the Internal Audit Activity The chief audit executive must effectively manage the internal audit activity to ensure it adds value to the organization. 2100 – Nature of Work The internal audit activity must evaluate and contribute to the improvement of the organization’s governance, risk management, and control processes using a systematic, disciplined, and risk-based approach. 2200 – Engagement Planning Internal auditors must develop and document a plan for each engagement, including the objectives, scope, timing and resource allocations. 2300 – Performing the Engagement Internal auditors must identify, analyze, evaluate, and document sufficient information to achieve the engagement’s objectives. 2400 – Communicating Results Internal auditors must communicate the results of engagements. 2500 – Monitoring Progress The chief audit executive must establish and maintain a system to monitor the ­disposition of results communicated to management.

Internal Audit Defined   35 2600 – Communicating the Acceptance of Risks When the chief audit executive concludes that management has accepted a level of risk that may be unacceptable to the organization, the chief audit executive must discuss the matter with senior management. (Source: Institute for Internal Auditors, 2017)

In addition to the IPPF list of standards, the IPPF also contains elaborations on these standards in what are called “Interpretations” for these standards, which apply to all internal audit services, audit and consulting, and serve to expand upon the understanding of the attribute and performance standards. The 2017 broad-based internal audit standards for assurance and consulting contained in the IPPF cover a wide range of internal audit assurance and ­consulting practices, which includes conducting performance audits. The IIA Standards are similar to those of INTOSAI (International Organization of Supreme Audit Institutions) that govern the practice of performance audits by Auditors General. The principal difference is in the audience of the intended reports. Internal audits are meant for management, while performance audit reports under INTOSAI standards are written for legislative bodies. Because of this, the scoping, nature, and level of reporting detail differ between the two. Assurance services involve the exercise of the internal auditor’s objective assessment of evidence to provide opinions or conclusions regarding an entity, operation, function, process, system, or other subject matters. The nature and scope of an assurance engagement are determined by the internal auditor. This is unlike the assurance services for financial statements provided by external auditors, which are generally determined by external legislation and regulation. Consulting services are advisory in nature and are generally performed at the specific request of an engagement client. When performing consulting services, the internal auditor must maintain objectivity and not assume management responsibility. As in the case of national audit standards (financial and performance audits covering all sectors) governing the activities of CPAs (Chartered Professional Accountants), the global IIA Standards apply to individual internal auditors and the internal audit activity. All internal auditors are accountable for conforming with the standards related to individual objectivity, proficiency, and due professional care, as well as the standards relevant to the performance of their job responsibilities. Chief audit executives are additionally accountable for the internal audit activity’s overall conformance with the standards. An internal audit organization’s primary purpose is to provide an independent, objective assurance and consulting activity designed to add value and improve an organization’s operations. The IIA Standards ensure credibility in the work of the organization. The standards-based outputs assist an organization to accomplish its objectives by bringing a systematic, disciplined approach to evaluate and improve the effectiveness of risk management, control, and governance processes.


Internal Auditor Skills and Experience

As part of the 2017 IPPF Attribute Standard 1200 (see Box 3.3), the IIA sets out skills and competence requirements for internal auditors in the proficiency standards that describe how internal audit engagements are to be carried out. The details of the standard set out what is expected of practitioners to meet expectations of proficiency and due care. Individual internal auditors must have the knowledge and skills required to carry out their own responsibilities, while the audit team collectively must possess all the knowledge and skills required to carry out the engagement. The team should be supplemented if necessary to meet the standard, and an assignment should be declined if the team does not have the required skills. There is no mandatory requirement for an internal audit assignment to be led by an accredited internal auditor, for example, an individual holding the IIA Certified Internal Auditor (CIA) designation; however, the IIA strongly encourages it. Internal auditors are expected to have knowledge of current activities and emerging trends, as well as sufficient knowledge to assess the risk of fraud and key information technology risks, albeit not at the level of an auditor with primary responsibility in those areas.

Box 3.3  Proficiency and Due Professional Care

1200 – Engagements must be performed with proficiency and due professional care.

1210 – Proficiency
Internal auditors must possess the knowledge, skills, and other competencies needed to perform their individual responsibilities. The internal audit activity collectively must possess or obtain the knowledge, skills, and other competencies needed to perform its responsibilities.

1210.A1 – The chief audit executive must obtain competent advice and assistance if the internal auditors lack the knowledge, skills, or other competencies needed to perform all or part of the engagement.

1210.A2 – Internal auditors must have sufficient knowledge to evaluate the risk of fraud and the manner in which it is managed by the organization, but are not expected to have the expertise of a person whose primary responsibility is detecting and investigating fraud.

1210.A3 – Internal auditors must have sufficient knowledge of key information technology risks and controls and available technology-based audit techniques to perform their assigned work. However, not all internal auditors are expected to have the expertise of an internal auditor whose primary responsibility is information technology auditing.

1210.C1 – The chief audit executive must decline the consulting engagement or obtain competent advice and assistance if the internal auditors lack the knowledge, skills, or other competencies needed to perform all or part of the engagement.

(Source: Institute of Internal Auditors, International Standards for the Professional Practice of Internal Auditing)

Staff experience and education requirements for internal auditors are similar to those of external legislative audit staff: they must collectively possess the skills and experience required to conduct the assurance and consulting assignments. In some jurisdictions, the head of the legislative audit organization (e.g., the Auditor General) is not required to hold a professional accounting or audit designation. There are exceptions, however, such as Canada, where the Auditor General Act requires the Auditor General to be a Chartered Professional Accountant (CPA).

It is also worth examining the requirement for internal audit in the Canadian federal government. The Federal Accountability Act empowered the federal government's management board (the Treasury Board of Canada) to act with respect to internal audit in the federal public administration. The Treasury Board established policies requiring internal audit functions in all government departments and agencies, which must comply with the Government of Canada's internal audit standards, themselves virtually identical to the IIA's international standards in the IPPF. For example, the Department of National Defence and the Canadian Armed Forces publicly report on how they are performing in areas such as the skills and training of their internal auditors and their risk-based audit plan (see Box 3.4).

Box 3.4  Internal Audit at National Defence and the Canadian Armed Forces in Canada

The Treasury Board of Canada has statutory responsibility for the administration and establishment of policies for internal audit. The policy objective is to ensure oversight "informed by a professional and objective internal audit function." Large departments are required to have an internal audit function, carried out in accordance with the Institute of Internal Auditors' International Professional Practices Framework. They are also expected to support the professional development and certification of internal auditors.

To meet these requirements, departments such as the Department of National Defence and the Canadian Armed Forces publicly post their compliance with the policy requirements. For example, they post the professional certifications of the staff who carry out their risk-based audit plan, which is also posted:

% with internal audit or accounting designations: 54%
% with internal audit or accounting designations in progress: 11%
% with other designations: 22%

They also post the status of the work in their risk-based audit plan. It includes not only the audits that are under way, such as an audit in progress on Financial Management Control Practices of the Canadian Army, but also advisory work, such as the work in progress on Information Management in the Federal Health Claims Process Service Contract.

(Source: National Defence and the Canadian Armed Forces, 2018)


Differences Between Internal and External Audit

The two functions, while similar in some aspects of practice, are quite different. As discussed in Chapter 2, legislative auditors in national audit offices normally conduct both financial statement audits and performance audits. This is done under various forms of legislation, with reporting generally not to management or boards of directors but to legislatures. There is therefore less concern about risks to independence and objectivity under these conditions, since the auditors are external to the bodies under review.

Unlike financial audits, performance audits conducted by external and internal organizations can be performed by staff and heads of these organizations who are not required to hold either a CPA or CIA designation. The primary requirement is to collectively possess the skills and experience required to conduct the assignment. Unlike CPA institutes and firms, whose membership has a monopoly over the practice of external financial audit in most jurisdictions, internal audit has no such monopoly. On occasion this has resulted in challenges around compliance with, and enforcement of, standards by internal auditors who are not accredited by the IIA.

The requirement for certified or licensed financial auditors is generally set out in legislation for most private sector companies (and all publicly traded companies), as well as for external financial audits conducted by legislative auditors in national audit offices in the public sector, covering government business enterprises, ministries, and their agencies. The internal audit profession, whether operating in the private, public, or not-for-profit sector, does not operate under a similar monopoly. This is because internal audit is viewed in many sectors and instances as a management option, not a mandatory requirement imposed on the organization by an outside body. There are therefore many examples globally where internal audit does not exist in organizations where it might well provide a valuable service to management. Justifications offered for the lack of internal audit include the cost of the activity and a lack of understanding of its value-added.

Unlike CPA associations worldwide, which oversee the practices of their membership, ensure compliance with legislation, and set local audit standards, the IIA is one global body operating through affiliates. The IIA and its affiliates are responsible for member certification, ongoing professional development, standard-setting, quality assurance, and advocacy, among other responsibilities, similar to CPA bodies, with the exception of compliance with external legislation in most jurisdictions. An exception is the federal government level in Canada, as noted above.

There are also differences between internal and external auditors in terms of employment. Internal auditors are employees of the organization and report to senior management, while external auditors are contracted by shareholders or governments for specific audit assignments or are on the staff of a national audit office. Although external auditors can conduct consulting services for clients, they must take care to maintain their independence and objectivity from management,

particularly so as not to compromise their external audit work on financial statement assurance engagements. The public and regulators are particularly sensitive to appearances of these two lines of business blurring. Because internal auditors are company employees serving management and the board through the chief audit executive, they must be vigilant about independence and objectivity, but not to the degree required of external auditors, upon whom external parties rely heavily.

The annual work plan is normally developed by internal audit through consultation with senior management and approved by the board or the head of the government department or government business enterprise. External audit financial statement work is in most cases required by legislation, and the mandate to identify and carry out external performance audits is provided to national audit offices through legislation. National audit offices are normally required to report to legislatures; under the IIA Standards, internal audit bodies are similarly required to report, not to legislatures, but to corporate boards or governing bodies. In both instances, however, there are implied relationships and expectations of co-operation that should be developed between corporate management and internal audit, and between legislative auditors and departments and agencies, while respecting independence and objectivity.

Internal audit reports are mainly used by management for risk mitigation and for improvements to systems and practices, while external audit reports are used by shareholders in the case of public corporations or by elected officials and public stakeholders in the case of legislative auditors' reports. Internal audit reports are normally private documents between internal audit and management, while external audit reports are normally public documents, as in the case of financial audit assurance reports or legislative audit reports to legislatures. Internal audits are often ongoing examinations of organizational risk, internal control, and governance practices, with reports issued periodically throughout the year; external audits are typically conducted and reported upon under specific statutory provisions.

Those who lead and conduct internal audits are strongly encouraged by the profession to hold a professional designation, although this is not mandatory. The work performed by an external auditor (including internal audit assignments) must be led by a professionally qualified practitioner, which for financial audit means, in virtually all jurisdictions, a CPA. While both internal and external auditors are required by their professions to be "independent" from management and the organization they audit, the application of that requirement can be quite different. Internal audit can be assigned projects of either an assurance or a consulting nature and be asked to provide opinions and other types of advice. In contrast, external auditors must be very cautious about working too closely with their clients because it might jeopardize their independence in reporting publicly on their legislated requirements.

Internal and external audit are thus quite different, but in many instances they can and do rely on each other's work. Both are professionally driven

by codes of ethics, professional accreditation of their members (all in the case of CPAs and some in the case of internal auditors), professional auditing standards, and quality assurance practices.

Conclusion

Internal audit as a professional practice is continuing to develop, strengthening the profession through more thorough validation and enforcement of mandatory requirements. Management in for-profit, not-for-profit, and government organizations can rely on internal audit to help them safeguard assets, ensure compliance with laws and regulations, and recommend improvements to the economy, efficiency, and effectiveness of operations. The demonstrated results include improved profit margins and, in the case of not-for-profit organizations and government departments, more funds for the delivery of services. It has been the experience of the author, both domestically and internationally, that for audits to operate effectively they must be performed in a business climate or political/government context that embraces constructive criticism. It is in those environments that the most can be gained from the audit.

References

Government of Canada (1985). Financial Administration Act. R.S.C. 1985, c. F-11, Sections 7 and 11.1.
Institute of Internal Auditors (2016). Global Fact Sheet – IIA. Retrieved from https://na.theiia.org.
Institute of Internal Auditors (2017). International Standards for the Professional Practice of Internal Auditing. Lake Mary, FL: IIA.
International Auditing and Assurance Standards Board (IAASB) (2013). International Standard on Assurance Engagements 3000 (Revised): Assurance Engagements Other than Audits or Reviews of Historical Financial Information.
National Defence and the Canadian Armed Forces (2018). Audit and Evaluation Reports. Retrieved from www.forces.gc.ca/en/about-reports-pubs-audit-eval/ocgreportingrequirements.page [Accessed 5 October 2018].

4  Defining Evaluation

Jos Vaessen and Maurya West Meiers

This chapter leaves the world of external and internal audit and outlines the foundations of the field of evaluation. This is not an easy task. The practice of evaluation has always struggled with boundary issues as well as divergent views on what evaluation is about and what it is not. In the selection and discussion of evaluation's building blocks, we implicitly take a stand on what we consider to be the common foundations of evaluation. At the same time, we are quite cognizant of the daunting heterogeneity in evaluation practices across different policy fields, institutional contexts and countries. This is reflected in our review.

A Brief Note on the History of Evaluation

There is no full consensus in the evaluation community on the definition of evaluation, the boundaries of evaluation as a field of practice, or the types of knowledge that feed into it (Shaw, Greene, & Mark, 2006). Despite the lack of consensus on definitions, researchers and theorists of evaluation have posed some defining views. Scriven proposed that "evaluation refers to the process of determining the worth, merit, or value of something" (Scriven, 1991, p. 139). Weiss provided a more circumscribed definition as "the systematic assessment of the operation and/or the outcomes of a program or policy, compared to a set of explicit or implicit standards, as a means of contributing to the improvement of the program or policy" (Weiss, 1998, pp. 4–5). Rossi, Lipsey and Freeman wrote,

In its broadest meaning, to evaluate means to ascertain the worth of or to fix a value of some object … we use evaluation in a more restricted sense, as program evaluation or interchangeably as evaluation research, defined as a social science activity directed at collecting, analyzing, interpreting, and communicating information about the workings and effectiveness of social programs. Evaluations are conducted for a variety of practical reasons: to aid in decisions concerning whether programs should be continued, improved, expanded or curtailed; to assess the utility of new programs and initiatives; to increase the effectiveness of program management and administration; and

to satisfy the accountability requirements of program sponsors. Evaluations may also contribute to substantive and methodological social science knowledge.
(Rossi, Lipsey, & Freeman, 2004, p. 3)

Evaluation has a longstanding history in public policy. Weiss (1998) notes that evaluation traces back to the seventeenth century, when the empirical study of social problems began in Britain. She and others suggest that earlier examples could probably be found, even if not formally called evaluation. As Mark et al. (in Shaw et al., 2006) observe, given the diversity in the field of evaluation (across organizations, sectors, and countries) it is more correct to talk about the histories of evaluation than about one history of evaluation. Prominent examples of systematic evaluation of policy interventions in the fields of education and public health date from the first part of the twentieth century (Rossi et al., 2004). The US General Accounting Office – renamed in 2004 as the Government Accountability Office – began in 1921 to investigate how federal dollars were spent. It expanded its duties over the years, moving more heavily into evaluation in the 1960s, and today is the preeminent body conducting evaluations in the US government.

After World War II, evaluation experienced a boom period in the US and a handful of European countries such as the UK and Sweden (Furubo & Sandahl, 2002). Vedung (2010) calls this the first wave of institutionalization, which he refers to as the scientific wave, with a strong emphasis on the use of experimental methods. Examples in the US include evaluation in the framework of the War on Poverty and Great Society programs. That is, two-group experimentation (treatment and control), or testing, should occur as the means to reach externally set goals. Vedung writes that this wave fell out of favor, leading to a second wave involving a more participatory and non-experimental approach. With emphasis placed on eliciting information from stakeholders and other informants, this was the so-called dialog-oriented wave of the 1970s. The 1980s brought a neo-liberal wave with a push for market orientation, accompanied by an emphasis on deregulation, privatization, contracting out, efficiency, and customer influence. And finally, the fourth wave has brought a return, or renaissance, of scientific experimentation in the evidence-based wave, with an emphasis on accountability, customer satisfaction, and value for money.

In the 1990s and 2000s, evaluation further expanded its reach in terms of institutionalization in public, not-for-profit, and private sector organizations across countries. Pressure grew for more effective use of scarce resources in international development activities. This pressure largely came from development assistance agencies (the World Bank, UN agencies, bilateral agencies, non-governmental organizations), donor country governments (parliaments, legislatures), and other financial supporters (philanthropies, other donors). The higher expectations of effective and efficient use of resources were coupled with demands for better evidence, and this meant growth in a variety of monitoring and evaluation activities.

In turn, this influenced the growth of monitoring and evaluation (M&E) functions within these development assistance agencies and in the countries where they worked, mostly low- and middle-income countries. Notable examples of "homegrown" institutionalization of evaluation in public agencies in emerging economies are Colombia, Chile, Mexico, and South Africa (Mackay, 2007, pp. 12–13). Knowledgeable and skilled employees were needed to carry out the work, along with management departments to oversee it. This led to an increase in the generation of tools, guidance documents, policies, regulations, and in some cases laws related to evaluation activities and requirements. All of this, and more, has led to a wide expansion and presence of evaluators and evaluation activities, with at least a minimum level of evaluation activity in almost every country in the world. Nor has this touched only national-level activities and efforts: subnational (province, municipality, village) M&E activities abound, as do M&E activities in a growing range of sectors and specializations (urban transportation, HIV/AIDS, museum communities, etc.).

Nowadays, one can speak of an international evaluation community. The term is intentionally amorphous and context-specific. But in general, for those working in the still relatively young and evolving evaluation field, there is a strong sense of community, a desire to share knowledge and resources, and to bring one another along. This is felt not only within one's own agency, but with evaluators in cities, countries, regions, and globally, as well as across sectors, agency types, job types, and so on. Technologies have helped to create and support this sense of community through email listservs, online communities of practice, webinars, and so on.

Professional associations have been, perhaps, the type of community activity that has done most to connect individuals and groups. VOPEs (Voluntary Organizations for Professional Evaluation) are non-profit membership organizations open to individuals interested in evaluation. Typical activities offered by these associations, or VOPEs, are annual conferences, online platforms, publishing and journal linkages, blogging and knowledge-sharing opportunities, professional learning activities, job boards, and, in some limited cases, access to credentialing and certification. Among the oldest and largest associations, or VOPEs, are the Canadian Evaluation Society (founded in 1980), the American Evaluation Association (begun in 1986 with the merger of two existing associations), the European Evaluation Society (begun in 1992), the United Kingdom Evaluation Society (1994), the Australasian Evaluation Society (1997), the African Evaluation Association (1999), and the South African Monitoring and Evaluation Association (2002, originally formed under another name), among others.

In addition, given the cooperative nature of the evaluation community, representatives of some of the earliest associations came together in 1997 to form a network of associations, the International Organization for Co-operation in Evaluation (IOCE). The network has expanded, with funding from member associations, with a mission to increase public awareness of evaluation, validate it globally, and support VOPEs in contributing to good governance, effective

decision-making, and strengthening the role of civil society. EvalPartners, managed by IOCE and UNICEF, was formed in 2012 to build on IOCE and its network-building efforts. Through the work of these bodies and other partners, further network and association building has occurred and been tracked. Today there are active associations in the majority of countries across the globe, from the Sri Lanka Evaluation Association to the Middle East and North Africa Evaluation Network to the Eurasian Alliance of National Evaluation Associations.

Another recent evaluation trend worth mentioning is the growing number of journals and dedicated publications (offline and online) on evaluation. The larger associations – such as the Canadian Evaluation Society, the European Evaluation Society, and the American Evaluation Association – have well-established peer-reviewed journals. Article authors lean toward academic affiliations, but the journals also enjoy submissions from evaluation practitioners across employment types. In addition to these longstanding journals, there are newer journals, usually also with an association affiliation. The review requirements tend to be more relaxed in the newer journals, but they have made a valuable contribution to broadening the base of knowledge and expanding approaches, particularly outside North America and Europe. Beyond journals, there is no shortage of books published on technical and practical areas of evaluation. The books published today have moved beyond the fundamental topics of evaluation that align closely with what is found in a typical graduate school research methods textbook; there are now numerous books on topics such as participatory methods, realist evaluation, evaluation of complex systems, and so on. And finally, beyond the world of journals and books, there are numerous online platforms where users can find curated overviews of evaluation topics and contribute to them.

One final development in the field involves the growth in graduate programs and professional training opportunities. It is the very rare person working in evaluation today who has a university or graduate degree with the term "evaluation" on the diploma, as the clear majority of evaluators come from a range of disciplines such as public policy, sociology, and psychology, among many others. But change is coming, with graduate programs offering degrees in evaluation, often as a trans-disciplinary degree (including at the PhD level). These graduate programs are still relatively few, though, and are found mostly in North America, Western Europe, Australia, and a handful of other locations. While graduate degree programs are slower to develop, in part because of the rigorous process universities follow to approve new programs, professional training programs have swelled since 2000. There is no shortage of one- to five-day courses on everything from fundamental topics to more specialized issues. They can be found in cities from Washington in English, to Brasilia in Portuguese, to Hanoi in Vietnamese. There are no governing bodies for these professional courses, and they vary in terms of quality and rigor, but they are serving to build knowledge and skills among professionals working in evaluation.


Evaluation as a Function and an Institutionalized Practice

The institutionalization of evaluation has not been a linear process and it has evolved quite differently across countries, policy fields and types of organizations (see for example, Furubo, Rist, & Sandahl, 2002; Jacob, Speer, & Furubo, 2015). Evaluation functions and practices differ in terms of how they are linked to organizational decision-making and learning. To discuss this, we first look at the concept of independence.

Evaluation independence is in place when the evaluation process is free from undue political influence and organizational pressure. Independence can be achieved through various mechanisms. Structural independence is ensured when the evaluation function has its own budget, staffing and work plan that is not subject to approval by management but directly under the supervision of an external entity such as a board, council or parliament. Functional independence refers to the ability of the unit managing the evaluation to decide on what to evaluate and how to go about the evaluation. Finally, behavioral independence implies professional integrity and absence of bias in the attitude and behavioral conduct of the evaluator. Depending on the type of evaluation carried out, the level of independence varies.

Independent evaluation functions, often found in international development (in the majority of multilateral and bilateral organizations, in some foundations and other non-governmental organizations) but also in national government (including national courts of audit), report to an overarching entity above management. Decisions regarding budgets and human resources are taken independently from management. Consequently, they can be characterized as structurally, functionally, and behaviorally independent. Probably a majority of evaluation functions are not structurally independent of management but still have a lot of leeway in terms of determining what and how to evaluate (functional and behavioral independence). Finally, much evaluative activity is embedded in operational practices of different types of organizations. Often, there is no sharp distinction between the practice of evaluation and other practices such as research or monitoring. Evaluation may be closely linked to project design and be part of some type of M&E framework at the project or program level. In this category of evaluative work, evaluation may be managed or conducted by professionals who may not see themselves as evaluators. The principle of behavioral independence may apply in these cases.

The diversity in the embeddedness of evaluation in organizational processes also points to a variety of actors involved in the evaluative process. On the supply side, evaluations may be commissioned by a decision-making organ or the evaluation unit itself. Evaluations may be managed by professional evaluation managers or conducted by professional evaluators located within the institution. Evaluations may be handled on an "in-house" basis, meaning that only evaluators who are staff members of the institution manage and conduct evaluations. In most cases evaluations are conducted on a "hybrid" basis ("in-house" evaluators collaborating with external experts) or are fully commissioned to individual

external experts or firms. In smaller organizations or those lacking a formal evaluation function, similar patterns can be found; a main difference may be the level of formality and structure in the management and implementation of the evaluation process.

On the demand side of evaluations, the range of actors can be quite broad, e.g., decision-makers (in management or at the political level); operational managers and staff; funding and implementing partners; interest groups, watchdogs, and the media; organizations and individuals who are directly or indirectly affected by the intervention under evaluation; and the general public. On the supply side, a useful way to characterize the main actors would be: decision-makers, evaluation managers or task leaders who commission evaluations; "in-house" evaluators who are part of some type of designated evaluation function, who manage and/or conduct evaluations and identify as professional evaluators; staff who are involved part-time with conducting and/or managing evaluations and who may or may not identify as evaluators; external experts who are either self-employed or work at specialized consulting firms and who may or may not identify as evaluators; and academics who are involved part-time in evaluative activities and who usually do not identify as evaluators. While those in academia might be more interested in, and better equipped for, researching and writing about evaluation than others in the field, they are equally likely to be engaged in advising government agencies on evaluation methodologies or, indeed, leading large-scale evaluations for those same agencies.

Consulting firms specializing in evaluation have multiplied globally in the last 20 years. They range from major accounting firms with teams specializing in evaluation, to boutique firms of a few hundred employees, down to firms with only a handful of employees. Individual evaluation consultants are very common. Within larger agencies – government, multilateral, NGOs, foundations – it is now common to have "in-house" evaluators specializing only in evaluation, working in small units or large departments. Depending on the organizational context, they might be involved in a range of functions, from influencing the design of programs and policies, to monitoring and quality assurance, to ex-ante and ex-post design of evaluations. There is no single model across large agencies. The combination of functions also depends on the department's reporting arrangements (from internal to independent from management), its relationship with day-to-day operations (from embedded to distant), and the existence of other units performing related functions in the same large agency (e.g., perhaps the agency has a quality assurance unit, a monitoring unit, an independent evaluation unit (structurally, functionally, and behaviorally independent), and a separate evaluation research team).

Clearly, the diversity in actors is associated with a diversity in evaluative exercises and a range of skills and competencies that apply to evaluative practice. There is no such thing as one "type" of evaluator who is able to conduct all types of evaluations or all types of activities within an evaluation. In that sense, variation in the skills and competencies that apply to evaluative analysis is substantially

larger than in audit. In the subsequent sections we further elucidate why this is the case.

Evaluation Purpose, Scope, and Approach

Evaluations are conducted for different purposes. The most frequent distinction made is between accountability and learning. Accountability concerns the provision of evaluative evidence on an intervention's merit and worth, often through the assessment of performance and results and of the contribution of an intervention to the delivery of an institutional mandate and the achievement of societal goals. Learning concerns the knowledge creation, sharing, and absorption resulting from an evaluation process or product. Accountability closely relates to the assessment of performance and results. The evaluative findings, underpinned by the necessary evidence, are brought to the attention of various stakeholders, including the general public, and support the accountability process of the financing or implementing institution. Various other purposes may be found in the literature and practice, although they tend to overlap with these two main ones. For example, evaluations are often conducted for decision-making or managerial purposes, oversight and compliance, motivational and stakeholder ownership purposes (e.g., of a particular program), transparency, knowledge generation (e.g., knowledge as a public good), or organizational improvement.

Evaluation covers a diverse range of practices as well as evaluands. Depending on the country, sectoral or organizational context, it encompasses such practices as ex-ante assessments of programs or projects, assessments of project implementation processes and output delivery, project completion assessments, impact evaluations, as well as higher-level programmatic, thematic or corporate assessments of merit and worth. Various useful distinctions can be made to sketch out the breadth and variation in evaluative work. Nowadays, most mainstream evaluation tends to be associated with the evaluative analysis of an ongoing or completed policy intervention (rather than ex-ante assessment) and often focuses either on implementation and output delivery on the one hand or on outcomes and impact on the other.

The different epistemological schools of thought that underpin the field of evaluation practice (see for example Pawson & Tilley, 1997) are associated with various theories and practices of evaluation. Interesting attempts to summarize these can be found in Alkin and Christie (2004), and more applied perspectives can be found in, e.g., Stufflebeam and Shinkfield (2007, on evaluation models) or Leeuw and Furubo (2008, on evaluation systems). It is beyond the scope of this chapter to discuss these. From a practice-oriented view it is useful to briefly discuss the following frameworks and approaches.

First, an often-used distinction in evaluative practice concerns formative versus summative evaluation. The former tends to be associated with the purposes of program improvement or learning, and the focus is on the role and participation of stakeholders in the evaluative process. The

latter, by contrast, is associated with the idea of evaluation as an independent objective assessment to determine the merit and worth of an evaluand. A second prevalent distinction used in evaluative practice refers to goal-oriented versus goal-free evaluation. Goal-oriented (or objectives-based) evaluation concerns the practice where the evaluation scope is bound by the intended goals of the intervention. Goal-free evaluation explores the full range of causal processes potentially induced by an intervention and consequently a range of outcomes that goes beyond the intended objectives of a program or project. Finally, evaluation practices are often scoped around evaluation criteria. A widely used framework concerns the OECD-DAC (Development Assistance Committee) evaluation criteria, which originated and is mostly applied in the field of international development yet has also found resonance in other policy fields. The five OECD-DAC criteria are relevance, efficiency, effectiveness, impact, and sustainability (for definitions see OECD-DAC, 1991, 2000). While still quite prominent, the definitions of the criteria are somewhat dated, and over time debates and practices have broadened to include new criteria such as policy coherence.

Quality in Evaluation

Discussions about quality in evaluation are as old as the discipline of evaluation itself (see for example Schwartz & Mayne, 2005). They often become manifest when stakeholders disagree with the findings and recommendations of a particular evaluation. Quality is also at the core of recent (and not so recent) epistemological debates around what is considered to be "good evidence" in evaluation. What most of these debates have in common, apart from passionate exchanges based on thought-provoking arguments, is a rather limited focus (either intended or unintended) on what constitutes quality in evaluation. It is useful to discuss quality in evaluation from multiple perspectives.

A first perspective concerns quality from the perspective of the evaluation function, which can be usefully summarized by the trinity of Independence, Utility, and Credibility (see for example United Nations Evaluation Group (UNEG), 2011). Independence is discussed above. Credibility refers to the evaluation process and output being as unbiased, ethical and professional as possible and grounded in a rigorous methodology. Finally, Utility is about the relevance and timeliness of evaluation processes and findings to organizational learning, decision-making and accountability for different types of stakeholders. Obviously, in a sense these are "meta-principles" which need to be further unpacked. Several frameworks have been developed that do this with varying degrees of alignment to the abovementioned principles (see for example American Evaluation Association (AEA), 2018; ECG, 2012; UNEG, 2016; OECD-DAC, 2010).

Quality from an institutional enabling environment perspective is closely related to the construct of "evaluation culture" in a particular institutional environment: to what extent do suppliers and users of evaluation understand and value evaluation as a source of evidence or a process that informs learning and

accountability processes? It relates to such aspects as the incentives and attitudes of potential evaluation users toward using evaluations, as well as the incentives, resources, and attitudes of evaluators toward conducting evaluations. It concerns both the enabling environment within the evaluation function (department, unit, office) and the broader enabling environment within the institution. In addition, the budget, time, data, and resource constraints that shape the opportunity space for individual evaluations constitute another important aspect (see Bamberger, Rugh, & Mabry, 2011).

Quality from a human resource perspective is a key entry point for looking at quality in evaluation planning and management. What are the key competencies required for successful evaluation? In recent years, there have been increasing discussions around evaluation as a profession, a discipline or a transdiscipline (see Vaessen & Leeuw, 2009; Stockmann & Meyer, 2016), as well as related topics such as professional standards and certification. Opinions are divided as to what extent the concept of evaluation as a profession applies or is even meaningful. Different frameworks for professional standards have been developed (see above) and some certification programs have been established. Yet no consensus exists. In this chapter, we emphasize the empirical reality that evaluation is increasingly about teamwork, with teams being composed of specialized staff, many of whom would not see themselves as evaluators. The discussion of team composition and the different knowledge funds that are brought to bear on an evaluation exercise is an important one (see for example Jacob, 2008). Ideally, each evaluation, through its team composition, should cover the following knowledge funds and skill sets: substantive knowledge (e.g., of the policy field, theme, or nature of the intervention); context-related knowledge (e.g., of the country, beneficiaries); institutional knowledge (e.g., of the operational and decision-making environment of the commissioning institution(s)); knowledge of evaluation methods; communication skills (especially pertinent in inter-cultural and politically sensitive environments); and project management skills. In addition, the underlying experience of team members in applying these different types of knowledge and skills in evaluation-related exercises is of key importance. Finally, adequate variation in disciplinary backgrounds and knowledge of relevant substantive theories in the behavioral and social sciences is likely to be valuable. Apart from the different knowledge funds that apply to evaluative analysis, the issue of behavioral independence (discussed above) is also relevant in this context. This aspect also closely relates to the importance of an evaluator's adherence to ethical norms of conduct and the ability to speak truth to power (Wildavsky, 1979).

Often, the default dimension when discussing quality in evaluation is the methodological one. There are multiple views in the social and behavioral sciences on assessing quality from a methodological perspective. In evaluation we sometimes refer to the Campbellian framework on validity (i.e., internal, external, construct, and statistical conclusion validity; Cook and Campbell, 1979). We prefer the recent, somewhat more eclectic interpretation by

Hedges (2017), who refers to the principle of data analysis validity instead of the more narrowly defined statistical conclusion validity. Validity is a property of findings, and each validity dimension is underpinned by a set of principles to guide the design, data collection, and analysis of the evaluation. Reliability is another important dimension that directly refers to the research process. In principle, reliable research refers to the idea that if one repeated the analysis it would lead to the same findings. Even though replicability would be too ambitious a goal in many (especially multi-site, multi-level) evaluative exercises, at the very least transparency and clarity on research design (e.g., methods choice, selection/sampling) should be ensured to enhance the verifiability and defensibility of knowledge claims. A third dimension is consistency, which refers to the need for a logical flow between evaluation rationale, questions, design and methods choice, actual data collection and analysis, findings and recommendations. A fourth and final dimension concerns the importance of focus in evaluation. A perennial challenge in evaluation is to balance breadth and depth of analysis. Evaluations often broaden their scope, usually for accountability purposes (e.g., by adding more evaluation questions or dimensions of interest), thereby sacrificing depth of analysis. One could argue that for accountability (and learning) purposes evaluators should focus their evaluations as much as possible by carefully managing the size and complexity of the evaluand and the number of questions posed, so as to concentrate the limited financial and human resources on in-depth analysis and assessment. The four concepts together – focus, consistency, reliability, and validity – constitute a useful lens for looking at methodological quality in programmatic, thematic or corporate process evaluations, for example. In addition, the processes put in place to incentivize and guide quality evaluation are important. These include the use of proper quality assurance mechanisms such as evaluation peer review, reference groups, stakeholder consultation, and meta-evaluation with feedback loops.

Finally, there is an important strand in evaluative thinking and practice that perceives quality first and foremost as a property of being fit for purpose from a utilization perspective. In other words, quality is interpreted from the perspective of how and to what extent the evaluation meets the (information) needs of (different) groups of users (Patton, 2001). While the use of evaluation for learning and accountability purposes is often associated with the quality of the evaluation report, in many cases the quality of the process (and the inclusion of stakeholders) can be of equal or higher importance in optimizing ownership of and learning from evaluation. Utilization-focused evaluation and related types of participatory evaluation (Cousins & Whitmore, 1998) emphasize the importance of stakeholder involvement and iterative learning through evaluation as the foundation of utilization, and by implication quality. Of particular interest is also the so-called whole-of-process approach to optimizing evaluation use. The basic premise is that throughout the entire evaluation process one can improve specific aspects to optimize the likelihood of effective

use of evaluations. For example, in the planning and design phase, issues such as what to evaluate, which stakeholders to involve, and what modalities of consultation and participation to use would be important to consider. In the implementation phase, the use of credible methods and adequate expertise are examples of aspects that strengthen the rigor and depth of analysis and, consequently, the learning potential of an evaluation. Finally, in the reporting and dissemination phase it would be important to consider, among other things, multiple channels targeted to specific audiences and to have clear ideas about follow-up trajectories. More fundamentally, the framework also helps in thinking about what types of evaluation (in terms of resources, scope, overall approach, timing, and so on) may be optimal for strengthening evaluation use among different stakeholder groups. While these principles may not significantly change the operations of an institution in the short term, they will help nudge the institution toward a more utilization-oriented approach to evaluation.

Concluding Remarks

Evaluation has spread across the globe as a field of practice, and is increasingly recognized as a discipline or even, by some, a profession. The nature of the institutionalization of evaluation varies widely across institutional contexts, policy fields, and countries. It is more than likely that ongoing processes of institutionalization of evaluation in public policy and beyond will lead to further consolidation of core principles, standards, approaches and methods around evaluation. The ongoing professionalization of evaluation will help strengthen the role of evaluation as a function and process that supports learning and accountability. Common standards and quality assurance mechanisms increasingly govern a core set of principles, processes, and approaches that apply to evaluation across the globe. At the same time, many aspects of evaluative inquiry cannot and should not be undertaken by evaluators alone. Different knowledge funds (around policy interventions, policy fields, methodological innovations, and new ways of using data) that are relevant in the context of evaluative inquiry will require expertise from different types of professionals. Given the above, an informed approach to strengthening quality in evaluation in any organization would benefit from a multi-dimensional perspective.

References

Alkin, M., & Christie, C. (2004). An Evaluation Theory Tree. In M.C. Alkin (Ed.), Evaluation Roots (pp. 13–65). Thousand Oaks, CA: Sage. doi: 10.4135/9781412984157.
American Evaluation Association (2018). American Evaluation Association Guiding Principles for Evaluators. Retrieved from www.eval.org/p/cm/ld/fid=51.
Bamberger, M., Rugh, J., & Mabry, L. (2011). RealWorld Evaluation: Working Under Budget, Time, Data, and Political Constraints (2nd ed.). Thousand Oaks, CA: Sage.
Cook, T.D., & Campbell, D.T. (1979). Quasi-Experimentation: Design and Analysis for Field Settings. Chicago, IL: Rand McNally.

Cousins, J.B., & Whitmore, E. (1998). Framing Participatory Evaluation. In E. Whitmore (Ed.), Understanding and Practicing Participatory Evaluation, New Directions for Evaluation, 80. San Francisco, CA: Jossey-Bass.
Evaluation Co-operation Group (ECG) (2012). ECG Big Book on Evaluation Good Practice Standards. Retrieved from www.ecgnet.org/document/ecg-big-book-good-practice-standards.
Furubo, J.-E., Rist, R.C., & Sandahl, R. (2002). A Diffusion Perspective on Global Developments in Evaluation. In J.-E. Furubo, R.C. Rist, & R. Sandahl (Eds.), International Atlas of Evaluation. London: Transaction Publishers.
Hedges, L.V. (2017). Design of Empirical Research. In R. Coe, M. Waring, L.V. Hedges, & J. Arthur (Eds.), Research Methods and Methodologies in Education. Thousand Oaks, CA: Sage.
Jacob, S. (2008). Cross-Disciplinarization: A New Talisman for Evaluation? American Journal of Evaluation, 29(2): 175–194.
Jacob, S., Speer, S., & Furubo, J.-E. (2015). The Institutionalization of Evaluation Matters: Updating the International Atlas of Evaluation 10 Years Later. Evaluation, 21(1): 6–31. doi: 10.1177/1356389014564248.
Leeuw, F.L., & Furubo, J.-E. (2008). Evaluation Systems: What Are They and Why Study Them? Evaluation, 14(2): 157–169.
Mackay, K. (2007). How to Build M&E Systems to Support Better Government. Washington, DC: World Bank. Retrieved from http://documents.worldbank.org/curated/en/689011468763508573/How-to-build-M-E-systems-to-support-better-government.
Organisation for Economic Co-operation and Development (OECD) Development Assistance Committee (DAC) (1991). The DAC Principles for the Evaluation of Development Assistance. Retrieved from www.oecd.org/dac/evaluation/50584880.pdf.
Organisation for Economic Co-operation and Development (OECD) Development Assistance Committee (DAC) (2000). Glossary of Evaluation and Results Based Management (RBM) Terms. Retrieved from www.oecd.org/dac/evaluation/2754804.pdf.
Organisation for Economic Co-operation and Development (OECD) Development Assistance Committee (DAC) (2010). Quality Standards for Development Evaluation. Retrieved from www.oecd.org/development/evaluation/qualitystandards.pdf.
Patton, M.Q. (2001). Use as a Criterion of Quality in Evaluation. In A. Benson, D.M. Hinn, & C. Lloyd (Eds.), Visions of Quality: How Evaluators Define, Understand and Represent Program Quality, Advances in Program Evaluation, 8: 155–180.
Pawson, R., & Tilley, N. (1997). Realistic Evaluation. London: Sage.
Rossi, P.H., Lipsey, M.W., & Freeman, H.E. (2004). Evaluation: A Systematic Approach. Thousand Oaks, CA: Sage.
Schwartz, R., & Mayne, J. (2005). Assuring the Quality of Evaluative Information: Theory and Practice. Evaluation and Program Planning, 28(1): 1–14.
Scriven, M. (1991). Evaluation Thesaurus (4th ed.). Thousand Oaks, CA: Sage.
Shaw, I., Greene, J., & Mark, M. (2006). The SAGE Handbook of Evaluation. Thousand Oaks, CA: Sage.
Stockmann, R., & Meyer, W. (Eds.) (2016). The Future of Evaluation – Global Trends, New Challenges, Shared Perspectives. London: Palgrave Macmillan UK.
Stufflebeam, D.L., & Shinkfield, A.J. (2007). Evaluation Theory, Models, and Applications (Chapter 19). San Francisco, CA: Jossey-Bass.
United Nations Evaluation Group (UNEG) (2011). UNEG Framework for Professional Peer Reviews of the Evaluation Function of UN Organizations. New York, NY: UNEG. Retrieved from www.uneval.org/document/detail/945.

United Nations Evaluation Group (UNEG) (2016). Norms and Standards for Evaluation. New York, NY: UNEG. Retrieved from www.uneval.org/document/detail/1914.
Vaessen, J., & Leeuw, F.L. (Eds.) (2009). Mind the Gap: Perspectives on Policy Evaluation and the Social Sciences. New Brunswick, NJ: Transaction Publishers.
Vedung, E. (2010). Four Waves of Evaluation Diffusion. Evaluation, 16(3): 263–277. doi: 10.1177/1356389010372452.
Weiss, C. (1998). Evaluation (2nd ed.). London: Pearson.
Wildavsky, A. (1979). Speaking Truth to Power: The Art and Craft of Policy Analysis. Boston, MA: Little, Brown.

5  The Practices of Audit and Evaluation
Similarities and Differences

Maria Barrados

Introduction

As reflected in Chapters 2, 3, and 4, there is an extensive literature dealing with each of the separate practices examined in this book. There is much less literature comparing performance audit, internal audit, and evaluation practices. Some early papers comparing performance audit and evaluation suggested that there was a move to greater closeness (Chelimsky, 1996) or that they were "two sides of the same coin" in supporting public-sector accountability (Fraser, 2005). Today, the interest is more in how there can be greater collaboration among the three practices.

The International Organization of Supreme Audit Institutions (INTOSAI) issued a Primer on Program Evaluation (2010) that noted the marked differences between evaluation and traditional audit practices but at the same time saw the opportunity that evaluation in particular provides to supplement performance audit. As shown in Chapter 4, evaluation can touch many forms of inquiry. Similarly, the European Court of Auditors describes the differences and similarities between performance audit and evaluation as more a function of the context in which they take place and their purpose than of the activities, knowledge, skills, and methods themselves (European Court of Auditors, 2017).

The United Nations went further than seeking collaboration. It sought to promote effective coordination and co-operation between the groups, to avoid duplication, and to create synergy by consolidating its internal oversight functions of audit, inspection, investigation, and evaluation into the Internal Oversight Service in 1994. The Joint Inspection Unit of the United Nations examined the oversight functions in the UN system and recommended a consolidation of oversight activities in UN organizations (Joint Inspection Unit (JIU), 2016, pp. 4–9). Some, but not all, UN organizations have undertaken a similar consolidation. Chapters 12 and 13 describe these experiences.

As the chapter authors in this section of the book have shown in describing the main features of each of the practices, they are distinct activities and not wholly interchangeable. Yet aspects of their work can be similar depending on the topic being examined and how the results are reported. As the European Court of Auditors has noted, it is the context in which the functions are located,

as well as the purpose of the task, that results in some significant differences while also creating many similarities in practice. While each practice is distinctive, which creates potential barriers, there are areas of commonality that support greater collaboration. Further, the experience and literature of each practice illustrate a great range and variety of international and domestic practices. This chapter examines the factors that enhance or hinder full collaboration among the practices. Our analysis is based on the work in Chapters 2, 3, and 4 and on the matrix of defining attributes prepared by the contributing authors that is described in this chapter.

What is Defining and Unique?

Origin and Practice Traditions

As we have seen, there are generally accepted definitions of different types of audit. Performance audit and internal audit, as forms of audit, are defined by their standards and practice guides. In contrast, as pointed out in Chapter 4, there is no consensus on a definition of evaluation.

Performance audit and internal audit share a common tradition rooted initially in accounting; internal control was mostly a financial process and the subject of financial audit and some internal audit. In some countries, management controls have also been examined in performance audit, leading to an orientation of determining compliance with statutes, regulations, policies, and procedures (Ruffiner & Sevilla, 2004). In marked contrast, as illustrated in Figure 5.1, evaluation does not come from a single tradition. Instead, it comes from a multidisciplinary, social science research tradition.

In the practice traditions there is very little convergence of audit and evaluation. Audit traditionally is more compliance-oriented and part of regulatory supervision or oversight regimes, and can provide formal assurance. In contrast, evaluation is more often seen as a function supporting understanding and learning, as well as accountability, which can support oversight. Figure 5.1 illustrates the comparative presence of accounting traditions, applied social science, a compliance focus, and oversight and assurance. The factors were weighted by their presence and degree of similarity across the practices.

Professionalization

There is academic debate about the definition of professionalization (Saks, 2012, p. 1). A neo-Weberian definition of professionalism as being "centered on attaining a particular form of legal regulation with registers creating bodies of insiders and excluding outsiders" can be applied to our comparison. As shown in our analysis, against this definition the accountant-auditors are the most professionalized group, followed by internal auditors and, lastly, evaluators. As noted in Chapter 4, different frameworks for professional standards have been developed in evaluation and some certification programs have been established, but no consensus exists.


Figure 5.1 Practice Traditions.1 (Chart comparing evaluation, internal audit, and performance audit on accounting root, applied social science, compliance focus, oversight function, and assurance provider; consensus scores on a scale of 0 to 20.)

Both forms of audit conduct their practices with independently set standards that are supported by policies and guides. As described in Chapters 2 and 3, performance audit and internal audit standards draw heavily on financial audit standards. There are separate international and national performance audit standards, which are often a sub-section of standards for all audits. The national standards tend to be consistent with the international standards, but may have some unique features which reflect the particular way in which performance audit has developed in a specific country. Internal audit has a single set of international and national standards. Alongside their standards, both forms of audit have codes of conduct and quality assurance requirements. Peer reviews are used differently by all three practices. Only internal audit standards require an external quality review for independent validation once every five years. The internal audit standards are similar in structure and content to performance audit standards. Many of the practitioners of performance audit and internal audit are licensed accountants. For these professionals, adherence to standards (audit standards, including performance audit but not internal audit) is one of the requirements to maintain their professional license. Internal audit does not have a system of licensing, but does offer certification, which is preferred but not obligatory for practitioners. Most performance audits are conducted in national audit offices that traditionally have responsibilities for financial audits, are staffed with financial auditors, and are often but not always headed by professional accountants (exceptions include Sweden, which removed financial audit from the responsibility of the National Audit Office in 1967, and the UK prior to 2009). The requirement to meet professional standards is part of their professional license obligations.

For many of these external audit offices much of their work is still financial, although the standards for both financial and performance audit nonetheless define their practices. As noted in Chapter 2, performance audit has become increasingly multidisciplinary. The non-accountants still need to meet the same standards of practice. They will not have had the formal training of their accounting colleagues, but they will still need to learn about the standards and how they are applied. This is usually done through training and on-the-job supervised experience. The standards and policies of the two types of audit, with related licensing or certification, define these practices. Evaluation, consistent with its multidisciplinary roots, draws on different social science disciplines for standards and policies. Different evaluation societies have set out general standards and codes of conduct. These are less prescriptive than those for audit and are more flexible in the management of the practice. Evaluators also draw on qualification standards from their respective, mostly social science, disciplines. As described in Chapter 4, evaluation is thought of more as a discipline, with dedicated academic study and contributions to peer-reviewed journals. Practitioners are members of evaluation associations and part of national and international networks. There are different frameworks for professional standards and some certification programs, but with no consensus that would define the practice. Evaluators are still debating whether evaluation is, or should be, a separate profession. The contrast with audit on the dimensions of regulated standards, external policies, internal policies, licensing or certification, peer review, and codes of conduct is illustrated in Figure 5.2.

Figure 5.2 Professionalization.1 (Chart comparing evaluation, internal audit, and performance audit on regulated standards, external policies, internal policies, licensing/certification, code of conduct, and peer review; consensus scores on a scale of 0 to 25.)

Organizational Setting and Products

A third area that distinguishes the three practices is their organizational setting, their products, and the nature of their accountability relationships. There are two striking differentiators for internal audit. First, internal audit, with a single set of international and domestic standards, is practiced in both the public and private sectors, even though our analysis in this volume deals with internal audit in the public sector. Second, internal audit, unlike most performance audit, is located within organizations, departments, or ministries, with links to independent audit committees or boards where they are present. Audit is traditionally imposed on existing accountability relationships. The responsibility for management and delivery rests with government employees. Auditors carry out their work independently and report their results to a directing authority. In the case of performance auditors this is usually a parliament or legislature, and in the case of internal audit, an independent audit committee or board if there is one, or the senior management or specific clients of the organization. Evaluation is imposed on the same accountability relationship and, like internal audit, is located mostly within government. It is more likely to report to senior management or specific clients, but there are also cases where it reports to an independent committee or board. Performance audit and evaluation are primarily government functions. Performance audit is usually located in an independent organization such as a national audit office that is not under the direction of government ministers, but rather under the direction of a parliamentary or legislative body. The performance auditor’s accountability is to the parliament or a legislative body. Under some circumstances a special report may be done for government. However, reporting is usually done directly to the legislative body through a public process. These offices have as a result a high degree of public visibility. Internal audit and evaluation units are most often located within government organizations. They can be co-located but work under their own standards and policies. In the case of evaluation, these policies can often be specific to the government, but supplemented by norms and standards developed by specialty evaluation areas; for example, evaluation in a development agency would be influenced by norms and standards developed by the OECD Development Assistance Committee Network on Development Evaluation (Morra Imas & Rist, 2009, p. 509). Figure 5.3 illustrates the presence of reporting relationships to parliament, government, and clients or boards, weighted by the presence and similarity of the factors.

The products produced by the three practices also differentiate them. Because of their similar standards, audit reports will have similar components. Performance audits, guided by the interests of legislatures, will tend to be less operational and broader in scope than internal audits. Evaluations will tend to look more like research reports, and will tend to draw on their own standards and disciplines for methodologies and technical terminology. Apart from the technical audit or evaluation report, all three activities provide other forms of information, whether advice, information, or studies.


Figure 5.3 Reporting Relationships.1 (Chart comparing evaluation, internal audit, and performance audit on reporting to parliament, to government, and to a client or board; consensus scores on a scale of 0 to 10.)

National audit offices produce more than traditional performance audits to report results, and can be seen to be developing “guidance, manuals, information seminars and testimonials” (OECD, 2016). Evaluation and internal audit, as provided for in their standards, can undertake consulting assignments for management. This is a practice that is not found in performance audit, where it is viewed as potentially compromising institutional independence.

What is Shared?

Even though there are areas of distinctiveness, both types of audit share common traditions and approaches to professional regulation. All three practices are concerned with some form of effectiveness, while internal audit and evaluation are both functions internal to government that can be co-located. Looking in more detail at the work itself makes the basis for sharing even more evident. The matrix of defining general attributes developed for this chapter shows many similarities in shared practice principles and elements, areas of work, and shared challenges. Appendix A provides a list of the general attributes identified. As illustrated in other chapters of this book, the settings and context for each practice can, however, result in significant differences in interpretation and application.

Shared Practice Principles

All three activities, within their own context, share preoccupations with the independence, rigor, quality, and utility of their work. For evaluators and auditors, the reliability and rigor (validity) of their work is essential to building their reputation so that the results of their work can be used, within the constraints provided, to inform their clients’ work and decision-making.

For audit work, quality is defined through the standards and the steps taken to ensure adherence to them. Evaluators, as discussed in Chapter 4, have different perspectives and definitions of quality, reflecting the multiple views of quality in the social and behavioral sciences. Evaluators are preoccupied with quality, but in a less prescribed way than auditors. National audit offices place a strong emphasis on having practices and procedures in place to ensure adherence to standards and quality in their reports. Extensive measures have been introduced to allow audit offices to demonstrate adherence to quality standards. This includes elaborate documentation of each sentence in a final report to confirm that each statement is correct, as well as engaging external academics and advisors to review published reports or advise on methodologies. The use of internal peer review and, in some cases, independent assessment of published reports by experts are also features of performance audit. All three activities also share an interest in the exercise of professional judgment, serving the public interest, having qualified practitioners, relying on others, paying heed to ethics and confidentiality, focusing on maintaining credibility, and engaging stakeholders. Evaluation is particularly strong in the latter. The credibility of audit and evaluation products extends beyond the individual product to the function and the institution. Both are threatened by failure, but the visibility and potential impact of failure can be greater for performance audit, with the result that national audit offices have put a great deal of effort into protecting the quality of their products.

Areas of Work

Many areas of work are shared, and all three examine issues of effectiveness, relevance, data quality, and performance measurement. INTOSAI (2010) suggested marked overlap between performance auditing guidelines and the applied social science methods used in evaluation, as shown in Figure 5.4. This would also apply to internal audit in its performance audit work. Our analysis shows that audit will be much more likely than evaluation to examine issues of economy, efficiency, good practice, and effective regulation, as well as compliance and management controls. Internal audit is more likely to do work in safeguarding assets, internal controls, and detecting fraud than performance audit or evaluation. The work of internal audit is much more operational, often working to a level of detail that would not be featured in the work of performance auditors. Evaluation is more likely to examine issues of outcomes and impact. Auditors and evaluators both work to plans, often multi-year, that may need to be adjusted. Evaluators’ and internal auditors’ planning approach is defined by their priorities and the policies they operate under. Unlike performance audit, consultation with management is an important part of developing internal audit and evaluation plans. Internal auditors’ policies, unlike those of evaluation, stress the importance of assessing risk and risk management.


Figure 5.4 Interrelationships Between Performance Audit and Program Evaluation (overlap between performance audit, drawing on auditing guidelines, and program evaluation, drawing on applied social science methods). Source: Program Evaluation for SAIs: A Primer (INTOSAI, 2010, p. 7).

National audit offices can only cover a limited number of topics a year and often have planning processes based on risk for prioritizing from among a wide range of possibilities. Fundamentally, the work that is done is driven by the objectives of the audit or evaluation or, for some, the question being asked by a legislature, manager, or board. While all three practices may examine parts of a broad area such as effectiveness, it does not mean that all aspects would be examined. For example, all three practices could include some aspect of determining the worth or significance of an activity, policy, or program, in line with the definition of evaluation suggested by Morra Imas and Rist (2009). All would collect evidence and likely conduct interviews. An audit would more likely seek to rely on data collected by others and assess whether that data is robust enough to be relied on. Evaluations are more likely to undertake the design and collection of new data. In the broad area of effectiveness, audit tends to deal with operational effectiveness while evaluations are much more likely to deal with the outcomes and impacts of policies and programs. There are also elements of commonality in audit reports and evaluations. Both can report against objectives, collect evidence, conduct interviews, draw conclusions, and make recommendations. Auditors are more likely than evaluators to report their findings against agreed criteria, do file reviews, rely on existing data, describe policy and risk (in order to explain the policy and administrative context), and identify best practice. Evaluation stands out in being oriented to collecting new data when existing data are not available to deal with the evaluation question. Furthermore, evaluation attempts, through its design and analytical techniques, to establish causal relationships empirically.

Shared Challenges

All three practices identify numerous common challenges, including reputational risk, resource constraints, meeting client needs, maintaining a balanced effort on quality control (to avoid excessive costs and burdens), maintaining standards, responding to change in the external environment, being innovative, managing data availability, and considering program design. A number of these issues are examined in the chapters that follow. They were assessed as being present in all three practices, but not in the same way. Performance auditors place a particularly strong emphasis on reputational risk, as seen in Chapter 7. Evaluators place a stronger emphasis on data availability and program design. An OECD study on national audit offices identified limitations in addressing new challenges because of a lack of skilled staff and resources. The independence of external audit offices depends on being properly resourced, as explained by INTOSAI, which noted:

Heads recognised the contribution that strong, properly resourced and independent supreme audit institutions play in improving transparency, accountability and value for money to ensure that public funds are appropriately spent. (INTOSAI, 2015, p. 4)

For some developing countries such as Botswana this has meant the timely release of funds (Ghana News Headlines, 2018). For audit offices in developed countries it has meant working with legislatures to ensure appropriate levels of funding that would not compromise their independence and the quality of their work, while at the same time taking steps to do their work as efficiently as possible. This has led to careful consideration of taking on audits that are verifiable or testable (auditable) and doing them as efficiently and effectively as possible. It has also led audit offices to examine how they can provide information to legislatures in a variety of ways (some less expensive than others), such as through testimony, briefings, studies, or best practice guides. While auditors and evaluators have been raising issues about the efficiency and effectiveness of government expenditures, the same questions have been asked of their own practices. National audit offices, in addition to their own review processes including external peer review, are subject to legislative scrutiny and government-wide budget reductions. Internal government units undertaking internal audit and program evaluation are directly affected by government efficiency reviews and budget cutbacks. While budget shortfalls lead to challenges in meeting plans and commitments, they also provide an incentive to make better use of available resources and avoid unnecessary duplication. They can give rise to some of the innovations seen in Chapters 12 and 13. And they can lead audit units to examine more closely the skills required and make efforts to get the right skills mix in place. For all three practices, having the necessary complement of staff with the necessary skills to carry out the work becomes an important element of managing these functions.

As national audit offices take on more performance audits, they have drawn on more of the multidisciplinary skills found in evaluation, either through contracted staff or by making such specialists permanent members of staff. For internal audit and program evaluation, opportunities exist for greater coordination and exchange of needed skilled staff.

The Potential for Collaboration

Because of their different contexts, the shared challenges are often defined differently and dealt with differently. It is these very differences that provide the opportunity for greater collaboration, learning, and exchange. For example, auditors and evaluators both exercise their professional judgment throughout the course of an audit or evaluation. They both exercise that judgment to determine how programs work in order to form conclusions and potentially make recommendations. Recent publications in the evaluation literature examine how that process works by making explicit the logic of inquiry that generates study strategies and design. The insights from Pawson and Tilley, for example, on the complexities and uncertainties of program results are as applicable to auditors as to evaluators (Pawson & Tilley, 2004). For such collaboration and sharing to take place there needs to be a greater understanding and appreciation of what sets the different activities apart and the specific contexts in which they operate. Given the similarity in tradition and roots, this is easier for performance audit and internal audit, particularly for professionals with accounting designations. Professionals move between them more easily, often depending on their interest and personal preference for working within a government organization or externally, and for doing more work on compliance, safeguarding assets, and detecting fraud rather than program performance. Evaluation has the greatest potential for collaboration with audit in areas of effectiveness, although it has less in common with audit work on compliance, safeguarding assets, and detecting fraud. Nevertheless, some time and effort on the part of the evaluator would need to be given to understanding the standards of the other practice and its language, which carries unique meanings, as illustrated in Chapters 2, 3, and 4. There are many interesting possibilities for collaboration on issues of effectiveness. For internal audit within some organizations, proximity to evaluation provides for such collaboration on particular issues of common interest. As performance audit deals with more effectiveness issues, more data analytics in non-traditional financial areas, and more undertakings that require new data collection and analysis, evaluation expertise becomes more important and holds out opportunities for collaboration. A skilled evaluator would not have all the skills and experience of an accountant who has specialized in performance audit, and vice versa, but as performance audit develops it requires more evaluation skills and experience to complement existing capability. A number of national audit offices have taken initiatives to bring in more of the expertise offered by evaluation, through internal training to broaden auditors’ skillsets beyond traditional accounting or by contracting out for evaluation expertise.

Others have brought evaluators into the office and made them part of performance audit teams. Opportunities exist for greater synergy between the practices in government. Equally, opportunities exist for professional development and exchange, particularly between the evaluator and the performance auditor.

Conclusion

Auditors and evaluators are important players in the accountability and oversight regimes of democratic governments. The distinctiveness of the three practices discussed in the first part of this book is evident along the dimensions that have been examined: their traditions and roots, the professionalization of their practices, their organization and accountability relationships, and their various types of products. Yet within these areas of distinctive practice, they share principles of practice, examine similar areas of work, have common elements of practice, and face similar challenges. The institutional setting within which they operate is key to defining how their work is done. A significant distinction is whether the work is located in an external, independent office such as a national audit office, as in the case of performance audit, or is internal to government, as in the case of internal audit and evaluation. For individual practitioners these contexts are important and defining. Auditors and evaluators are faced with answering the questions put to them, whether a strategic question from internal plans or a question arising from legislatures, managers, or boards, in the most efficient and effective way possible. Care is taken in formalizing the question to be clear on what is being examined. It is in developing the plan to address the question or objective of the study that the methodology and the required expertise are identified. Here, and in the conduct of the audit or evaluation, multidisciplinary input is very valuable and is often provided by evaluators. An evaluator working as part of an audit would be expected to be knowledgeable of the audit standards and to have their work comply with them. National audit offices will not want to compromise on standards and jeopardize their professional standing. Such standards are generally a barrier to the multidisciplinary practitioner, although in organizations like the GAO in the United States or the Office of the Auditor General (the Canadian national audit office) training and support are provided. Procedural processes that develop from standards tend to be more elaborate than what evaluators are familiar with, but they will need to be followed. Many evaluators (including some of the authors in this volume) have had good careers in performance audit in national audit offices. Particularly for offices with larger performance audit practices, there are opportunities for career development and advancement for professionals with training in disciplines other than accounting.

As seen in later chapters in this volume, there are also opportunities for internal audit and evaluation to work together, particularly when they are within the same organization. This proximity tends to lead to a more collaborative approach among both sets of practitioners. The two groups can work together in many different ways, including through joint planning of parallel studies, examining different aspects of the same area, or doing a study jointly. Such approaches require more collaboration and a greater understanding of how each discipline works. A better understanding of the practices provides the insight needed for greater reliance on each other’s work and that of experts, and for potential collaboration on the part of practitioners, managers, and users of the different products.

Note
1 Factors that identify the practices of performance audit, internal audit, and evaluation were inventoried, grouped, and scored by experts in these fields, who included the authors of Chapters 2, 3, and 4 and other experts. An iterative method was used whereby significant changes to the inventory were re-circulated to develop a consensus on factors, groupings, and scoring. A score was assigned to each practice for each factor to indicate presence, weighted by the degree of similarity of the factors: Established Practice or Generally Similar (4), Present with Significant Variations (2), Sometimes Present (1), Generally Not Done (0). This results in a score for each practice, with a maximum (number of factors times 4) and a minimum (number of factors times 0) shown on the horizontal axis of the figures. Areas of leading practice were also identified. The result was a consensus, based on the judgment of the group of international experts, on how the three practices could be compared and contrasted.
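To make the scoring arithmetic concrete, the following is a minimal sketch (in Python) of how the consensus scores behind Figures 5.1 to 5.3 could be tabulated; the factor ratings shown are hypothetical placeholders, not the experts' actual judgments.

    # Minimal sketch of the scoring scheme described in the note. Each factor is
    # rated on the consensus scale and the ratings are summed per practice, so the
    # maximum possible score is the number of factors times 4. The ratings below
    # are hypothetical placeholders, not the experts' actual scores.
    SCALE = {
        "established practice / generally similar": 4,
        "present with significant variations": 2,
        "sometimes present": 1,
        "generally not done": 0,
    }

    def practice_score(ratings):
        """Sum the scale values assigned to one practice across all factors."""
        return sum(SCALE[rating] for rating in ratings.values())

    # Hypothetical ratings for one practice on the five practice-tradition factors
    # used in Figure 5.1 (maximum = 5 factors x 4 = 20).
    example_ratings = {
        "accounting root": "generally not done",
        "applied social science": "established practice / generally similar",
        "compliance focus": "sometimes present",
        "oversight function": "present with significant variations",
        "assurance provider": "generally not done",
    }

    print(practice_score(example_ratings), "out of", len(example_ratings) * 4)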

References

Chelimsky, E. (1996). Auditing and Evaluation: Whither the Relationship? In Carl Wisler (Ed.), Evaluation and Auditing: Prospects for Convergence (pp. 61–67). San Francisco, CA: Jossey-Bass Publishers.
European Court of Auditors (2017). Performance Audit Manual (Directorate of Audit Quality Control). Retrieved from www.eca.europa.eu/Lists/ECADocuments/PERF_AUDIT_MANUAL/PERF_AUDIT_MANUAL_EN.pdf.
Fraser, S. (2005). The Role of the Office of the Auditor General and the Concept of Independence. The Canadian Journal of Program Evaluation, 21: 1–10.
Ghana News Headlines (2018). Funds for Auditor General to be Released on Time – Bawumia. Retrieved from www.ghheadlines.com, 26 February 2018.
International Organization of Supreme Audit Institutions (INTOSAI) (2007). Professional Standards Committee, International Standards of Supreme Audit Institutions. ISSAI 10, Mexico Declaration on SAI Independence. Retrieved from www.issai.org.
International Organization of Supreme Audit Institutions (INTOSAI) (2010). Working Group on Program Evaluation. Program Evaluation for SAIs: A Primer. Retrieved from Eurosai.org.
International Organization of Supreme Audit Institutions (INTOSAI) (2015). Making SAI Independence a Reality – Some Lessons from Around the Commonwealth. Retrieved from www.intosai.org/fileadmin/downloads/downloads/4_documents/Commonwealth_Making_SAI_independence_a_reality.pdf.

Joint Inspection Unit (JIU) (2016). Oversight Lacunae in the United Nations System. Retrieved from unjiu.org.
Morra Imas, L.G., & Rist, R.C. (2009). The Road to Results: Designing and Conducting Effective Development Evaluations. Washington, DC: The World Bank.
OECD (2016). Supreme Audit Institutions and Good Governance: Oversight, Insight and Foresight. OECD Public Governance Reviews. Paris: OECD Publishing. Retrieved from http://dx.doi.org/10.1787/9789264263871-en.
Pawson, R., & Tilley, N. (2004). Realistic Evaluation. Retrieved from www.communitymatters.com.au/RE_chapter.
Ruffner, M., & Sevilla, J. (2004). Public Sector Modernisation: Modernising Accountability and Control. OECD Journal on Budgeting, 4: 123–141.
Saks, M. (2012). Defining a Profession: The Role of Knowledge and Expertise. Professions and Professionalism, 2(1). ISSN 1893–1049.

Appendix A
General Factors Distinguishing Performance Audit, Internal Audit, and Evaluation

Practice Traditions
Accounting
Applied Social Science
Compliance-Oriented
Oversight
Assurance Provider

Organizational Setting/Structure
Accountability and Reporting
Accountability Relationship
Reporting to: Parliament, Government, Public, Specialist Group, Clients or Board

Professionalization
Regulated Standards
Policies – External
Policies – Internal
Certification
Peer Review
Code of Conduct

Practice Principles
Qualified Practitioners
Confidentiality
Credibility
Reliance on Others
Ethical
Independent
Objective
Rigor
Quality Control
Professional Judgment
Engagement of Stakeholders
Serves the Public Interest
Utility

Provider of Information
Audits
Evaluations
Advisory Reports
Consulting

Areas of Work
Safeguard Assets
Compliance
Detecting Fraud
Management Control
Economy
Efficiency
Effectiveness
Impact
Relevance
Effective Government
Effective Regulations
Good Practice
Data Quality
Performance Measurement

Practice Elements
Report Objectives
Agreed Criteria
Collect Evidence
File Reviews
Interviews
Rely on Data
Collect New Data
Conclude
Recommendations
Describe Policy and Risk
Best Practice
Cross-government Performance Data

Challenges
Reputational Risk
Resource Constraint
Meeting Client Needs
Quality – Maintaining the Balance
Maintaining Standards
Responding to Change
Being Innovative
Data Availability
Program Design

Part II

Addressing Challenges in Practice

6 Ethics in Audits and Evaluation: In Search of Virtuous Practices
Lisa Birch, Steve Jacob, and Alex Miller-Pelletier

Introduction

Fraud, corruption, and professional misconduct scandals have rocked public trust and confidence in public institutions. The failure to prevent or uncover these scandals raises questions about the role and effectiveness of auditors and of watchdog institutions (House of Commons, 1994). The social and political context in which auditors and evaluators work – discussed in earlier chapters – increases their exposure to a multitude of potential ethical dilemmas and tends to undermine their independence in subtle and not so subtle ways whenever there is heightened tension between the domain of politics and the practices of auditing and evaluating (Neu, Everett, & Rahaman, 2013). This tension is exacerbated by the rise of new political governance whereby governments expect civil servants to substitute “partisan loyalty [to the governing party’s agenda] for impartial loyalty [to the public interest]” (Aucoin, 2012, p. 189). Tensions between impartiality and loyalty complicate the tasks of auditors and evaluators and increase the risks of facing ethical dilemmas or ignoring them. This chapter sheds light on ethics in evaluation, internal and performance auditing by describing the sources of ethical dilemmas, addressing the issues of conflicts of loyalty and of interest, surveying the main philosophical approaches to ethics, and tracing their practical implications for ethical conduct, especially for encouraging virtue and preventing unethical behavior by auditors and evaluators. For the purposes of our discussion, we chose the following definition of ethics: “a system or code of conduct based on universal moral duties and obligations which indicate how one should behave; it deals with the ability to distinguish good from evil, right from wrong, and propriety from impropriety” (Josephson, 1989, p. 2 in Goss, 1996, p. 575). In this chapter, we do not focus on irregular or unethical behavior in general. Rather, we highlight the important distinction between “being good” and “doing good” and the best practices in the auditing and evaluation professions to foster virtuous practices.

The Sources of Ethical Dilemmas in Practice

Discrepancies between personal values regarding equity, fairness, and the common good, and those observed during the audit or evaluation, may result in ethical dilemmas.

Ethical dilemmas are situations where an agent can and must choose between two or more mutually exclusive courses of action that involve conflicting values, principles, interests, and loyalties, each with its own set of advantages, disadvantages, and consequences. Evaluators, internal auditors, or performance auditors who behave ethically will be able to recognize ethical dilemmas and engage in explicit reasoning to identify and analyze ethically sound options through a decision-making process that culminates in an ethical choice. They will be able to identify potential and genuine threats to their independence and to adopt safeguard practices to counter them. Performance auditors, internal auditors, and program evaluators risk encountering ethical dilemmas because the fulfillment of their mandates can place them in delicate situations with pressure from various, often competing, interests. They may find themselves stuck between the interests of the client (or audited body), those of the managers and employees whose work is under their scrutiny, their own career and private interests, those of their immediate superiors, various political interests, and, most importantly, the public interest. Over the course of their professional careers, various situations requiring ethical judgments will confront such professionals. Ethical dilemmas arise in auditors’ or evaluators’ everyday practice due to “what” they look at, or result from “how” they do their work in a given political and organizational context. Many ethical issues are related to authority and power relationships. On the one hand, auditors and evaluators have power over auditees and evaluation stakeholders, for example, through their ability to examine and report on actions undertaken. This asymmetric relation is amplified with vulnerable populations. Evaluation standards encourage evaluators to pay attention to stakeholders and to “devote attention to the full range of individuals and groups invested in the program and affected by its evaluation” (Yarbrough, Shulha, Hopson, & Caruthers, 2011). These stakeholders may seek to influence findings in the pursuit of their own interests regarding power relations within the organization under audit or evaluation. On the other hand, authorities or clients might pressure evaluators or auditors to include (or exclude) stakeholders in the pursuit of their own interests, to make (or not make) certain recommendations, and to disclose (or not disclose) privileged information that may expose improprieties or serious wrongdoing (Cianci & Bierstaker, 2009a, 2009b; Eliadis, Furubo, & Jacob, 2011; Newman & Brown, 1996). Although auditors have legally defined duties and protections of different kinds for their independence, they nonetheless work in a context where they face political pressures. This is especially the case for senior auditors, who interact more frequently with high-ranking bureaucrats with political and administrative roles. Evaluators are likely to encounter even more pressures in their work since, as we discussed in Chapter 4, they have variable codes and limited legal protections of their independence. Furthermore, civil servants participating in evaluations may not feel free to speak about how a policy or program really works given government expectations about proactive support for its agenda.

This may reduce the completeness of the evidence for the audit or evaluation. One of the main ethical challenges for evaluators crops up whenever they are asked to share draft reports, respond to stakeholder or client comments, and then change the content of their report. Evaluators can experience “pressure to misrepresent” findings and to alter recommendations. This pressure comes most often from those who commission evaluations (Pleger, Sager, Morris, Meyer, & Stockmann, 2017). Auditors who determine their own work programs are also subject to such pressure, since they too need to share drafts and respond to comments from auditees, which may lead them to act in a similar way. Aside from ethical dilemmas regarding their own professional practice, auditors and evaluators may uncover unethical practices in the organization under their scrutiny, which may heighten pressures from clients and stakeholders regarding findings and recommendations. Conflicts of loyalty and conflicts of interest, two legal concepts, lie at the heart of most ethical dilemmas faced by auditors and evaluators. Conflicts of loyalty arise when professionals have a duty of loyalty to different entities, whereas a conflict of interest occurs when a particular conflict of loyalty also involves a situation where personal interests may influence or appear to influence professional actions and judgment. Auditors and evaluators, like other public-sector employees, have legal and moral duties of loyalty. These duties diverge by country and by historical period regarding to whom or to what one owes allegiance. Written ethical codes and unwritten expectations in the different public sectors may refer to “loyalty to government, loyalty to constitution, loyalty to laws, loyalty to citizens, loyalty to country, loyalty to ethical code, respect for government, respect for citizens/state, hierarchic subordination, obedience, patriotism/nationalism” (Rothstein & Sorak, 2017, p. 10). Whenever such conflicts arise, one must prioritize among one’s duty of loyalty to the employer, the client, or an agent, one’s loyalty to uphold the highest professional standards, and one’s duty to serve the public interest. The duty of loyalty to the public-sector employer is not, however, absolute (Canada, Treasury Board Secretariat, 2011; Rothstein & Sorak, 2017). In Canada, for example, the courts have determined that the duty of loyalty is constrained by loyalty to the public interest regarding issues of legitimate public concern and by the public servant’s right to freedom of expression. For practitioners faced with an ethical dilemma involving a loyalty conflict, it is always wise to “let the facts speak for themselves” when reporting, notably by establishing them chronologically in a matter-of-fact manner and by avoiding adjectives and adverbs so as to keep the focus on the unethical behavior denounced by the disclosure. Auditors and evaluators confronted by such dilemmas should engage in dialog within the organization and with their respective professional associations, and seek sound legal advice to determine the best strategies for coping with such dilemmas.

In concluding this section, we note that several factors increase the likelihood that auditors and evaluators will be confronted by ethical dilemmas:

1 their discretionary decision-making powers (e.g., selecting topics, choosing when to audit, deciding when and how to proceed);
2 their privileged access to confidential information and politically sensitive information;
3 the potential impact of their final reports and recommendations on managers, employees, and beneficiaries;
4 their duties of loyalty to their public or private sector employer or client;
5 their stewardship duties to serve the public interest with integrity, impartiality, legality, fairness, and transparency;
6 their own career and personal interests.

By recognizing, before starting the process, the potential ethical dilemmas in these areas that may surface during a specific audit or evaluation, auditors and evaluators can anticipate and thus minimize the consequences of such dilemmas. They can also take precautionary or safeguard measures to minimize the effects of “pressures of self-interest, short-term thinking, bottom-line orientation, market practices or unwritten organizational laws which run counter to morality” (Geva, 2006, p. 137).

Resolving Ethical Dilemmas: Insights from Philosophical Perspectives and Research

Key ideas in the philosophy of ethics provide useful insights to assist auditors and evaluators in understanding and resolving ethical dilemmas. As shown in Table 6.1, there are three main ethical perspectives that may guide the auditor’s decision-making process.

Table 6.1 Three Ethical Perspectives

Deontological or Duty-based Ethics
What one should do to ensure moral correctness: apply the rules of the professional code in an equitable and fair manner.
Criteria for assessing the moral correctness of a decision: “doing the right thing” by rigorously following the rules.

Consequential or Outcome-focused Ethics
What one should do to ensure moral correctness: reflect on the consequences and actions that enhance the greater good.
Criteria for assessing the moral correctness of a decision: “doing things that yield good results.”

Virtue or Character-based Ethics
What one should do to ensure moral correctness: develop one’s “good” character, moral and intellectual virtues.
Criteria for assessing the moral correctness of a decision: “doing right by being of virtuous character.”

Source: See Fieser & Dowden, 2011; Alexander & Moore, 2016 for more details.

The first two focus on the stance from which ethical reasoning takes place, whereas the third relies on the character of auditors and evaluators. It assumes that if these professionals embrace virtues such as benevolence and Plato’s four cardinal virtues of prudence, justice, fortitude, and temperance, and cultivate their theoretical and practical wisdom, they will naturally make morally correct professional decisions. This, however, raises the question of decision criteria. In deontological ethics, following rules yields morally correct decisions, whereas in consequential ethics, rule-based decisions are not morally correct unless they lead to positive consequences. In other words, for consequential ethics, moral correctness depends on good outcomes, which may require flexibility in the application of rules. While a strong moral and intellectual character facilitates ethical reasoning, it does not elucidate the decision criteria in specific cases. There is a continuum of solutions to ethical dilemmas over which the emphasis on deontological and consequential ethical perspectives varies, as shown in Table 6.2 (based on Geva, 2006).

Table 6.2 A Continuum of Ethical Dilemmas and Perspectives for Solutions

“No-Problem Problem”
Nature of conflict: no conflict between ethical requirements.
Motivation to act ethically: high.
Examples: internalizing ethics in organizational culture.
Motivation to solve the issue: high.
Consequences: minor.
Ethical perspective: mainly deontological ethics.

“Compliance Problem”
Nature of conflict: valid ethical requirements conflict with self-interest, organisational practices, etc.
Motivation to act ethically: low.
Examples: honest reporting versus falsified financial statements; independence versus accepting “gifts.”
Motivation to solve the issue: low, unless the profession is highly regulated.
Consequences: variable.
Ethical perspective: mix of deontological and consequential ethics.

“Moral Laxity”
Nature of conflict: valid ethical requirements are postponed due to low obligations and high discretionary power.
Motivation to act ethically: low.
Examples: ensuring a healthy work environment, free from harassment and bullying.
Motivation to solve the issue: low, unless a crisis arises or the profession is highly regulated.
Consequences: variable.
Ethical perspective: mix of consequential and deontological ethics.

“Genuine Moral Dilemma”
Nature of conflict: two or more valid ethical requirements are in conflict.
Motivation to act ethically: high.
Examples: loyalty to an organization versus the public interest (whistleblowing).
Motivation to solve the issue: high.
Consequences: major.
Ethical perspective: mainly consequential ethics.

As noted in Chapter 2, auditors are expected to adhere to their professional code of conduct. Non-compliance for performance auditors in national audit offices, or for CPAs, can result in dismissal or removal of their professional designation. Evaluators have codes of conduct that are not as uniform and are not usually tied to sanctions. Research reveals that ethical choices are complex, multi-dimensional ones that individuals make in light of their own level of cognitive moral development, some key character traits, the nature of the dilemma, and the organizational context in which they work. A meta-analysis of 30 years of research on the subject provides insights about achieving ethical behavior and reveals that such choices are rarely just a case of a “bad apple” (individual character), a “bad case” (difficult issues), or a “bad barrel” (organizational culture) (Armenakis & Lang, 2014; Kish-Gephart, Harrison, & Treviño, 2010). The individual traits associated with “being a good apple” and “doing the right thing” are valuing idealism (not moral relativism), demonstrating high (not low) levels of cognitive moral thinking, and showing deep concern for collective (as opposed to self-) interest and well-being. When ethical dilemmas have a high moral intensity and serious consequences, it is easier for auditors and evaluators to engage in ethically sound reasoning. Finally, “bad barrel” problems are avoided when organizations promote a healthy organizational culture that values benevolence (over egoism), prioritizes principles (over interests), and establishes and enforces an explicit ethical code. Individual and organizational factors, combined with the nature of each specific ethical dilemma, influence the process of ethical deliberation and decision-making (Craft, 2013). Despite doubts concerning the professional integrity of auditors in the wake of major scandals, the “auditor’s character” is still regarded as essential in the decision-making process (Libby & Thorne, 2007). Contrary to popular myths, three socio-demographic variables, namely age, gender, and education, are not related to ethical and unethical decisions. One’s level of cognitive moral development and one’s moral philosophies are more important.

Key Questions and Answers for Ethical Practice

What Are the Steps of an Ethical Decision-Making Process?

Best practices for addressing an ethical dilemma call for auditors and evaluators to engage in a formal ethical decision-making process and to reason through the problem by assessing issue contingencies (Jones, 1991). This process involves four steps: “1. Acknowledge the situation. 2. Define the conflictual values of the situation. 3. Make an ethical decision based on a rational resolution of the values conflict in the situation. 4. Establish real dialogue with all persons concerned” (translated from French, Legault, 1999, p. 93). These steps emphasize the need for an explicit definition of the values conflict and for dialog as well as deliberation. First, the ethical decision-making process begins when the reaction to a situation does not come naturally and when an individual must take some time to think before acting.

From there, auditors and evaluators facing an ethical dilemma must answer key questions: “What is the nature of the ethical dilemma?,” “What are the possible courses of action?,” “Which action will I do?,” and “Why will I choose this action rather than another one?” (translated from French and adapted from Legault, 1999, p. 75). The first two questions require ethical reasoning about the dilemma and possible solutions. The third addresses the issue of whether the moral judgment is determinate: what is the required ethical action? The final question refers to cognitive awareness of the motivation to solve the moral issue. If the motivation is high, the answer might refer to the desire to respect a specific code of ethics and the values it promotes or to protect the public interest. The theory of moral intensity helps to analyze the dilemma in question and subsequently to reason through the chosen course of action. There are six components that allow us to measure the moral intensity of a moral issue: “magnitude of consequences, social consensus, probability of effect, temporal immediacy, proximity, and concentration of effect” (Jones, 1991, p. 372). When auditors and evaluators are confronted with ethical issues, they can take these criteria into account in order to deepen their decision-making process by reflecting explicitly on both deontological and consequential criteria.

How Can an Auditor or an Evaluator Know When Decisions Involve a Moral Issue?

One of the problems of the ethical decision-making process is that an auditor or evaluator may not be sufficiently trained to detect that a decision involves a moral issue. This is problematic, as the first step should be the recognition of the ethical dilemma in which the auditor or evaluator is involved (Desautels & Jacob, 2012; Jones, 1991; Legault, 1999). Concretely, this lack of moral recognition can have serious consequences, as Jones explains: “a person who fails to recognize a moral issue will fail to employ moral decision-making schemata and will make the decision according to other schemata, economic rationality, for example” (1991, p. 380). Another potential obstacle to moral recognition is that the ethical structure of organizations presupposes that concerned individuals will know that they are facing an ethical dilemma (Geva, 2006). Indeed, organizations often expect that a professional will consult the code of ethics or call on the help of a colleague to address ethical issues (Jacob & Boisvert, 2010), which assumes that professionals correctly recognize the presence of an ethical issue. However, this is not always the case (Jones, 1991). This is one reason why professional associations and academic institutions have added moral recognition training and ethics courses to the curriculum offered to auditors and evaluators.

Should an Auditor or Evaluator Refer to Individual, Professional or Organizational Values?

Identifying the values to which agents can refer can help them to better understand the ethical reasoning process and to understand what underlying values will make them prefer one decision over another.

Audits and evaluations differ not only in their format, but also in what they analyze. According to Davis (1990), auditors work mostly on “management control,” including “organizational structure, plans, procedures, and monitoring,” whereas evaluators concentrate on “the substance of government interventions and environmental influences rather than management control issues” (p. 37). Because auditors and evaluators pay attention to different elements, Davis suggests that each is promoting distinctive values. For instance, given that “control” is more present in audits than in evaluations, auditors tend to use “more detailed procedures” than evaluators do (p. 38) and may view ethics in terms of compliance. Evaluators, by virtue of their preoccupations with the intended, unintended, and perverse effects of policies, may have a broader perspective on ethics that includes aspired standards. From a broader perspective than the values specific to audits and to evaluations, auditors and evaluators sometimes have to choose between referring to core individual, professional, or organizational values. Obviously, some values can be part of more than one group, but certain values are more often associated with one of these three groups (individual, professional, and organizational). Individual values such as honesty or tolerance are internalized throughout one’s private life from an early age, whereas professional ones such as independence or integrity are acquired by becoming an auditor or an evaluator and belonging to a relevant professional association with its codes of professional conduct or ethics and professional training. Organizational values are those found in the public or private workplace, which may be stated formally in policies and codes or which may be implied by actual behavior. Some core values in auditing and evaluation practice are “(1) accountability, (2) efficiency and effectiveness, (3) openness and transparency, (4) participation, and (5) rule of law” (Menzel, 2015, p. 360). Independence and impartiality are also important core values.

Are Codes of Ethics Useful in the Ethical Decision-Making Process?

The establishment of codes of ethics aims to “positively impact” the organizations where they are implemented by enhancing the quality of audit judgments (Pflugrath, Martinov-Bennie, & Chen, 2007, p. 568). Since auditors and evaluators can be seen as working for the public, these publicly available codes of ethics are essential. They allow employees who face ethical dilemmas to refer to ideas, principles, and guidelines sanctioned by their profession, and they allow the public to understand what ideas are part of the culture of these organizations. They are documents to which all parties can refer in the context of a conflict. In auditing and evaluation, codes of ethics are a very well-known tool for implementing certain values in a governmental (or private) entity and for reassuring the public about the profession. They serve as an official statement of the mission and values of the entity, but also have a symbolic role. Codes have shifted from rules-based to principles-based approaches with prescriptive components, in response to ethical crises of corporate and public governance (Leibowitz & Reinstein, 2013).

These shifts raise the bar for auditors’ ethical behavior from simply following codes and obeying rules to requiring auditors to think and act ethically (Arjoon, 2006). The main differences between national and international codes are of form, approach, and emphasis rather than substance. In the twenty-first century, the trend is toward the internationalization of professional ethical codes, with global standards underscoring the importance of independence (Allen, 2010; Sadowski & Thomas, 2012). For example, the International Ethics Standards Board for Accountants Code of Ethics for Professional Accountants (IESBA, 2009) prescribes specific safeguard behavior in response to “threats to independence” and provides examples of “insurmountable threats.” Independence rules are especially important for auditors. Indeed, formal ethics codes are documents to which auditors, actors of the public administration, clients, and the public can refer in order to gain a better understanding of the work of performance auditors. For instance, INTOSAI developed a specific guide, which presents the Fundamental Principles of Performance Auditing (2013). This code was established to ensure the “credibility, quality and professionalism of public-sector auditing” (INTOSAI, 2013, p. 4). It is worth noting, however, that in addition to professional codes, public-sector evaluators and auditors are expected to conform to the general ethical expectations which underpin the conduct of public administration. Public administrations tend to have aspiration-based codes that focus on service delivery and the public, compliance-oriented ones with a focus on loyalty to government and rules, or ones with both aspirational and compliance components (Rothstein & Sorak, 2017). The principles and values in these codes overlap with professional codes in some areas and differ from them in others, notably regarding professional independence and loyalty to the profession’s ethical standards, as opposed to loyalty to the government. However, ethics codes are necessary but insufficient to ensure virtuous practices. Criticisms of codes of ethics include that they may be “too abstract, coercive, and unworkable while producing red tape and restricting practical options” (OECD, 1996 in Downe, Cowell, & Morgan, 2016, p. 900). When ethics codes combine aspirational statements with practical, operational guidelines for addressing ethical dilemmas and are applied in a favorable organizational culture, they do support the development of ethical competencies (Meine & Dunn, 2013). The American Evaluation Association’s “Guiding Principles Review and Report” (2013) revealed that 56% of members “read and used the Guiding Principles in their evaluation practice,” and evaluators who were aware of the Guiding Principles and had read them were more likely than not to use them again. Although ethics codes contribute to virtuous practices by setting explicit standards, they work best to promote good conduct in combination with practices such as recruiting employees with high ethical standards, establishing a strong ethical culture within organizations, and providing positive, proactive ethical leadership.

Such leadership entails managers leading by example, intervening before a delicate situation escalates to the formal level, and engaging in deliberation when applying ethics codes to resolve the ambiguity of real-world ethical dilemmas (Downe, Cowell, & Morgan, 2016; Menzel, 2015). In other words, ethical leaders must “walk the talk,” model virtuous behavior, and act with fairness and integrity to improve follower outcomes (Greenbaum, Bardes Mawritz, & Piccolo, 2015; Bedi, Alpaslan, & Green, 2016). If employers hire only auditors and evaluators with virtuous characteristics such as altruism (not egocentrism) and internalized professional values (Libby & Thorne, 2007), they will be more likely to have virtuous practices in their organizations through employees who are more responsive to ethical leadership and more supportive of an ethical organizational culture. Thus, the necessary and sufficient conditions for virtuous practices include (1) ethics codes with aspirational and operational guidelines, (2) employees and managers with strong ethical standards, and (3) an organizational culture that values ethical behavior. Empirical research among auditors confirms the importance of these factors in ethical decision-making (Jones, Massey, & Thorne, 2003). It is therefore no surprise that guidelines for ethics audits of public organizations seeking to improve ethical governance recommend conducting compliance audits (for codes and laws), cultural audits (for employee perceptions), and system audits (for compliance, culture, and ethics system performance) (EUROSAI, 2017). Furthermore, by virtue of their work and their own adherence to high ethical standards, auditors promote ethical practices in public organizations (Elmore, 2013).

Creating the Predispositions for Ethical Behavior (Prevention)

Ethical Climate and Organizational Culture

Preventing ethical crises and preserving public confidence requires a strong ethical organizational culture and leadership where there is consistency between policies and actions in all areas (Arjoon, 2006; Armenakis, Brown, & Mehta, 2011; Armenakis & Lang, 2014; Douglas et al., 2001; Treviño et al., 1998). A strong, ethical organizational culture ensures governance by combining rules-based approaches that focus on legal and regulatory standards, ethics codes, and industry standards with principles-based approaches that rely on subsidiarity or self-regulation, solidarity with dialog, and covenantal or contracted relationships for decentralized decisions and trust building (Arjoon, 2006). The ethical climate of an organization can have a strong impact on how professionals work. A strong ethical climate has consequences that go beyond simply "being good." Indeed, it appears that "values such as efficiency, effectiveness, quality, excellence, and teamwork … are reinforced by a strong rather than a weak ethical climate" (Menzel, 1993, p. 201). In order to illustrate what ethical organizational culture means, Wittmer (2005) begins his chapter with the following joke: "An applicant is being interviewed

for a position in an organization. The human resources manager asks, 'Do you lie, steal, or cheat?' The applicant replies, 'No, but I am willing to learn.' " (p. 49). This simple joke shows how the behavior of an individual is malleable – in a positive or negative way. It also suggests that including ethics screening questions in job interviews for all positions, including management ones, together with a simulation of an ethical decision-making process applied to an ethical dilemma, could provide useful insights about candidates' propensity to make ethical choices and their capacity to reason through ethical dilemmas. Since individual factors do influence ethical reasoning and behavior, it is important to address ethical recruitment strategies. The selection of candidates with higher moral reasoning skills, in combination with ethical leadership, can have a powerful and positive impact on the ethical environment within organizations and, thus, on the ethical behavior of managers, employees, evaluators, and auditors.

Ethical Leadership

Different authors have emphasized the influence of hierarchical supervisors on professionals when assessing the ethical culture of public organizations (Menzel, 2015). Leaders are "the primary influencers of ethical conduct in organizations" (Henle, Giacalone, & Jurkiewicz, 2005, p. 98). In reflecting on the best organizational practices to ensure ethics, the role of positive leaders stands out as essential. Moreover, not only leaders but also other individuals who are part of the organization can influence the ethical behavior of performance auditors. Indeed, colleagues can influence the ethics of the organization not only by acting ethically themselves, but also "through peer reporting of unethical behavior" (Treviño, Weaver, & Reynolds, 2006, p. 969). A recent study reveals that ethical leadership "increase[s] subordinate willingness to report ethical problems," "can increase organizational commitment," and "can reduce absenteeism" (Hassan, Wright, & Yukl, 2014, p. 340).

Policy Guidelines, Rewards, and Sanctions (Prevention)

Finally, a necessary but insufficient condition for promoting ethical behavior and controlling unethical behavior is the use of policy guidelines and ethics codes with rewards and sanctions that are communicated clearly to all managers and employees. This idea of sanctions emphasizes "negative incentives" or "penalties" based on rules and regulations to promote conformity to ethical standards, as opposed to positive incentives such as rewards for ethical behavior, formal ethics training, and leadership by example (Menzel, 2015). However, recourse to policies and codes seems to work well only when they are upheld by leaders who "walk the talk." When leaders do not, subordinates become cynical and critical of organizational hypocrisy regarding ethical standards.

The enforcement of policies and codes through sanctions has also been criticized because it may require "ethics officials to spend an inordinate amount of time, and enforcement authorities to squander precious resources, often to the detriment of other significant ethics considerations" (Denhardt & Gilman, 2005, p. 267). In addition, such policies are not always easy to apply and enforce, given that ethical dilemmas, by their nature, sit in a gray zone.

Conclusion

The search for virtuous practices in auditing and evaluation reveals that there are three necessary and sufficient conditions in organizations for the promotion of ethical behavior and decision-making whenever conflicts of loyalty and interests arise. These three conditions are: (1) individual factors (predispositions toward ethical reasoning, altruism over egocentrism); (2) leadership factors (positive role models); and (3) organizational factors (strong ethical culture with clear ethics codes and policies with rewards for ethical behavior and sanctions for unethical behavior). The following points summarize the main considerations to promote virtuous practices in evaluation and auditing:

1. Among various ethical dilemmas, auditors and evaluators are particularly susceptible to facing conflicts of loyalty linked to pressures from competing interests and political pressures, especially in contexts where new political governance is present. These conflicts happen when auditors and evaluators wonder whether they should be loyal to their employer, to their client, to the government, to the standards of their profession, or to the public interest. Ideally, such conflicts should be resolved by placing the interests of the profession and the public before all others, as there is a hierarchy of loyalty. In such cases, by "letting the facts speak for themselves," auditors and evaluators have better protection from retaliation by disgruntled agents.
2. We suggest that there is a continuum of potential responses to every ethical dilemma. In some cases, the answers can be found by applying deontological ethics and, in others, consequentialist ethics or a mix of both approaches. The different solutions auditors or evaluators can choose from when faced with an ethical dilemma depend on, among other things, whether they know what the ethical action to undertake is and whether they have the motivation to act ethically.
3. When facing an ethical dilemma in the exercise of their profession, evaluators and auditors must answer key questions explicitly: "What is the nature of the ethical dilemma I am facing?," "What are the possible courses of action?," "Which action will I take?," and "Why will I choose this action rather than another one?" (translated from French, Legault, 1999, p. 75). This requires reflection and deliberation, often with the help of others.
4. When resolving ethical dilemmas through a deliberation process, auditors and evaluators should articulate explicitly their understanding of the ethical dilemma, the ethical reasoning about each possible course of action and its consequences, and the rationale for the chosen action. This includes identifying the sets of individual, professional, or organizational values.
5. Ethical leadership and the establishment of a strong ethical organizational culture are fundamental to promoting ethical behavior. A strong ethical organizational culture includes rules-based and principles-based approaches, values loyalty to high professional standards, establishes ethics codes and policies that adhere to legal, regulatory, and ethical standards, and provides opportunities for proactive ethics training as well as dialog and deliberation whenever ethical dilemmas arise in practice.
6. Ethics are fundamental to evaluators and auditors because ethical conduct is evidence that they do their job well. Their ethical behavior also models the ethical behavior of others within their own organization and of those that fall under their scrutiny. Evaluators and auditors have to be trained to be able to recognize ethical dilemmas, to analyze the solutions to the dilemmas, and to act accordingly.

Despite the awareness and progress toward virtuous practices by international organizations, legislators, public and private administrations, academics, professionals, and their respective professional orders and associations, especially since the 1990s, embedded barriers in workplaces continue to undermine measures to model ideal individual and organizational behaviors. Yet, overcoming them to achieve higher levels of ethical accountability is essential to restore public confidence. Auditors and evaluators encounter unethical behavior in organizations through their work and they may face ethical dilemmas in the exercise of their profession. For these reasons, they may need a broader vision of their role that redefines their contribution to social betterment. Auditors and evaluators do not work only for their client, but also for the public's interest. In the hierarchy of loyalties, loyalty to protecting and preserving the public's interest, along with adherence to professional ethics codes, should prevail over loyalty to clients. This will secure the integrity and credibility of the profession.

References

Allen, C. (2010). Comparing the ethics codes: AICPA and IFAC. Journal of Accountancy, 24–32.
American Evaluation Association (2013). Guiding Principles Review and Report. Retrieved from www.eval.org/p/cm/ld/fid=51.
Arjoon, S. (2006). Striking a Balance Between Rules and Principles-Based Approaches For Effective Governance: A Risks-Based Approach. Journal of Business Ethics, 68(1): 53–82.
Armenakis, A., Brown, S., & Mehta, A. (2011). Organizational culture: assessment and transformation. Journal of Change Management, 11(3): 305–328.
Armenakis, A., & Lang, I. (2014). Forensic Diagnosis and Transformation of an Organizational Culture. Journal of Change Management, 14(2): 149–170.
Aucoin, P. (2012). "New Political Governance" in Westminster systems: Impartial public administration and management performance at risk. Governance: An International Journal of Policy, Administration and Institutions, 25(2): 177–199.

Bedi, A., Alpaslan, C., & Green, S. (2016). A meta-analytic review of ethical leadership outcomes and moderators. Journal of Business Ethics, 139(3): 517–536.
Canada, Treasury Board Secretariat (2011). Duty of Loyalty. Retrieved from www.canada.ca/en/treasury-board-secretariat/services/values-ethics/code/duty-loyalty.html.
Cianci, A.M., & Bierstaker, J.L. (2009a). The Effect of Performance Feedback and Client Importance on Auditors' Self- and Public-Focused Ethical Judgments. Managerial Auditing Journal, 24(5): 255–274.
Cianci, A.M., & Bierstaker, J.L. (2009b). The Impact of Positive and Negative Mood on the Hypothesis Generation and Ethical Judgments of Auditors. Auditing: A Journal of Practice & Theory, 28(2): 119–144.
Craft, J.L. (2013). A Review of the Empirical Ethical Decision-Making Literature: 2004–2011. Journal of Business Ethics, 117(2): 221–259.
Davis, D.F. (1990). Do You Want a Performance Audit or a Program Evaluation? Public Administration Review, 50(1): 35–41.
Desautels, G., & Jacob, S. (2012). The Ethical Sensitivity of Evaluators: A Qualitative Study Using a Vignette. Evaluation: The International Journal of Theory, Research and Practice, 18(4): 438–451.
Denhardt, K.G., & Gilman, S.C. (2005). In Search of Virtue: Why Ethics Policies Spawn Unintended Consequences. In G. Frederickson & R. Ghere (Eds.), Ethics in Public Management (pp. 259–273). Armonk: M.E. Sharpe.
Douglas, P.C., Davidson, R.A., & Schwartz, B.N. (2001). The Effect of Organizational Culture and Ethical Orientation on Accountants' Ethical Judgments. Journal of Business Ethics, 34(2): 101–121.
Downe, J., Cowell, R., & Morgan, K. (2016). What Determines Ethical Behavior in Public Organizations: Is It Rules and/or Leadership? Public Administration Review, 76(6): 898–909.
Eliadis, P., Furubo, J.-E., & Jacob, S. (Eds.) (2011). Evaluation: Seeking Truth or Power? New Brunswick, NJ: Transaction Publishers.
Elmore, T.P. (2013). The Role of Internal Auditors in Creating an Ethical Culture. Journal of Government Financial Management, 62(2): 48–53.
EUROSAI (2017). Audit of Ethics in Public Sector Organisations: Guidelines. EUROSAI: Task Force on Audit & Ethics.
Geva, A. (2006). A Typology of Moral Problems in Business: A Framework for Ethical Management. Journal of Business Ethics, 69: 133–147.
Goss, R.P. (1996). A Distinct Public Administration Ethics? Journal of Public Administration Research and Theory: J-PART, 6(4): 573–597.
Greenbaum, R.L., Bardes Mawritz, M., & Piccolo, R.F. (2015). When Leaders Fail to "Walk the Talk": Supervisor Undermining and Perceptions of Leader Hypocrisy. Journal of Management, 41(3): 929–956.
Hassan, S., Wright, B.E., & Yukl, G. (2014). Does Ethical Leadership Matter in Government? Effects on Organizational Commitment, Absenteeism, and Willingness to Report Ethical Problems. Public Administration Review, 74(3): 333–343.
Henle, C.A., Giacalone, R.A., & Jurkiewicz, C.L. (2005). The Role of Ethical Ideology in Workplace Deviance. Journal of Business Ethics, 56: 219–230.
House of Commons (1994). The Proper Conduct of Public Business. Committee of Public Accounts – Eighth Report 1993–1994.
International Ethics Standards Board for Accountants (IESBA) (2009). Code of Ethics for Professional Accountants. New York, NY: International Federation of Accountants.

International Organization of Supreme Audit Institutions (INTOSAI) (2013). Fundamental Principles of Public Sector Auditing. Vienna: INTOSAI.
Jacob, S., & Boisvert, Y. (2010). To Be or Not To Be a Profession: Pros, Cons and Challenges for Evaluation. Evaluation: The International Journal of Theory, Research and Practice, 16(4): 349–369.
Jones, T.M. (1991). Ethical Decision Making By Individuals in Organizations: An Issue-Contingent Model. The Academy of Management Review, 16(2): 366–395.
Jones, J.C., Massey, D.W., & Thorne, L. (2003). Auditors' Ethical Reasoning: Insights From Past Research and Implications for the Future. Journal of Accounting Literature, 22: 45–103.
Kish-Gephart, J., Harrison, D., & Treviño, L.K. (2010). Bad Apples, Bad Cases, and Bad Barrels: Meta-Analytic Evidence About Sources of Unethical Decisions at Work. Journal of Applied Psychology, 95(1): 1–31.
Legault, G.A. (1999). Professionnalisme et délibération éthique: Manuel d'aide à la décision responsable. Québec: Presses de l'Université du Québec.
Leibowitz, M.A., & Reinstein, A. (2013). Ethics Mindsets, New and Old. The Journal of Contemporary Business Issues, 18(1): 11–21.
Libby, T., & Thorne, L. (2007). The Development of a Measure of Auditors' Virtue. Journal of Business Ethics, 71(1): 89–99.
Meine, M.F., & Dunn, T.P. (2013). The Search for Ethical Competency: Do Ethics Codes Matter? Public Integrity, 15(2): 149–166.
Menzel, D. (2015). Research on Ethics and Integrity in Public Administration: Moving Forward, Looking Back. Public Integrity, 17(4): 342–370.
Menzel, D.C. (1993). Ethics Induced Stress in the Local Government Workplace. Public Personnel Management, 22(4): 523–536.
Newman, D.L., & Brown, R.D. (1996). Applied Ethics for Program Evaluation. Thousand Oaks, CA: Sage Publications.
Neu, D., Everett, J., & Rahaman, A.S. (2013). Internal Auditing and Corruption Within Government: The Case of the Canadian Sponsorship Program. Contemporary Accounting Research, 30(3): 1223–1250.
Pleger, L., Sager, F., Morris, M., Meyer, W., & Stockmann, R. (2017). Are Some Countries More Prone to Pressure Evaluators Than Others? Comparing Findings from the United States, United Kingdom, Germany, and Switzerland. American Journal of Evaluation, 38(3): 315–328.
Pflugrath, G., Martinov-Bennie, N., & Chen, L. (2007). The Impact of Codes of Ethics and Experience on Auditor Judgments. Managerial Auditing Journal, 22(6): 566–589.
Rothstein, B., & Sorak, N. (2017). Ethical Codes for the Public Administration: A Comparative Study. Working Paper Series 2017:12, QoG The Quality of Government Institute, Department of Political Science, University of Gothenburg. Retrieved from https://qog.pol.gu.se/digitalAssets/1663/1663513_2017_12_rothstein_sorak.pdf.
Sadowski, S.T., & Thomas, J.R. (2012). Toward a Convergence of Global Ethics Standards: A Model From the Professional Field of Accountancy. International Journal of Business and Social Science, 3(9): 14–20.
Treviño, L.K., Butterfield, K.D., & McCabe, D.L. (1998). The Ethical Context in Organizations: Influences on Employee Attitudes and Behaviors. Business Ethics Quarterly, 8(3): 447–476.
Treviño, L.K., Weaver, G.R., & Reynolds, S.J. (2006). Behavioral Ethics in Organizations: A Review. Journal of Management, 32(6): 951–990.

Wittmer, D.W. (2005). Developing a Behavioral Model for Ethical Decision Making in Organizations: Conceptual and Empirical Research. In G. Frederickson & R. Ghere (Eds.), Ethics in Public Management (pp. 49–69). Armonk: M.E. Sharpe.
Yarbrough, D.B., Shulha, L.M., Hopson, R.K., & Caruthers, F.A. (2011). The Program Evaluation Standards: A Guide for Evaluators and Evaluation Users (Joint Committee on Standards for Educational Evaluation) (3rd ed.). Thousand Oaks, CA: Sage.

7 Managing Reputational Risk
Richard Boyle and Peter Wilkins

Introduction This chapter concentrates on reputational risk and its management. The focus within this theme is on performance audit and the overall integrity of Supreme Audit Institutions (SAIs). While reputational risk is important for individual evaluators and internal audit units, threats to the reputation of an SAI have system-wide implications for national governance. The statutory independence of SAIs and perceptions surrounding this give them a valuable foundation for a high reputation but many factors can harm this reputation including how they go about implementing their mandates, and the nature of the mandate given to them by legislation. Harm to reputations undermines trust and confidence, which can affect relationships, reliance placed on information provided, and willingness to co-operate. Performance audits, which are often less clear-cut and definitive than traditional audits, have moved SAIs into more contestable space with consequent implications of an increased risk of challenges to reputation. Knowing about the potential for such reputational risks, and how to manage them, is an important competency for those working in performance audit. Busuioc and Lodge (2016, p. 252), citing Maor (2015), state that “reputation, the result of the receptiveness by an audience to one’s appearance, is the source of organizational power.” Reputation is influenced by both perception and performance, with each audience/stakeholder having their own view of an organization’s reputation. Managing reputational risk, as part of wider risk management activities, has been an increasing feature for both public and private sector organizations in recent years. For most public-sector organizations, risk management is now a central feature of corporate governance and accountability systems. McPhee (2005, p. 11) however feels that reputational risk has not been effectively managed with potentially significant consequences for public organizations: “Reputation damage is possibly the most misunderstood and ill-managed of an organization’s risk management activities and no amount of crisis management can usually repair the damage.” Power (2018) has documented the rapid growth of risk management since the 1990s. Power found that

organizational control systems were being transformed into risk management systems, and that these were being made into more public and auditable objects. As a result, organizations were being turned 'inside out' and becoming more vulnerable to reputational damage in external environments. This development was not a reflection of an actual increase in the risks being faced by organizations, but rather the rise of the "risk management of everything" and of accountability as a vehicle for assigning blame. However, it is also the case that reputations are more fragile than ever. For example, social media is accelerating the speed with which experiences and expressions of dissatisfaction are shared, and collective action initiated.

Accountability institutions, such as SAIs and internal audit teams, have not been immune from these trends, and the need to consider reputational risk issues. As Busuioc and Lodge (2016, p. 257) note:

For "institutions of accountability" … accountability is a core task. Their organizational reputation is exclusively built on their competence in discharging their (distinct) accountability roles. The manner in which they discharge their accountability responsibilities is thus crucial to their reputation-building efforts. Moreover, if found slacking, the reputational costs would be very high.

As mentioned in the introduction, the main audiences of SAIs are national parliaments, often a specific committee such as the Public Accounts Committee in the Westminster systems, and indirectly the media, interest groups and the citizenry. SAIs are an important, independent source of information to legislatures and citizens on what governments are doing with public money. As such, the reputation of SAIs is vital to their standing as a trusted source of information, and a vital part of the accountability system for the control and oversight of public expenditure.

The conduct of performance audits has implications for the reputation and reputational risk management of SAIs. Performance auditing, as Lonsdale, Ling, and Wilkins (2011) show in an overview of developments in performance audit, has meant SAIs assessing a wide range of government activity, often complex in nature, with the need for a broadening of methods and approaches. Both the methodological and political risks and challenges associated with such work have drawn SAIs into increasingly contested spaces where other stakeholders may challenge their independence and rigor.

An Analytic Framework

Table 7.1 presents an analytic framework for examining reputational risk and its management. Drawn from the literature and our own experience of performance audit, we identify a number of contributing factors for examination in an analysis of reputational risk:

• Audit strategy – the areas of focus of performance audit.
• Topics – selection, scoping including proximity to the political domain.
• Quality – including methodologies, judgments, quality assurance, and compliance.
• Effectiveness – responses to their work: positive, negative, ignored.
• Communication – from "gotcha" through to coaching and encouraging.
• Integrity – of the individuals involved, skills, and organizational culture.

Table 7.1 Analytical Framework

Contributing Factor – Case Study
Audit Strategy – Mix of audit strategies across four Nordic countries
Topic – Mandate of an Australian Auditor-General challenged by a municipal association
Quality – Irish Comptroller & Auditor General report on Project Eagle
Effectiveness – Responses to performance audit reports – survey results
Communication – GAO autonomy (independence) and relevance tension
Integrity – UK NAO report on Universal Credit

For each contributing factor, we briefly examine a case study illustrating a reputational risk issue or issues arising from the case. As might be expected, often the cases affect more than one contributing factor. Our principal aim, though, is to use the case to illustrate issues associated with the contributing factor linked to it. We also identify a number of audiences whose perception of reputation is important when it comes to performance audits, notably parliaments, public-sector agencies and staff, media, community interest groups, and peer audit offices.

Audit Strategy Case Study: The Strategic Options of SAIs Regarding Performance Audit

Jeppesen et al. (2017) identified and investigated four strategic options for Supreme Audit Institutions (SAIs) through a case study of four Nordic national audit offices:

• A performance audit-based strategy which amplifies the differences between performance auditing and financial auditing. This strategy can be risky for the reputation of the SAI because the focus on effectiveness brings performance auditing relatively closer to evaluation, creating competition with evaluative agencies and exposing the SAI to political critique of the methods and the conclusions drawn.
• A financial audit-based strategy often combined with compliance auditing. This option is usually chosen to raise the status of public-sector auditing by drawing on the general legitimacy of the private auditing profession, in particular its independence from the auditee. However, the reputational risk here is that this strategy can lead the SAI to be considered more of a critic than a mentor.
• A portfolio strategy in which the SAI divides into different subunits specializing respectively in performance auditing, financial auditing, compliance auditing and other types of work. In these cases, the reputational risk is that the professional identity of the SAI inevitably becomes rather ambiguous or segmented, often with little co-operation or even conflict between the different types of auditors.
• A hybrid strategy in which state auditing is being developed as a hybrid practice merging elements from performance auditing with either financial auditing or evaluation to form a new type of audit.

The study found that the Swedish, Norwegian, and Finnish SAIs have all chosen a portfolio strategy, performing a combination of financial auditing, performance auditing, and compliance auditing. The Danish NAO is an exception, integrating financial, compliance, and performance auditing into two audit products called “annual audits” and “major studies.” To support the conduct of these types of audits, the Danish NAO has adopted a matrix organization, encouraging co-operation between performance auditors and financial auditors on the auditing of a particular organization. The other Nordic SAIs seem to treat performance auditing and financial auditing as entirely different types of work, which is also evident from their organizational structure and their professional identity. Another difference in approach is that the Danish NAO appears to have a ­relatively high degree of focus on performance accountability audits, with performance audit more closely aligned to financial audit. The Swedish and the Finnish NAO focus more on performance improvement audits and therefore concentrate more on the use of social science research methods in their work.

Discussion The strategy or strategies pursued by SAIs in determining the number and type of performance audits to be carried out can create reputational risks. Too few performance audits and they may not generate credibility in the field. Too many, and it may be at the expense of financial audits and compliance, and by being overly drawn into contentious policy arenas. If SAIs orient their performance audits more toward compliance with standards (as applies in financial audit), it is possible that their reputation for rigor will be enhanced, but potentially at the expense of useful insights. If oriented more toward standards applied in evaluation, performance audits are likely to be seen as targeting improvements in practices, but more open to critique as to the rigor of the methodologies applied. Many SAIs consequently adopt a portfolio strategy, developing different teams with expertise in financial audit, compliance, and performance audit. But the tensions referenced above remain and it is a question of finding an appropriate balance in any strategy developed. The hybrid strategy applied by the Danish NAO provides an example of audit teams mixing performance audit with financial audit, and offers an alternative approach of managing the dilemmas.

This presents its own challenges. From the perspective of managing reputational risk, the key point is that strategies for performance audit should be developed with an awareness of their potential reputational implications, and with thought given to how to manage them.

Topic Selection Case Study: Challenge to Mandate of the Victorian Auditor-General to Conduct a Performance Audit

The Auditor-General of Victoria tabled a report in parliament in 2015 that examined the effectiveness of support for local government, focused on the activities of the state-based local government agency and the Municipal Association of Victoria (MAV). Many of the findings were unexceptional. However, the Auditor-General observed:

MAV's conduct during this audit has been disappointing. It has been marked by repeated challenges to my mandate, the scope of the audit, its inability to provide evidence in a timely fashion, and sometimes its refusal to provide certain information.
(Victorian Auditor-General, 2015, p. vii)

He claimed that the audit of MAV was clearly within his mandate, whereas MAV claimed that the audit was outside his jurisdiction as an independent membership organization. He expressed disappointment "that MAV has not clearly accepted my recommendations or outlined how it will address them" and advised that he would be monitoring developments closely to ensure that his recommendations are addressed. He called for a comprehensive review of MAV's role and legislation (Victorian Auditor-General, 2015, p. ix). He also observed that he found

the independent oversight and scrutiny applied to MAV is well below the level that most statutory public bodies face. There is limited scrutiny of MAV's Rules, limited independent review of its performance beyond its annual reporting, and no clear understanding of whether MAV is delivering value for money. Indeed, my audit provides a rare independent and transparent assessment of MAV's performance.
(Victorian Auditor-General, 2015, p. vii)

The MAV in its response to a draft of the report did not challenge the Auditor-General's mandate over MAV but robustly criticized many of the findings (Victorian Auditor-General, 2015, p. 78). By way of rebuttal, the Auditor-General commented that "MAV's response to this audit's recommendations provide neither clear nor consistent responses as to whether it accepts the recommendations and the timing of any specific actions that it will take to address the issues identified" (Victorian Auditor-General, 2015, p. 86).

Discussion

It is evident that while the topic was a reasonable choice and was likely to be of wide interest to stakeholders, the reputation of the Auditor-General was at risk given the apparent challenge to his mandate to include MAV within the audit's scope. The report generated media coverage and comment in parliament, including by one member who noted that a specific concern emerging in the sector "is the reluctance of the MAV, including its president and board, to accept the recommendations and certainly the authority of the Auditor-General first of all to undertake the audit, and second in relation to the recommendations" (Peulich, 2015). The government initiated a review of the legislation governing the MAV, and the related consultation paper included a comment from the minister that

[a] recent Auditor-General's review of the effectiveness of support for Local government also recommended that the Government review the MAV's functions, roles, responsibilities, powers and obligations in the context of its existing legal framework. We want to reflect this thinking in a contemporary, accurate MA Act.
(Environment, Land, Water, & Planning (Victoria) 2017)

A follow-up report by the Auditor-General in 2017 found that "[b]oth MAV and LGV have responded to our recommendations and have taken appropriate action to address the underlying issues identified in our 2015 audit" (Victorian Auditor-General, 2017, p. 14). It would appear from these developments that the dispute over the Auditor-General's mandate, revealed by his strident comments in the 2015 report, was settled in favor of his view, and that his reputation was not damaged by his decision to include the MAV within the scope of the audit.

While the case study focuses on a single performance audit, review of a sequence of performance audits may lead to perceptions about the approach of the Auditor-General involved, for instance perceptions that the program either avoided politically sensitive topics or pursued them with undue vigor. Perceptions may also relate to whether the underlying emphasis is on catching agencies out and generating controversial media coverage or on working closely and quietly with agencies to improve performance. These perceptions can be the basis of a reputation that is viewed either positively or negatively depending on the vantage point of the people involved.

Quality Case Study: Irish Comptroller and Auditor General Report on Project Eagle

In 2016, the Comptroller & Auditor General (C&AG) published a special report on the sale by the National Asset Management Agency (NAMA) of its Northern Ireland portfolio (Office of the Comptroller & Auditor General, 2016). The report

was critical of the manner in which NAMA disposed of 800 properties to US firm Cerberus for €1.6 billion in a deal known as Project Eagle, and concluded that the sale involved a significant probable loss of value to the state of up to £190 million. In a strong public response to the C&AG report, NAMA rejected its main conclusions and criticized the quality of the methodology used and the expertise of the auditors (NAMA, 2016). In a public release on their website, NAMA described the C&AG report as fundamentally flawed and argued that the findings were based on incorrect assumptions about the discount rate for property in Northern Ireland. They argued that staff who had no market experience and no expertise in loan sales produced the findings in the report. The strength and public nature of NAMA's response to the C&AG report was unprecedented for a state body in Ireland. Their chief executive articulated NAMA's main case in evidence to the Public Accounts Committee: that the key finding in the C&AG's report, that NAMA made a "probable loss" on this sale, derived from a mistaken assumption that a 5.5% discount rate, not the market discount of at least 10%, should have applied to derive the market value of the portfolio.

In a strong defense of the report, the C&AG outlined to the Public Accounts Committee the quality assurance procedures put in place for the production of the report and defended the expertise of those involved (Committee of Public Accounts, 2016):

For this report, we applied more than the usual testing and challenge because of NAMA's strong objections to the findings. We arranged, on a collegial basis, with our sister organisation, the UK National Audit Office, NAO, that two senior managers from its financial markets unit would review and challenge the draft report. Both had market experience before their employment with the NAO. As it happened, they were also just at that time finalising a report on the UK Government's sale of former Northern Rock financial assets. In April 2016, they challenged my team on the findings and provisional conclusions of the draft report and provided useful information and suggestions which we took on board. In May 2016, I asked for a further and deeper challenge process, which was undertaken by a former secretary and director of audit of my office. He was involved in setting up and overseeing the audit of NAMA until 2012 and has also served as a member of the audit board of the European Investment Bank. We asked him to examine all the evidence we were using regarding Project Eagle and the written responses from NAMA and to consider if the conclusions were appropriate, given the evidence. His advice and suggestions were also taken on board in further refining the report. These processes were a process of assurance for me.

Having considered the evidence presented and the C&AG's report, the Public Accounts Committee concluded that the C&AG's report was evidence-based, balanced and reasonable. They found that

The C&AG's view of a probable loss of up to STG £190 million was based on his examination of NAMA's own figures as documented and the C&AG stated he was not making a commercial evaluation in relation to that figure or the decision to sell its Northern Ireland portfolio in one lot.

Discussion

While it is not unknown for auditees to be critical of performance audits, and in particular of the methods used or the expertise of the auditors (Morin & Hazgui, 2016), it is unusual for such criticism to take place in so public a manner, and in the full spotlight of the media, as in the Project Eagle case. The potential reputational risk to the SAI was very significant in this context. The issue of estimating a "probable loss" to the state became a central aspect of the public disagreement, indicating that performance audit can bring SAIs into contested spaces on subject areas where there are no simple right or wrong answers, but estimates based on probability. Maintaining the support of parliament, as expressed by the support of the Public Accounts Committee, was crucial to minimizing the reputational risk. The media did not come down on one side or the other of the argument, but tended to present the story as a clash of opinions between two important state institutions. The case highlights the importance of performance auditors having strong quality assurance procedures. Where there is particular contention between the auditee and the auditor, assurance arrangements beyond those routinely carried out may be necessary.

Effectiveness Case Study: Responses to Performance Audit Reports

A study of the Norwegian Office of the Auditor General (Riksrevisjonen) examined the response of civil servants to performance audit reports issued by the office. The aim was to assess the impact of performance audits on practice in the audited organizations (Reichborn-Kjennerud & Vabo, 2017). A survey of 253 civil servants was carried out. Their answers to questions on what changes occurred after the performance audits were used as proxies for actual change. Almost half the respondents noted that the audited body had made changes to a large or very large extent because of the performance audit. The most common changes that respondents saw as improvements related to increasing documentation and reporting, changes in strategies and planning, and changes to internal control and risk management. Sixty percent thought the reports useful to a large or very large extent. The respondents who thought the reports were useful tended to agree with the audit criteria, to think that conflicting objectives in the policy sector had been taken into account, that the methods used were rigorous and appropriate, and that there was a clear link between audit criteria, facts, and assessments. Less than 10% said there was little or no change, and 16% thought the reports were useful only to a limited or very little extent. Of those who did not make changes, the main reasons given were that they felt the facts in the report were presented inaccurately or because they disagreed with the SAI's conclusions. Of those considering the reports of little use, the main reasons given were that they failed to be important sources of information, the auditors did not have good sector experience,

the reports failed to take into account conflicting objectives in the policy sector, and they disagreed with the audit criteria. Both those positively disposed to the reports and those who found them of little use were skeptical of aspects of the reports. Many respondents felt that the reports had oversimplified conclusions, that the auditors paid too much attention to deviance regardless of importance, and that they were too critical.

Discussion

If the findings of performance audits are not taken seriously by those audited, this will have a negative effect on reputation, which may be associated with a "downward spiral" of views on the effectiveness of performance audit. Conversely, the high reputation of SAIs may influence how users perceive and respond to the findings of performance audits. This study shows that the effectiveness of performance audit, as exemplified by the use made of such reports, is positively associated with perceptions of report quality. Where the reports are seen as useful and of good quality, they are more likely to be acted on. This requires performance audit reports to show that they have clear, agreed criteria and rigorous methods, and that they are carried out by staff with relevant skills and knowledge of the area or issue under scrutiny. The implications in terms of managing reputational risk with regard to effectiveness, as viewed by public agencies and their staff, are that performance auditors need, first, to have an understanding of how organizations respond to and use their reports. Surveys of the kind conducted in Norway can be of benefit here. The importance of having robust conclusions based on sound evidence is also highlighted. Conclusions perceived to be weak or over-simplistic have the potential to lead to reputational damage.

Communication Case Study: Government Accountability Office Management of Autonomy Versus Relevance

In a study of autonomy and relevance, using the US Government Accountability Office (GAO) as a case, Shahan (2014, 2015) examines how the GAO relationship with Congress has evolved as they have become more involved in program evaluation since the 1960s. He also examines the implications for how they communicate with Congress. Up to the mid-1980s, despite maintaining a close relationship with Congress, the work done by GAO on program evaluation was mostly on its own initiative. Autonomy and independence were emphasized. However, there was limited congressional interest in GAO evaluations. Shahan notes that during a "contested independence" phase, from 1985 to 1997, the GAO shifted from producing self-selected reports to being more responsive to requests from members of Congress. In the early 1990s this led to a situation where they gave precedence to requests coming from the chairs of congressional

committees, who were members of the majority Democratic party. Republicans severely criticized the GAO and its reports. When Republicans came to power in Congress in 1995, they cut the budget of the office by 25%. During a subsequent "political risk management" phase, from 1998 to the present, GAO adopted a number of strategies to minimize this long-term risk to their reputation as a bi-partisan office:



Continuing the development of internal expertise. Strengthening the methodological foundation of their work and establishing itself as an expert, neutral institution providing credible information. Making itself more relevant. The Comptroller General met with the House Leadership and made it clear that the GAO worked for all of Congress. They increased their responsiveness to congressional requests. Requests are developed and refined through a lengthy consultation process that takes place between the GAO and congressional committee members and their staff. Protecting itself from political manipulation. Protocols for how the office worked with Congress were developed which describe how it deals with congressional requests, how it modifies those requests into workable and unbiased research questions, how it decides on its methodology, and how it publicizes its reports. These influence and shape the dialog that takes place between the office and congressional committee members and their staff. ­Specific efforts have also been made to include the views of both the majority and the minority. In effect, the protocols help provide equal weight to the requests sent by the majority and minority.

Discussion Parliament or legislatures are normally the main direct audience for performance audit reports produced by SAIs. There can be significant implications for the reputation of SAIs if they are seen to be either overly autonomous from the legislature and therefore potentially irrelevant in their eyes, or overly responsive in an effort to be relevant, but then run the risk of being seen as partisan. Working closely with the legislature to determine the nature of performance audits to be carried out entails a reputational risk, especially if the SAI is perceived to be responsive to the needs of the majority party (or parties) at the expense of other elected officials or parliamentarians. Developing effective communications channels with legislatures is central to managing such reputational risk. Having agreed protocols or guidelines for how to manage the relationship can be helpful in providing guidance as to how to deal with requests: which get precedence, which cannot be responded to and so on. Also important is that requests from elected members for studies are taken as a starting point for dialog between the SAI and elected members. Issues such as the scope of the study, the methodology used, and the research questions to be addressed, should be open to discussion to enable the SAI to frame the study in a way that is both achievable and non-partisan.

Managing Reputational Risk   97 Integrity Case Study: UK NAO Report on Universal Credit In June 2018, the UK National Audit Office (NAO) produced a very critical report on the Universal Credit project, which is a long-running attempt to replace six means-tested benefits (NAO, 2018). The report found that the project is not value for money now, and that its future value for money is unproven. Former Conservative leader and former Secretary of State, Iain Duncan Smith, introduced Universal Credit. He attacked the report as a “shoddy piece of work” in parliament. Subsequently, the Secretary of State for Work and Pensions (the minister), Esther McVey, claimed in parliament that the NAO report, despite the concerns raised, suggested that Universal Credit is working, and that the NAO was pushing for faster implementation. She further suggested that the report did not take into account the most recent information. In response, the Comptroller & Auditor General (C&AG), Amyas Morse, asked to see the Secretary of State, and when a meeting was not forthcoming, he wrote an open letter to the papers pointing out that she had wrongly attributed views about the project to the NAO. He indicated that the report had not recommended faster implementation, but had said that the department must now ensure it is ready before it starts to transfer people over from previous benefits. With regard to it not being based on recent information, he noted that the department had indicated in writing to the NAO just prior to the report’s release that it is based on the most accurate and up-to-date information. He further noted that her statement in response to the report claiming Universal Credit is working has not been proven. The minister was called to parliament to explain her initial statement. She apologized to MPs for “inadvertently” misleading parliament, while not accepting some of what the C&AG said. She was then the subject of an opposition censure debate and an attempt to punish her financially for misleading parliament. The ­permanent secretary at the Department for Work and Pensions, at a subsequent Public Accounts Committee hearing, stated that he had confidence in the work of the NAO.

Discussion As noted in the previous case study on communications, the parliamentary response to performance audits opens up potential areas of reputational risk. In the case of the Universal Credit report, a risk to the integrity of the report, and by extension, the NAO. A minister chose to misrepresent and criticize the NAO report on one of the government’s prominent projects, then only partially corrected the record, while repeating much of her original statement. As one commentator noted: The NAO is the watchdog charged with helping parliament hold government accountable for public spending. Yet McVey appeared unconcerned

98   R. Boyle and P. Wilkins about its head publicly accusing her of making false claims to parliament: she offered only a semi-apology while going on to repeat aspects of her ­original statement. (Guerin, 2018) The head of the SAI, after failing to get a meeting with the minister, sent an unprecedented public letter to the media effectively accusing the minister of misrepresenting the NAO findings to MPs. This intervention, which would not have been undertaken lightly, was done to preserve the integrity and independence of the SAI. One implication for the management of reputational risk by SAIs and others producing performance reports is that they may have to take unprecedented steps to preserve the integrity of the organization.

Conclusions Like trust, a good reputation is slow to gain but is easily lost. Managing reputational risk is an issue both for individuals and for organizations. This chapter has focused on performance audit as carried out by SAIs. However, many of the ­lessons learned have implications for evaluators and internal auditors. They need to pay attention to quality assurance procedures, and use agreed protocols or guidelines to help manage stakeholder relationships, to take but two examples that are of general relevance. One significant lesson emerging from the cases studied is that performance audit is increasingly bringing SAIs into contested spaces. The cases suggest that auditees and others are nowadays more willing to publicly challenge the authority and standing of the evidence produced by SAIs in performance audits. In the Victoria, Irish, and UK examples, critique of the SAI report by those under scrutiny has been vociferous, and has taken place in public, with the attention of the media being prominent. This, in turn, has required SAIs to take steps beyond normal practices to address reputational concerns, such as employing additional external quality assurance procedures and the issuing of an open letter to the media explaining the stance taken. The cases indicate a situation likely to be seen with increasing frequency in a world where unwanted findings are dismissed as “fake news,” findings may be willfully misinterpreted, and experts and the evidence produced by them seen as legitimate targets for criticism. However, in explaining their work and protecting their reputations, SAIs need to tread carefully to avoid introducing new material and in effect taking sides in ongoing political debates. Carpenter and Krause (2012, p. 26) identify four types of reputation: • •

Performative reputation – Can the agency do the job? Can it execute charges on its responsibility in a manner that is interpreted as competent and perhaps efficient? Moral reputation – Is the agency compassionate, flexible, and honest? Does it protect the interests of its clients, constituencies, and members?

Managing Reputational Risk   99 • •

Procedural reputation – Does the agency follow normally accepted rules and norms, however good or bad its decisions? Technical reputation – Does the agency have the capacity and skill required for dealing in complex environments, independent of and separate from its actual performance?

This study suggests that all these aspects of reputation need to be managed by those producing performance audits. The analytic framework developed and set out in Table 7.1 facilitates a consideration of the reputation of audit offices, and the factors they need to consider in managing reputational risk. The factors in the framework can be seen as primarily contributing to the four types of reputation as follows: performative – strategy, topic, and effectiveness; moral – integrity; procedural – communication; and technical – quality. It should be borne in mind that there are other possible explanations of reputation not addressed through the case studies. For example being seen as a barrier to innovation (Hoffman, 2018), and additional roles discussed by Haigh and Wilkins (2017) in relation to trust. The framework should be seen as a guide but not all-embracing. It is also worth noting that undue focus on a high reputation as a primary objective may lead to short-sighted responses rather than addressing underlying issues such as topic selection and quality. Managing reputational risk is often a by-product of good practice rather than a focus of attention in and of itself.

References Busuioc, E.M., & Lodge, M. (2016). The Reputational Basis of Public Accountability. Governance, 29(2): 247–263. Carpenter, D., & Krause, G. (2012). Reputation and Public Administration. Public Administration Review, 72(1): 26–32. Committee of Public Accounts (2016). Committee Debate: Special Report No. 94 of the Comptroller and Auditor General: National Asset Management Agency Sale of Project Eagle. Retrieved from http://oireachtasdebates.oireachtas.ie/Debates%20Authoring/ DebatesWebPack.nsf/committeetakes/ACC2016092900002?opendocument#B00950. Environment, Land, Water and Planning (Victoria) (2017). Municipal Association Act Review: Consultation Paper. Retrieved from www.localgovernment.vic.gov.au/__data/ assets/pdf_file/0019/73324/Municipal-Association-Act-review-consultation-paperMay-2017.pdf. Guerin, B. (2018). The McVey Universal Credit Row Exposes Problems in Ministerial Accountability. Retrieved from www.instituteforgovernment.org.uk/blog/mcvey-universalcredit-row-exposes-problems-ministerial-accountability. Haigh, Y., & Wilkins, P. (2017). Locating Trust Relations in the Australian Policy Process. Paper to the Third International Conference on Public Policy, Singapore. Retrieved from www.ippapublicpolicy.org//file/paper/593b6c789d0fa.pdf. Hoffman, M. (2018). Martin Hoffman: Performance auditing – Friend or Foe to Public Sector innovation. Retrieved from www.themandarin.com.au/90952-martin-hoffmanperformance-auditing-friend-or-foe-to-public-sector-innovation/.

100   R. Boyle and P. Wilkins Jeppesen, K., Carrington, T., Catasus, B., Johnsen, A., Reichborn-Kjennerud, K., & Vakkuri, J. (2017). The Strategic Options of Supreme Audit Institutions: The Case of Four Nordic Countries. Financial Accountability & Management, 33(2): 146–170. Lonsdale, J., Ling, T., & Wilkins, P. (2011). Conclusions: Performance Audit: An Effective Force in Difficult Times?. In J. Lonsdale, P. Wilkins & T. Ling (Eds.), ­Performance Auditing: Contributing to Accountability in Democratic Government. Cheltenham, UK: Edward Elgar. Maor, M. (2015). Theorizing Bureaucratic Reputation. In A. Wæraas & M. Maor (Eds.), Organizational Reputation in the Public Sector. London: Routledge. McPhee, I. (2005). Risk and Risk Management in the Public Sector. Keynote Address to 2005 Public Sector Governance & Risk Forum. Canberra: Australian National Audit Office. Morin, D. & Hazgui, M. (2016) We are Much More Than Watchdogs: The Dual Identity of Auditors in the UK National Audit Office. Journal of Accounting & Organizational Change, 12(4), pp. 568–589. NAMA (2016). NAMA response to the C&AG Special Report on the sale of Project Eagle. Retrieved from www.nama.ie/fileadmin/user_upload/documents/C_AG_Report/NAMA_-_ response_to_the_CAG_Special_Report_on_Project_Eagle_-_14_Sept_2016.pdf. National Audit Office (NAO) (2018). Rolling out Universal Credit. London: National Audit Office. Office of the Comptroller & Auditor General (2016). Special Report 94 – National Asset Management Agency’s sale of Project Eagle. Dublin: Office of the Comptroller & Auditor General. Peulich, I. (2015). Auditor-General: Effectiveness of Support for Local Government. Retrieved from http://hansard.parliament.vic.gov.au/isysquery/78627467-0686-4c9bb344-446e63f6729e/87/doc/. Power, M. (2018). Reforming Auditing and Risk Management to Improve Governance. Retrieved from www.lse.ac.uk/researchAndExpertise/researchImpact/caseStudies/ power-reforming-auditing-risk-management-governance.aspx. Reichborn-Kjennerud, K., & Vabo, S.I. (2017). Performance Audit as a Contributor to Change and Improvement in Public Administration. Evaluation, 23(1): 6–23. Shahan, A. (2014). The Dilemma of the Supreme Audit Institutions: Autonomy or Relevance? A Case Study of the Government Accountability Office (GAO). Retrieved from http://dx.doi.org/10.2139/ssrn.2573457. Shahan, A. (2015). From Autonomy to Relevance: The Evolution of the Government Accountability Office. Fairfax, VA: George Mason University. Victorian Auditor-General (2015). Effectiveness of Support for Local Government. Retrieved from www.audit.vic.gov.au/sites/default/files/20150226-Support-for-Local-Gov.pdf. Victorian Auditor-General (2017). Follow Up of Selected 2014–15 Performance Audits. Chapter 2 Effectiveness of Support for Local Government. Retrieved from www.audit. vic.gov.au/sites/default/files/20170622-Follow-Up-Audits.pdf.

8 Framing Recommendations
Peter Wilkins

Introduction

A common feature of the work of performance auditors, internal auditors, and evaluators is the use of reports to communicate their work. In seeking to understand the impact of that work, however, it is important to recognize that while report content is an important factor, other factors also contribute, including the effects of the process used in conducting the audit and the context for the work. Relatively few people read and absorb the content of a full report; a slightly larger population may have absorbed the content of an executive summary or key components such as a conclusion or recommendations. Others will have absorbed some information via parliamentary debates, management comments, media reports, and social communications. To provide practical advice on the opportunities and challenges for those conducting and following up performance audits, this chapter focuses on approaches to preparing performance audit recommendations. Recommendations identify actions that would improve the situation. The conclusion and executive summary of the report provide a concise statement of what was found, and these are considered here to the extent that they provide the foundation for generating and explaining the basis of the recommendations made. The chapter reviews previous research regarding audit recommendations. It develops a framework with three domains important for their effectiveness – content, process, and context – and, through a case study, explores the usefulness of the framework. It argues that the use of such a framework can contribute to the development and follow-up of recommendations. In particular, it can address their alignment with the overall purpose of the project and assist in the consideration of causation. The chapter also considers briefly the applicability of these observations to performance auditors, internal auditors, and evaluators, and highlights the aspects of process and context that need to be considered by them when formulating recommendations, and by those involved in the adoption and implementation of recommendations.


The What and Why of Recommendations

The International Organisation of Supreme Audit Institutions (INTOSAI) establishes that recommendations are to be provided where appropriate. The standard for performance audits states that, when relevant, the report should include "constructive recommendations that are likely to contribute significantly to addressing the weaknesses or problems identified" (INTOSAI, 2016a, p. 18). In contrast, findings and conclusions are integral components of a report. The same standard also states that the audit findings should "clearly conclude against the audit objective(s) and/or questions, or explain why this was not possible" (INTOSAI, 2016a, p. 18).
Standards in operation at the national level differ in varying degrees from this international approach. For instance, the Government Auditing Standards ("Yellow Book"), issued by the United States Government Accountability Office (GAO), retain a similar broadly-scoped approach to both the purpose of recommendations and their link to conclusions, stating that "[a]uditors should provide recommendations for corrective action if findings are significant within the context of the audit objectives" and that they should "make recommendations that flow logically from the findings and conclusions, are directed at resolving the cause of identified deficiencies and findings, and clearly state the actions recommended" (2018, pp. 199–200). The Office of the Auditor General of Canada provides guidance based on risks, stating that recommendations "address areas where there are significant risks to the entity if deficiencies remain uncorrected" and noting that "[r]ecommendations guide the actions needed to correct the problems identified in the findings, but it is up to the entity to decide what these actions should be" (Office of the Auditor General of Canada, 2017b, np). The Australian standard establishes that the objective of a performance audit is to obtain reasonable or limited assurance about an activity's performance against identified criteria and to express a corresponding assurance conclusion. Recommendations are regarded as subsidiary to the assurance conclusion, it being noted that the report may include "other matters which the assurance practitioner considers meet the information needs of the intended users, such as: … in some cases, recommendations" (Auditing and Assurance Standards Board, 2017, p. 28). The Auditor-General has not adopted this standard in its entirety and, in relation to report content, has instead adopted the provisions of the INTOSAI standard, indicating that this "reporting requirement is consistent with the current practice of the ANAO in reporting conclusions, findings and recommendations" (Australian National Audit Office (ANAO), 2018). Somewhat differently again, the UK National Audit Office (NAO) conducts its performance audits under its own guidance, which indicates that both conclusions and recommendations are important and interrelated (NAO, nd).
Somewhat different emphases in the steps that lead to the formulation of recommendations are evident in the standards used in different countries. In general terms, the explicit or implicit objective of an audit identifies what the audit is designed to find out; the conclusion gives the answer; and along the way, the audit makes judgments based on the findings and criteria that lead to the conclusion. Judging findings against criteria can assist in identifying specific shortcomings that can be corrected and hence the content of recommendations. Conceptually, it is also important to recognize that performance audits are largely based on examining past events and past performance, and yet the recommendations are about what should happen in the future. In this regard, many recommendations have a focus on changes to processes.

Factors Affecting Effectiveness

Overall, the effectiveness of recommendations relates to them achieving their purposes. This depends on their contribution to improving performance and decision-making, and to enabling accountability. Measures in common use to assess the effectiveness of recommendations include whether recommendations are accepted and implemented, with less common efforts to identify wider impacts and costs and benefits. Other, less direct measures used in research studies include agency perceptions and contribution to debate within the entity, in parliament, and in the media.
There is little evidence to guide the formulation of recommendations, whereas there is a more extensive body of literature regarding the impact of performance audits overall (see for instance Van Loocke & Put, 2011). A European study of recommendation practices across audit and ombudsman institutions in six EU countries points to the importance of external context (including political agendas and regime changes) and internal factors (including feedback information, accountability, and learning (FAL) processes) in generating change in the public sector. The study included an assessment of 58 reports and found that 77% of 373 recommendations had been implemented or partially implemented, leaving 23% in the category "not implemented," which meant: "[d]oesn't agree with diagnosis or solution, requires political decision, lack of resources for implementation, etc." (Van Acker et al., 2014, p. 103).
Another recent study of recommendations by Australian state and territory audit offices and the Australian National Audit Office classified them into 15 categories and identified the three most common functional areas as operational and internal control; corporate governance and accountability; and information quality, services, and control (Parker & Jacobs, 2015, p. 17). Interviews indicated that the number of recommendations may vary over time and as a result of strategic decisions such as aggregating recommendations into a smaller group of general recommendations for easier digestibility. There was also an indication that recommendations might increasingly be designed to be practical and implementable, with a focus on underlying causes and desired outcomes, rather than processes (Parker & Jacobs, 2015, p. 20).
A study that looked at acceptance of recommendations made by the Canadian Auditor General over the period 2002 to 2013 found that the acceptance of new recommendations increased over this period (rising from 16% to 100%). The author notes the changes in government over this period but warns that it does not indicate causality.

Several people interviewed suggested that a government with a high rate of acceptance may have agreed with recommendations as a form of political "issues management." According to this perspective, it is easier to simply agree with the OAG's recommendations, rather than risk the bad publicity of a dissenting opinion that could be published in the OAG's final report.
(Taft, 2016, p. 772)

A study focused specifically on the implementation of recommendations analyzed survey responses by European Union Supreme and Regional Audit Institutions and observes that a UK–Nordic group achieves effects through the implementation of recommendations by the auditee while a Germanic group does so through parliamentary and governmental reforms, suggesting that there are different valid approaches to implementing performance audit recommendations (Torres, Yetano, & Pina, 2016, p. 18).

Framework for Recommendations

Based on the literature reviewed above, a framework to consider the role and impact of recommendations involves three domains: content, process, and context. A summary of some of the practical advice that has been offered by three organizations – the NAO, the Office of the Auditor General of Canada, and INTOSAI – regarding the development of recommendations is grouped under these domains in Table 8.1. This advice is primarily based on the views and experiences of practitioners.

Table 8.1 Summary of Some Practical Guidance on Recommendations Offered by Three Organizations (NAO, undated; OAG Canada, 2018; INTOSAI, 2016c)

Content
• unambiguous, avoiding jargon and fully understood by the client
• entity-specific, succinct, neither too general nor too detailed, clear on the desired final outcome, results-oriented, specific enough to allow for monitoring and assessing progress made in implementing them – but not overly prescriptive
• phrased to avoid truisms, but detailed enough to stand alone, positive in tone and content, clearly state the actions recommended and who is responsible for actions, written in a way that allows the auditor to evaluate whether or not they have been implemented
• measurable
• supported by and flow logically from a balanced assessment of the evidence, constructive and focused on practical improvements
• supported by the audit findings and conclusions, aimed at correcting the underlying causes of the weakness, consistent within the audit report, focused on areas of significant risk, practical
• directed at resolving the causes of weaknesses or problems identified, practical and add value, flow logically from the findings and conclusions

Process
• clear what should be achieved and why; think early about recommendations that are likely to have impact
• thought out during the examination phase; think about potential recommendations early on in the process
• work effectively and constructively with the client throughout your assurance work; get commitment from the client to act on recommendations
• maintain entity relations to understand the context and promote open two-way communications
• where possible, work with the audited entity to identify the necessary changes and ways of implementing them

Context
• likely to lead to sustained and significant improvements, and conscious of risks
• have sufficient knowledge of the subject matter to identify and assess risks and to design and perform appropriate audit procedures
• be possible to implement without additional resource constraints

Based on an assessment of the practical advice offered and the literature reviewed, the key components of each domain are presented in Table 8.2 and summarized below.

Table 8.2 Components of the Three Domains of Content, Process and Context

Content: Clear and concise; Flow logically from findings; Measurable; Relevant and useable
Process: Communication; Anticipation; Follow-up
Context: Organisational; External pressures; Chance events

Content

Clear and Concise

INTOSAI indicates that "[i]t should be clear who and what is addressed by each recommendation, who is responsible for taking any initiative and what the recommendations mean – i.e. how they will contribute to better performance." It also advises that "they should be phrased in such a way that avoids truisms or simply inverting the audit conclusions" (INTOSAI, 2013, pp. 16–17). However, the advice makes clear that the recommendations sit within the context of the entire report, noting that, together with the full text of the report, recommendations "should convince the reader that they are likely to significantly improve the conduct of government operations and programmes" (INTOSAI, 2013, p. 17).

Flow Logically from Findings

To have impact, conclusions would be expected to flow logically from the evidence as captured in the audit findings, and the recommendations in turn to flow logically from the conclusions or findings. INTOSAI provides advice that conclusions and recommendations should "follow logically from the audit findings and the facts and arguments presented" (INTOSAI, 2016c, p. 25). Other advice indicates that an overall opinion "comparable to the opinion on financial statements, on the audited entity's achievement of economy, efficiency and effectiveness" is not required, but that "auditors should set a clearly-defined audit objective" and that they "should specifically describe how their findings have led to a set of conclusions and – if applicable – a single overall conclusion" (INTOSAI, 2013, p. 5).
The findings will inform the appropriate type of recommendation. Recommendations might, for instance, seek to change behavior, rules, or structures (Johnston, 1988, p. 78) and involve a consideration of the cost and time to implement, the time lag before taking effect, and the estimated scale of the effect (NAO, 2008b, p. 12). Performance audits are largely based on examining past events and past performance, and yet the recommendations are about what should happen in the future. In this regard, many recommendations have a focus on changes to processes. It would be rare for a report to have no recommendations at all, but in terms of the judgments involved there would be many findings that are not part of the conclusion or addressed by a recommendation.

Measurable

It would also be expected that, to be effective, recommendations are formulated in a way that allows their impact to be measured and/or assessed. Measures used by audit offices and researchers include the acceptance of recommendations, their implementation, auditees' perception of the auditor (satisfaction, collaboration, auditors' credibility), auditees' perception of the impact on the audited entity (added value, acceptance of recommendations), and the contribution to public debate (within the entity, in parliament, in the press) (Van Loocke & Put, 2011, p. 197).

Relevant and Usable

Many factors contribute to the relevance, usability, and usefulness of recommendations, starting with the topic chosen, the objectives set, and the framing of the questions to be addressed as lines of inquiry for the review. INTOSAI advises that where relevant "auditors should seek to provide constructive recommendations that are likely to contribute significantly to addressing the weaknesses or problems identified by the audit" and that they should "address the causes of problems and/or weaknesses" (INTOSAI, 2013, p. 16). It also advises that recommendations should be "practical and be addressed to the entities which have responsibility and competence for implementing them" (INTOSAI, 2013, p. 17).
The lead reviewer for the strategic review of the Queensland Audit Office identified that sometimes a recommendation will be made but will be rejected, indicating that this was a legitimate thing for a government to do, with both the recommendation and the rejection being very public, and that it was the "process of accountability working as it should" (Queensland Finance and Administration Committee (FAC), 2017, p. 8). The illustrative example raised related to a recommendation that service delivery statements should be audited, and the reviewer suggested that at least some of the reason for its rejection by government would have been the cost and practicality.

Processes

Communication

It is generally considered that early and continuing communication between auditors and auditees improves mutual understanding, and that this understanding increases the prospects that, whatever their content, the recommendations will be understood and implemented. INTOSAI advises that "[a]uditors should maintain effective and proper communication with the audited entities and relevant stakeholders throughout the audit process and define the content, process and recipients of communication for each audit" (INTOSAI, 2013, p. 8). Among the reasons given for planned communications are the need to obtain insight into the points of view of stakeholders and to increase the likelihood that audit recommendations will be implemented. It specifically states that audited entities should have the opportunity to comment on the recommendations before the report is issued. Communication needs to give particular attention to the basis of criteria, this being identified in the European context as a point of tension that can limit the acceptance and implementation of recommendations (Van Acker et al., 2014, p. 108).

Communication also needs to be targeted and varied to meet the context of different groups in the organizations involved. For instance, front-line staff will need different information from senior staff involved in decision-making about the audit and the implementation of recommendations.

Anticipation

The process can usefully encourage the agency to anticipate and address issues identified during the course of the audit. Open communication can facilitate constructive responses. For instance, where a forward work program is announced, an agency might initiate an internal review and communicate with the auditor during the course of the review. Actions taken as a result might lead to the auditor's proposed project being re-focused or canceled. Anticipatory action might also arise when the conduct of the performance audit is announced or during the investigation as the agency becomes aware of emerging findings. While at times it can be frustrating for the auditor to see a planned audit disrupted even once underway, this serves a wider public interest if the efforts to address the problem are genuine, rather than a diversionary tactic. Agencies other than those which are the subject of the audit may also anticipate the findings and recommendations of the final report and implement anticipatory changes.

Follow-Up

The release of the report and associated media coverage, parliamentary debate, and committee hearings may help to consolidate or create a commitment to implement report recommendations. However, how are stakeholders to know and have confidence that this has been the case? The knowledge in advance that there will be a follow-up process and public reporting of the results can further stimulate this commitment. INTOSAI advises that there should be a follow-up process that determines whether actions taken in response to findings and recommendations have resolved the underlying problems and weaknesses. The use of the word "underlying" makes this a particularly challenging task, and pointing to the role of findings indicates that the impact of recommendations can be influenced by other content within a report. In undertaking follow-ups, it is therefore important that auditors adopt an unbiased and independent approach.
Audited agencies themselves, parliamentary committees, and audit institutions may have their own follow-up processes. Depending on the nature of the processes, there may be a narrow focus on whether the recommendations have been implemented, or a wider coverage to identify whether the underlying issues have been addressed to support improved outcomes on a sustained basis, and whether there have been unintended consequences. Broader studies still – possibly called follow-on audits – might revisit the issue in the context of changed circumstances or to identify if actions have been taken by agencies not included in the original audit. Not following up may create a perception that performance audit is a "hit and run" exercise, rather than the auditor demonstrating a commitment to improvement. But care is needed to balance the resource implications of follow-ups relative to addressing new topics.

Context

Factors other than the content of the report and the processes used may influence the effectiveness of recommendations. These include organizational factors such as the culture of the agency and its previous experience as the subject of audit recommendations, any pressures that may be brought to bear on the agency, the prevailing public-sector appetite for improvement and reform, and chance events. Increasing understanding of the significance of these factors is important for ensuring that the recommendations are of value.

Organizational

Aspects of organizational culture and values can have a significant influence on the impact of recommendations. For instance, the extent to which an organization and key individuals are open to, and have had previous experience of, external reviews, are keen to learn, and already have evidence-based processes, can either support constructive engagement or prompt defensive responses. Pressures on the agency and key staff on other issues may also provide distractions from being deeply involved in the audit process and dealing with its outputs. It has been identified elsewhere that the extent of audit impact is influenced by organizational factors such as the will and interest at the staff level and in the central authority within the organization being audited; political will; whether there is major change in the organization; and possibly the place of the "recommendations within the priority scale of the audited organization's management" (Morin, 2001, p. 114).

External Pressures

The interest and support of ministers, parliament and its committees, the media, and interest groups can all increase the impact of conclusions and recommendations. Such interest creates pressure on agencies to address the substance of the report and to go beyond the position of simply accepting the recommendations but potentially doing little more.

Chance Events

After all the planning that may be undertaken by auditors and auditees, unexpected events may rapidly change the context of an audit. A sudden change in the make-up of the parliament, the government, or the public sector may affect the relevance of the topic, and therefore also the impact of conclusions and recommendations.


Australian National Audit Office Case Study

From a practitioner perspective, much can be learned by reflecting on the origins and impact of a single conclusion and recommendation pair. To this end, a 2014 report by the ANAO on the Australian Taxation Office (ATO) provides a relevant example (ANAO, 2014). The part of the report that dealt with the conduct of compliance activities included a section on the ATO's audit and review activities, with detailed findings pointing to a lack of information on the return on investment from High Wealth Individuals (HWI) compliance activities. These findings supported an overall conclusion that it was important that the ATO better assess the costs of HWI compliance activities to more efficiently allocate compliance resources and to more accurately calculate the actual return on investment from these activities. The related recommendation called for two specific actions that would, among other things, enable the ATO to more efficiently allocate compliance resources and to more accurately demonstrate return on investment. The report indicated that the ATO agreed with this recommendation.
A 2017 report by the ANAO focused on the extent to which the ATO had implemented recommendations from ANAO performance audits and parliamentary committee reports. A specific finding was that the recommendation detailed above had not been implemented. The report identified that the ATO had recorded this recommendation as implemented, whereas the ANAO assessed that it had not been implemented because the actions taken by the ATO did not address the intent of the recommendation. As indicated below in relation to the clarity of this recommendation, the intended benefits of the actions recommended were stated explicitly as part of the recommendation. While the ATO had outlined some relevant broad measures aimed at identifying where compliance efforts are best directed, many of the identified activities were already in place prior to the recommendation. Furthermore, the measures had not incorporated the proposed return on investment measure (ANAO, 2017, p. 20). The ATO commented in response that at the time of implementation it believed the actions taken had addressed the intent of the recommendation and that it was not intending to continue with the agreed process due to changes in its business model (ANAO, 2017, p. 20).
The 2017 ANAO report did not make a specific conclusion or recommendation about these findings but did identify as an area for improvement:

that the ATO selectively considers the impact of implemented recommendations, and sets timeliness targets for implementing recommendations
(ANAO, 2017, p. 16)

Case Study Assessment of the Framework

The case study outlined earlier in this chapter is used here to illustrate how this framework could be used to assist in the development of recommendations and their follow-up, and it provides lessons about the making of recommendations in general. It is based on the ANAO's written material only, and is therefore not comprehensive and is particularly limited in relation to the internal processes of the ANAO.

Clear and Concise

At one level the initial recommendation is clear and concise:

to enable the ATO to more efficiently allocate compliance resources across the Private Groups and High Wealth Individuals business line, and to more accurately demonstrate return on investment, the ANAO recommends that the ATO:

• better assesses the cost of compliance activities for the High Wealth Individuals population; and
• calculates the return on investment for High Wealth Individuals compliance activities on the basis of cash collected, in addition to liabilities raised.

However, it is unclear how the two components inter-relate, and whether including a consideration of cash collected (in addition to the existing practice of using the liabilities raised) is all that is expected to improve the assessment of costs.

Flow Logically from Findings

The report establishes that the cash collected was not assessed but that it was a significant consideration, such that this aspect of the recommendation flows directly from the findings. The findings are in turn based on the high-level criterion that "compliance activities were supported by effective business and administrative arrangements." However, in relation to causation the report did not establish clearly that better information would lead to better allocation of resources. Furthermore, it did not provide a cost-benefit assessment for the required actions.

Measurable

The second component of the recommendation, the inclusion of cash collected, is highly specific and its completion readily assessed. However, it is unclear how a better assessment of costs could itself be assessed.

Relevant and Usable

From the ANAO's perspective the recommendation met this requirement. However, the ATO's comments in 2017 reveal an evident reluctance to implement the recommended changes.

Communication

A subsequent assessment in a 2017 follow-up report raises doubts about the basis of the ATO's agreement with the recommendation in 2014. The 2017 report leaves unanswered questions about why the difference of views occurred. For instance, was it due to inadequate communication by the ANAO of the intent of the recommendation? Was there an unexpressed lack of will at the time the recommendation was made, or did that lack of will emerge subsequently? There might also have been cost-benefit considerations that meant the failure to fully implement the recommendation had a rational basis.

Anticipation

At least from the time the ATO was advised that the performance audit was to commence, it had the opportunity to undertake a self-assessment and anticipate the ANAO's findings. Its comments in response to the report indicate that it had made a number of significant improvements to its work in this area over the preceding 12 months (p. 27). However, it is not evident whether this was in any way prompted by the audit.

Follow-Up

It is evident from the 2017 report by the ANAO that it followed up the recommendation made in 2014. The disclosure that the recommendation had not been implemented, and the reasons given by the ATO, add to parliament's ability to hold the ATO to account.

Organizational

The 2014 report identifies some aspects of the organizational context which assist in understanding the likelihood of the recommendation being implemented and the desired benefit achieved. For instance, one chapter addresses the ATO's monitoring and reporting of the performance of its HWI compliance strategies. However, the report does not identify in any detail how the recommendation selected for the case study would interact with the other recommendation, which related to improving a process that supports the identification of those HWI taxpayers at higher risk of non-compliance.

External Pressures

The involvement of a parliamentary committee in the follow-up of the report indicates that the ATO would have been aware of external pressures to implement the recommendation. It is therefore particularly notable in this case that it did not implement the change as recommended.

Chance Events

The ATO's comment in the 2017 report that there were changes in its business model and that it was therefore not intending to continue with the agreed process potentially reflects the kinds of external events that can alter the relevance of a recommendation.
This single case is consistent with the observation that the ANAO's performance audit role is a contested activity (Funnell & Wade, 2012). While in terms of content the recommendation flowed logically from the criteria and the evidence that was collected, it is evident that issues around the process and context in which conclusions and recommendations are made may also be important. The case also reinforces the view that the formulation of recommendations involves more than technical considerations and includes a significant element of judgment.

Relevance for Internal Audit and Evaluation

As discussed earlier in this book, performance audit, internal audit, and evaluation share common objectives to help improve performance and to provide information for decision-makers, and they also share many common methods. However, there are significant differences in their approaches to formulating conclusions and recommendations, as explored below.

Internal Audit

The most notable differences relate to the absence of statutory protections for internal audit. It is part of the organization and subject to pressures from various internal sources. Based on interviews with internal auditors in the public sector, Roussy (2015) concludes that they tend to lack independence and questions "the appropriateness of considering internal auditing as a meaningful independent assurance device in operating the corporate governance 'mosaic'" (p. 237).
Some standards, such as the Australian performance audit standard, are largely premised on being for external audit, and others, such as the standards issued by the Institute of Internal Auditors (IIA), are strictly for internal audit. The IIA sets the work of internal auditors in an assurance context, noting that "assurance services involve the internal auditor's objective assessment of evidence to provide an independent opinion or conclusions regarding an entity, operation, function, process, system, or other subject matters" (Institute of Internal Auditors, 2017a, p. 2). An IIA Practice Guide identifies that work has increasingly included performance audits for non-profit organizations and government agencies (Institute of Internal Auditors, 2017b). The IIA also recognizes the importance of recommendations with a focus on governance, stating that they should improve the organization's governance processes in areas such as making strategic and operational decisions and communicating information within the organization (Institute of Internal Auditors, 2017a, pp. 12–13). In a third category, neither specifically for external nor internal audit, the two are seen as essentially the same. For instance, the US Government Auditing Standards apply to internal auditors of government or related entities (Government Accountability Office, 2011, p. 6).

Evaluation

Evaluation also faces questions about its independence, as it is typically initiated by the agency. In one of the few comparative assessments, an early study of GAO recommendations by Johnston (1988) compared the acceptance rate of its recommendations with the rate experienced by evaluators. It commented that the relatively higher rate achieved by the GAO evaluators at the time could be attributed to its status as "a formal, legislatively mandated, outside evaluator" and because it aggressively marketed its work. In addition, Johnston commented that other evaluators are unlikely to have access to the formidable resources available to the GAO (p. 82).
More broadly, the questions have been posed: "[i]s it appropriate and useful for evaluators to use findings to make recommendations? If so, under what circumstances?" (Iriti, Bickel, & Nelson, 2005, p. 464). The authors present a decision-making framework for the appropriateness of recommendations in varying contexts, identifying "contextual variables likely to affect the making of recommendations (e.g., use context, evaluator role) and a typology of potential recommendations." They propose a continuum and nine key variables to consider when deciding whether to provide recommendations:

the role of the evaluator; the use context; the evaluation's design characteristics; the quality, strength, and clarity of evaluation findings; the evaluator's experience and expertise; ethical considerations; knowledge of costs and trade-offs; the internal capacity of the program; and literature in the field of study
(Iriti et al., 2005, p. 472)

They also observe that the variables are interdependent. These considerations align closely with the observations above regarding performance audit, reinforcing the similarities between the two practices.
Evaluators working in audit institutions can provide additional insights on interrelationships. For instance, it has been observed that evaluators who adhered to Government Auditing Standards were viewed less favorably by senior staff in legislatures than those working to other standards; a greater emphasis on independence, evident through a reduced willingness to engage with stakeholders, is a possible explanation of this effect (Vanlandingham, 2011, p. 95). Research in the evaluation arena also points to considering wider perspectives beyond the implementation of recommendations (instrumental use), to include consideration of influencing decision-makers' thinking about issues over time (enlightenment use), promoting organizational learning by stakeholders (process use), and use for political reasons, including justifying decisions already made (symbolic use) (Vanlandingham, 2011, p. 86).
Overall, the legislative basis of performance audit and the strong influence of financial audit-based methodologies distinguish its approaches to formulating conclusions and recommendations from those of internal audit and evaluation. While it remains unclear whether any approach is more effective in its use of conclusions and recommendations, there are many lessons that can be shared between them.

Conclusion

All three disciplines (external audit, internal audit, and evaluation) have much to learn from each other. Particularly important is clarity about the purpose of recommendations and ensuring there are rigorous processes for their development. These processes need to encompass the impact sought, whether there are to be both ambitious and "safe" recommendations, and the implications of different numbers of recommendations. It is also important to question in what circumstances it would be more beneficial for a review to make findings but leave it to the agencies to develop responses to those findings, rather than have an outside party specify the changes required. This approach may increase ownership of the actions required to address the issues identified and help to ensure that these actions are best suited to the context at the time of implementation.
The framework proposed and trialed with a case study in this chapter can contribute to the development and follow-up of recommendations. In particular, the framework can address their alignment with the overall purpose of the project and assist in the consideration of causation. The importance of causation is highlighted in guidance on the use of theories of change, which describe:

how and why an initiative or programme is expected to work.… It can help the auditor (and sometimes also those responsible) to obtain a clear description and better understanding of the assumptions on the causal relationship between the output and the intended outcome (objectives) of a policy or programme
(INTOSAI, 2016b, p. 12)

However, it is important to note that for each recommendation, and for each suite of recommendations, there may be different theories of change. Evidence in support of each theory of change is problematic, as illustrated by efforts to understand the overall impact of performance audits. A review of 14 impact studies identified that the findings are not transferable across audit institutions because of a range of methodological and contextual constraints (Van Loocke & Put, 2011, p. 195).
Theories of change can help to overcome the limitations of studies that are primarily looking at past events and yet are seeking to shape future changes. How this affects the focus of such studies is unclear, but it may lead to studies that focus on implementation rather than design failures, and to recommendations that address process issues.
Ultimately, those creating conclusions and recommendations need to have the humility to understand that these are the product of human endeavor and involve judgment. Similarly, those receiving recommendations should take responsibility for assessing them and implementing changes that they can justify, rather than just doing so because they were made.

References

Auditing and Assurance Standards Board (2017). Standard on Assurance Engagements ASAE 3500 Performance Engagements. Retrieved from www.auasb.gov.au/admin/file/content102/c3/ASAE-3500_10-17.pdf.
Australian National Audit Office (ANAO) (2014). Managing Compliance of High Wealth Individuals. Retrieved from www.anao.gov.au/sites/g/files/net616/f/AuditReport_2013-2014_35.pdf.
Australian National Audit Office (ANAO) (2017). Australian Taxation Office's Implementation of Recommendations. Retrieved from www.anao.gov.au/work/performanceaudit/australian-taxation-offices-implementation-recommendations.
Australian National Audit Office (ANAO) (2018). Explanatory Statement: Australian National Audit Office Auditing Standards 2018. Retrieved from www.legislation.gov.au/Details/F2018L00179/Download.
Funnell, W., & Wade, M. (2012). Negotiating the Credibility of Performance Auditing. Critical Perspectives on Accounting, 23(6): 434–450.
Government Accountability Office (2018). Government Auditing Standards: 2018 Revision. Retrieved from www.gao.gov/assets/700/693136.pdf.
International Standards of Supreme Audit Institutions (INTOSAI) (2013). ISSAI 300 Fundamental Principles of Performance Auditing. Retrieved from www.issai.org/en_us/site-issai/issai-framework/3-fundamental-auditing-priciples.htm.
International Standards of Supreme Audit Institutions (INTOSAI) (2016a). ISSAI 3000 Standard for Performance Auditing. Retrieved from www.issai.org/en_us/site-issai/issai-framework/4-auditing-guidelines.htm.
International Standards of Supreme Audit Institutions (INTOSAI) (2016b). ISSAI 3100 Guidelines on Central Concepts for Performance Auditing. Retrieved from www.issai.org/en_us/site-issai/issai-framework/4-auditing-guidelines.htm.
International Standards of Supreme Audit Institutions (INTOSAI) (2016c). ISSAI 3200 Guidelines for the Performance Auditing Process. Retrieved from www.issai.org/en_us/site-issai/issai-framework/4-auditing-guidelines.htm.
Iriti, J.E., Bickel, W.E., & Nelson, C.A. (2005). Using Recommendations in Evaluation: A Decision-Making Framework for Evaluators. American Journal of Evaluation, 26(4): 464–479.
Johnston Jr, W.P. (1988). Increasing Evaluation Use: Some Observations Based on the Results at the U.S. GAO. New Directions for Program Evaluation, 39: 75–84.
Morin, D. (2001). Influence of Value for Money Audit on Public Administrations: Looking Beyond Appearances. Financial Accountability and Management, 17(2): 99–117.
National Audit Office (NAO) (2008b). Department for Work and Pensions: Progress in Tackling Benefit Fraud. Retrieved from www.nao.org.uk/report/department-for-workand-pensions-progress-in-tackling-benefit-fraud/.
National Audit Office (NAO) (nd). What is a Value for Money Study? Retrieved from www.nao.org.uk/about-us/wp-content/uploads/sites/12/2016/10/What-is-a-value-formoney-study.pdf.
Office of the Auditor General of Canada (2017a). Direct Engagement Manual: Audit Conclusion. Retrieved from www.oag-bvg.gc.ca/internet/methodology/performanceaudit/manual/7040.shtm.

Office of the Auditor General of Canada (2017b). Direct Engagement Manual: Recommendations and Entity Responses. Retrieved from www.oag-bvg.gc.ca/internet/methodology/performance-audit/manual/8020.shtm.
Parker, L.D., & Jacobs, K. (2015). Public Sector Performance Audit: A Critical Review of Scope and Practice in the Contemporary Australian Context. Retrieved from www.cpaaustralia.com.au/~/media/corporate/allfiles/document/professional-resources/publicsector/public-sector-performance-audit.pdf?la=en.
Queensland Finance and Administration Committee (FAC) (2017). Public Briefing – Strategic Review of the Queensland Audit Office: Transcript of Proceedings [Wednesday 19 April 2017], Brisbane. Retrieved from www.parliament.qld.gov.au/documents/committees/FAC/2017/QAOStrategicReview/trns-pb-19Apr2017.pdf.
Roussy, M. (2015). Welcome to the Day-To-Day of Internal Auditors: How Do They Cope With Conflicts? Auditing: A Journal of Practice & Theory, 34(2): 237–264.
Taft, J. (2016). From Change to Stability: Investigating Canada's Office of the Auditor General. Canadian Public Administration, 59(3): 467–485.
The Institute of Internal Auditors (2017a). International Standards for the Professional Practice of Internal Auditing (Standards). Retrieved from https://na.theiia.org/standards-guidance/Public%20Documents/IPPF-Standards-2017.pdf.
The Institute of Internal Auditors (2017b). Practice Guide: Integrated Auditing Recommended Guidance. Retrieved from https://na.theiia.org/standards-guidance/recommended-guidance/practice-guides/Pages/Integrated-Auditing-Practice-Guide.aspx.
Torres, L.T., Yetano, A., & Pina, V. (2016). Are Performance Audits Useful? A Comparison of EU Practices. Administration & Society, 1–32.
Van Acker, W., Bouckaert, G., Frees, W., Nemec, J., Orviska, M., Lawson, C., Matei, A., Savulescu, C., Monthubert, E., Nederhand, J., & Flemig, S. (2014). Mapping and Analyzing the Recommendations of Ombudsmen, Audit Offices and Emerging Accountability Mechanisms. Retrieved from www.lipse.org/upload/publications/LIPSE%20Research%20Report%20WP3_20150329_FINAL.pdf.
Vanlandingham, G.R. (2011). Escaping the Dusty Shelf: Legislative Evaluation Offices' Efforts to Promote Utilization. Journal of Evaluation, 32(1): 85–97.
Van Loocke, E., & Put, V. (2011). The Impact of Performance Audits: A Review of the Existing Evidence. In J. Lonsdale, P. Wilkins, & T. Ling (Eds.), Performance Auditing: Contributing to Accountability in Democratic Government (pp. 175–208). Cheltenham, UK: Edward Elgar.

9 Auditing in Changing Times
The UK National Audit Office's Response to a Turbulent Environment
Jeremy Lonsdale

State Audit Institutions and Change

Chapter 2 identified State (or Supreme) Audit Institutions (SAIs) as one of the main producers of performance audit. Such bodies have no executive powers and do not initiate and implement policy, roles which are instead played by elected governments and their appointed officials. They are established to provide a service to others – primarily legislatures, but also the executive and, in a wider sense, taxpayers and citizens. Their work is designed to inform and influence third parties, to provide assurance to others on how public resources are used and, where necessary, to prompt action in externally driven processes such as financial decision-making, organizational reform, and scrutiny and accountability regimes. It is their role to examine and comment on what others do, and to respond to what external audiences – for example, parliamentary committees – consider to be useful commentary and analysis. As a result, although SAIs have statutory powers and independent status to determine what work they do, as well as when and how they do it, they do not operate in a world which they can shape in its entirety. One Canadian observer of audit offices has emphasized the "potential influence of environmental conditions on the work of audit," and highlighted that "auditors cannot claim any absolute control over all of these conditions, but they should be keenly aware of their existence" (Morin, 2008). From a practitioner perspective, Lonsdale states that despite being independent organizations, SAIs:

are not free to determine everything about their working environment. Instead, they need to understand their context, manage those factors where they can, and minimize the impact of those they cannot shape.
(Lonsdale, 2013)

It has become commonplace to say that the world of government is increasingly fast-paced, but in the 2010s it can, for example, be characterized in the United Kingdom by readily available digital information on an unprecedented scale; raised expectations of the quality of service and responsiveness of both public and private organizations at a time of significantly reduced public resources; greater variety in public service delivery models; and a more contentious and fractured political environment, characterized among other things by declining trust in authorities of all kinds, including parliaments and governments. Such uncertainty and change raise the question: how do SAIs – and in particular, performance auditors within those organizations – respond to a rapidly changing environment while meeting external expectations that they maintain traditional attributes of independence, quality, a robust evidence base, objective reporting, and a non-partisan approach?
This chapter briefly examines the changing political and administrative environment in the United Kingdom, and considers how the National Audit Office, the UK's state audit institution and a major producer of performance audits, has adapted to these changes. It also reflects on the relevance of the messages for evaluators and internal audit. To do this, the chapter examines the NAO's strategy documents for the period from 2013 to 2017, a period of considerable political and economic change in the United Kingdom, which might be expected to challenge the SAI.
The importance of the environment or context in which public organizations operate has been the subject of attention by scholars (Pollitt, 2013, for example, brings together 22 papers on the theme). In this volume, Lonsdale defines the context in which audit offices operate as being the "environment, circumstances, background and settings in which activities – in this case, performance audits – are undertaken" (Lonsdale, 2013). Context helps to shape what an organization can and cannot do, offering opportunities and presenting constraints. The wider environment can also be made up of several, possibly "overlapping worlds," which either individually, or in some formation, can shape and channel how an organization conducts its business. The strength of an organization relative to its changing environment may determine the extent to which it is able to protect or pursue its interests, and the extent to which it must pay heed to the demands of others (Lonsdale, 2013).

The National Audit Office and its Environment

The powers of the National Audit Office (NAO) are provided by statute, the earliest of which dates from 1866, as part of wider financial reforms. More recently, these have been added to by the National Audit Act 1983, which gave the NAO independence from government and provided for value for money audit, and the Local Audit and Accountability Act 2014, under which the NAO reports on the value for money of public spending at the local level for the first time. The functions and purpose of the NAO have been largely stable over many years, with longstanding powers changed to reflect the evolving environment of government accountability and organization. For example, the 2001 Sharman Review led to changes giving the NAO greater access to local-level bodies spending public money (Sharman, 2001; Dewar & Funnell, 2016). In general, however, its essential tasks have remained unchanged, while its relationship with parliament and, in particular, the Public Accounts Committee has also evolved slowly, rather than been subject to any major upheaval (Committee of Public Accounts, 2007; Dewar & Funnell, 2016).
It is clear, however, that the NAO operates within a series of interlocking wider environments which have been subject to major changes. These worlds are: professional (the NAO undertakes its financial audit work within a framework of regulations and expectations governing the conduct of such activity), parliamentary (the Comptroller & Auditor General (C&AG) is an officer of the House of Commons, and the NAO serves parliament and parliamentary committees), administrative (the NAO audits government bodies which are subject to laws, budgetary constraints, and scrutiny arrangements), and political (the NAO's reports can be and are used for the purpose of debate about how public resources are used).

Changing Environment in the United Kingdom

It is also clear that each of these environments has been subject to considerable change, reform, and upheaval in recent years. Thus, any analysis of the work of the NAO – and changes within it – can only be understood by taking account of environmental factors. Any consideration is inevitably summarized and selective, but five main themes of relevance to this chapter are highlighted.
A starting point for recent changes in the public sector in the United Kingdom is the consequences of the financial crash of 2007–2008, and the responses of successive governments to it. In 2014, the period of "austerity" measures was extended to 2020, and in 2018, the government stated it wanted to eliminate the deficit by the mid-2020s. Between 2010–2011 and 2017–2018, there was a real-terms reduction in local government spending power of 29%, and in some government departments there were reductions in staff of around a third between 2010 and 2014 (National Audit Office, 2018a; Institute for Government, 2014). Major reform programs have also been put in place, for example, to the benefits system, designed to simplify delivery but also to secure major savings.
A second major factor has been the unusual nature of UK politics, which traditionally has seen single parties win stable majorities under the "first past the post" electoral system. The 2010 election resulted in a coalition between the Conservative and Liberal Democrat parties, the first such arrangement for more than 70 years. This provided a period of stability with the agreement of a fixed-term government of five years. However, following the 2015 election, which the Conservative Party won, the promised referendum on the European Union led to a defeat for Cameron in 2016, who resigned as prime minister and was replaced by Theresa May, who then chose to call an election in 2017, which resulted in a Conservative minority government. Changes in the main opposition party have added to the unusual degree of uncertainty in British politics.

A third factor has been the UK's exit from the European Union – "Brexit" – which has added to the complexity because support for the referendum result cut across party lines. The outcome of the June 2016 referendum on whether to remain a member state of the European Union is widely accepted as the most important event of recent years for the United Kingdom. The decision to leave, followed by the resignation of the prime minister, triggered a series of events which at the time of writing are still unfolding, and will continue to do so into the 2020s and beyond.
Fourth, there has been continuing reform to the way in which government in the UK delivers public services. There is a considerable literature on developments since the 1970s with privatizations, contracting out, and the implementation of a range of "new public management" ideas (e.g., Pollitt & Bouckaert, 2004). More recently, major government reforms, for example, shared services arrangements and public–private relationships, have become central to the way in which the public sector operates. There has also been widespread use of private-sector contractors to deliver public services within many sectors, including prisons, health, transport, and education, and intermittent concerns about contractor performance.
Finally, the period has been characterized by changes in the debate and assumptions about public life and standards. One aspect has been the perceived decline in trust in politicians, arising in part from policy failures and highly publicized events such as the MPs' expenses scandal of 2009 (King & Crewe, 2013; Hodge, 2016), as well as a perception that too many politicians have become too separate from ordinary people (Eatwell & Goodwin, 2018).

What are the Implications for the NAO and its Performance Audit?

The commentary above has explained the importance of environmental and contextual factors for the work of SAIs, and outlined the major shifts in the political and economic life of the United Kingdom in recent years. In these circumstances, we would expect to see the NAO responding to the shifts and trends set out above. One way of assessing whether this has been the case is to analyze the NAO's published strategy documents, which set out its plans for the coming three years. These are high-level, three-year views in which the organization sets out progress against "longstanding objectives" and explains how it proposes to carry out its work within the resources available to it. They are presented to parliament and scrutinized by the Public Accounts Commission, a committee of parliamentarians which oversees the NAO's budget and performance. As such, they can be expected to be the most authoritative public explanation of what the NAO proposes to do and why.
A theme throughout the five strategies is the implications of public-sector resource constraints for the work of the NAO. Each of the Forewords to the strategies highlights its importance, noting that the "public sector continues to face financial constraints, placing pressure on its ability to fund and provide high-quality services" (2013, p. 4). The NAO sees the implications as being that reduced resources are driving change in how the public sector is organized and services are delivered, which in turn raises concerns about whether the change is well managed and controlled, and whether government has the skills it needs to make it successful (2013, p. 6). The following year, the NAO commented that "Austerity will continue into the next parliament and there will be profound implications for public sector delivery" (2014, p. 4). It noted that there was a "political consensus that fiscal consolidation will need to continue" (2014, p. 6). Three years later, the NAO continued to see public expenditure cutbacks as a challenge, observing that "reduced funding and rising citizen demand puts at risk the volume and quality of local services" (2017, p. 9), as well as undermining routine processes in government due to reductions in skills and capacity. Thus, the NAO has seen the implementation of significant cost reduction initiatives in terms of the risks to delivering value for money and quality of service.
Initially, the NAO's expectations of its role in examining the UK's decision to leave the European Union were unclear (2016, p. 16), but its 2017 strategy highlighted "Brexit" in the first sentence of the Foreword, and reference to the implications for the NAO was made on eight separate occasions. The strategy identified managing the exit from the European Union as a "huge challenge for Whitehall," and noted that public bodies, including the NAO, "have to operate in a highly fluid environment, preparing for a range of scenarios and implementation timescales." It added, "As leaving the EU means more work for the civil service, it also means more work for us" (2017, p. 9). The NAO also stated that "Parliament expects our programme of work to reflect the importance it places on leaving the European Union" (2017, p. 13), adding that this will require it to be "ever more responsive and agile in our work," and alive to the sensitivities during the negotiation phase. The NAO has stated that it wants to "make a positive contribution to the way we leave the EU," including by reporting on preparations being made by departments. Thus, it sees these external developments as having an impact on the nature of its work and outputs.
Another theme throughout the period is a recognition of the significance for the NAO of ongoing public service reform. In 2013, the NAO commented that "the civil service has no option but to think radically about how to provide public services" (2013, p. 4), and noted that government was "continuing with an ambitious programme of public sector reform." These changes were seen as using innovative approaches, which brought both "opportunities and risks." Given this, the NAO proposed to take an early look at significant reforms "to see whether risks have been clearly identified and well managed" (2013, p. 7). External changes were thus leading to a desire to examine developments early so that the NAO could try to shape government responses. This, and the need to examine radical reforms at a time of considerable uncertainty and greater polarization and politicization of administration, has led to the NAO being more of a participant in a contentious debate.

The ongoing debate about trust in institutions and experts has also inevitably been relevant at several levels for the NAO as a body set up to play an expert function, working for parliament, and within the public sector – all areas subject to criticism in recent public debate. It is (and should be) set up to be an authoritative organization, and its role in providing parliament and the taxpayer with assurance about how public resources are used means that its credibility is essential. Its staff are required to make careful technical judgments and have access to sensitive information. The NAO’s close relationship with parliament and parliamentarians potentially means it could have been damaged by the 2009 expenses scandal, while its position as a public-sector body at a time when there has been criticism of monopoly providers has also been a risk for its reputation. On the other hand, the growing lack of trust in policy-makers may be a double-edged sword for SAIs, which may be “tarred with the same brush,” but may alternatively be seen as a trusted source of analysis and insight as long as they maintain their independence.

Responses to the Changing Environment
Given these perceived implications of environmental changes for the NAO, the strategy documents were also examined to identify responses. This review makes it clear that the organization is conscious that developments in the wider environment require it to change itself. In 2013, the NAO considered its work was likely to increase in profile because of the changes in the environment (2013, p. 5), and as a result, “we will be more responsive, reacting rapidly to emerging issues and stakeholder requests” (p. 15). Significantly, in that year, the NAO reported that it was reshaping itself organizationally to “match the strategic issues that departments face” (2013, p. 4). In 2017, it commented that its strategy described progress in “how we can best serve Parliament and respond to changes in the external environment that affect us and the bodies we audit” (p. 5). It also emphasized that although its “core strategic objectives endure, to serve Parliament fully we must innovate as we see new developments and challenges in the external environment” (2017, p. 8). A review of NAO strategies highlights a series of responses to the changing external environment. These relate to how it does its work, what work it does and the nature of its relationships with other parties.
How the NAO Does Its Work
A common theme across the five years is the suggestion that changes in the external environment require the NAO to work more quickly and adapt. For example, in 2013, the strategy stated that the NAO “must respond more quickly to stakeholder needs” (2013, p. 10) and promised to be “more responsive, reacting rapidly to emerging issues and stakeholder requests” (2013, p. 15). Similarly, the Foreword to the 2017 strategy states that NAO’s work needs to be “highly flexible and fast-moving” because of the demands of helping parliament
hold government to account over Brexit (2017, p. 4). Elsewhere, the same strategy states that “We must be more agile and responsive in providing analysis and scrutiny” (2017, p. 9), “especially given changes in our external environment … [by] … adapting our work to address the issues that matter to government and Parliament in the current environment” (2017, p. 11). Linked to this has been the NAO’s emphasis on assisting government in addition to fulfilling its role in supporting parliament, and thus going beyond traditional audit roles of helping to hold to account. In 2013, the NAO commented “We are working hard to go beyond simply showing that things may have gone wrong, to offering insights into why they have done so based on the spread of NAO experience across the whole of government” (2013, p. 4). In 2017, the NAO also stated “Alongside our primary responsibility toward Parliament, we are committed to driving improvement in the public sector” (2017, p. 4). One aspect of this has been what it refers to as “system-wide, integrated public audit,” which can “help to drive better public services for all” (2017, p. 6). The feasibility and appropriateness of fulfilling this dual role and the desire to participate more widely in assisting government has been challenged by some (Morin & Hazgui, 2016). In an article on “the dual identity of auditors at the UK National Audit Office,” the authors highlight the views of NAO auditors that the “singular role of guardian no longer suffices,” and they “draw meaning and usefulness from their role of monitoring the Administration if they believe they have contributed to improve public affairs management.” The authors consider this a “‘fantasized’ identity” which allows them to “develop a more positive self-concept of their own contribution to a better world” (p. 585). A third NAO response has been to emphasize the importance of being topical, relevant, and timely. A traditional criticism of audit is of being backward-looking and commenting after the event. Given the increased pace of external commentary on political events, including through social media, there is a risk of being seen as redundant should contributions take too long. As part of this, the NAO has seen it as important to examine and report earlier in the policy process. In 2017, it noted “Our interventions now look at programs earlier in their lifecycle and at the planning stage as failure can become built in at an early stage of a project” (2017, p. 5). The challenge for the NAO here is to avoid commenting on policy or being seen to do so. There are other aspects to the idea of relevance, in particular, that audit work should be designed to be more responsive to the needs of its parliamentary audience, who themselves have become more sophisticated in their use of social media. The 2016 strategy commented that to increase its influence, the NAO was preparing “shorter reports and briefings tailored in length and style to the target audience, such as MPs and chairs of audit committees” (2016, p. 18). A fourth feature of the response has been an emphasis on exploiting more effectively the NAO’s cross-government perspective. It has talked of the importance of developing “knowledge and insight to focus on the key risks driving systemic poor performance in government” (2013, p. 10), and emphasized that since public service providers faced many of the same issues, “Identifying and focusing on
these shared strategic issues is integral to our new approach” (2013, p. 14). The NAO also commented in 2013 that its organizational “transformation programme” was designed to provide “a clearer focus on shared strategic issues across departments and public bodies” (2013, p. 13). The response was to highlight the value of benchmarking similar activities across government and share lessons from “good practice and failure” (2013, p. 14). A further impact of the changing environment on how the NAO carries out its work has been its stated desire to adapt its ways of working. This has included what it has described as working in a more integrated way and making more use of digital technology. The NAO’s 2016 strategy emphasized that to increase its influence it needed to extend its reach and make its work more accessible to a wider range of people. This included using digital communication, including social media, more extensively.
What Work the NAO Does
As well as how the NAO goes about its work, another major influence of the environment is the nature of the work that it produces and the outputs it publishes. Although the exact nature of the outputs from the NAO is not prescribed, an examination of its value for money publications from 1984 onwards finds a fairly consistent and standardized format. Traditionally, they have been long-form reports, consisting of a summary, main body and recommendations, incorporating graphics and tables. More recently, the NAO has highlighted diversity in its outputs as a response to changes in its environment. In 2013, the NAO stated “We will produce a wider range of products for different audiences,” the reason being to allow it “to respond quickly to complex, fast-moving public sector issues” (2013, p. 4). This was also prompted by the desire “to remain highly relevant to the most important challenges the public sector faces” (2014, p. 4), as well as “so we can tailor our work to different circumstances and audiences, and increase our influence” (2014, p. 13). In 2017, the NAO highlighted that, as well as value for money reports, it also prepared “public interest investigations, reports to management, published guidance and toolkits for audit committees and boards.” It also responded to correspondence from the public and members of parliament (2017, p. 12). For select committees, it prepared “Short Guides” summarizing the work of individual departments and audit work related to those organizations, and major “cross-government themes for the relevant committees” (2017, p. 17). The rationales behind preparing a wider range of products vary. The investigations are designed “to provide Parliament with timely and targeted reports on emerging issues.” In the same strategy, the NAO stated that its program of work included “cross-government comparison” in order “to identify systemic issues of wider importance in the public sector and spread best practice” (2017, p. 13). In 2014, the NAO explained that the “comparative and investigative work” was designed to “develop our understanding of the root causes of the failures in
public administration we observe” (2014, p. 4). One of the “toolkits” produced was “an interactive publication on commercial and contract management.” This was designed to “provide practitioners with insights on the new higher standard for government contracting.” This was followed by a seminar, jointly hosted with a government body (2017, p. 18). Three significant product developments associated with its performance audit work can be seen with investigations; “Short Guides”; and briefings related to Brexit (see Box 9.1). Although undertaken by performance auditors within an SAI, investigations and the Brexit briefings do not fit within the definitions of performance audit discussed in Chapter 2. Nevertheless, the NAO applies the same in-house quality standards and assurance arrangements as for its value for money reports. These are its set of 12 standards, which are based on current best NAO practice and are consistent with the Fundamental Auditing Principles of the International Standards of Supreme Audit Institutions (ISSAIs), tailored to meet the specific expectations and requirements of the UK public-sector environment and parliament.

Box 9.1  Significant Product Developments

Investigations: These pieces of work are designed to be shorter (3–4 months in duration, rather than 6–9 months for a VFM audit), delivered more quickly, and to be non-evaluative. The stated rationale was to set out the facts in a policy area or a topic of public controversy which seemed to require clarification or where there appeared to be concerns about the use of public funds. They were not intended to assess value for money. Amongst the 65 reports published by the NAO in 2017–2018 were 15 investigations. Many were the basis of hearings of the Committee of Public Accounts in the same way as performance audits.

Short Guides: The NAO’s position as auditor of government departments, agencies and other public bodies means that it gathers a lot of information on all aspects of departments. Its reports are designed for a non-specialist audience. In the light of this, the NAO has taken to publishing “a suite” of short guides, one for each government department. The purpose of the guides, which are prepared in interactive slide form with links to NAO and government material, is to “provide a quick and accessible overview of the Department and focuses on what the Department does, how much it costs and recent and planned changes” (National Audit Office, 2015). The guides are available to be used by the departmental select committees when they hold public hearings with senior officials or ministers on the departmental accounts.

Brexit briefings and reports: The UK’s exit from the European Union has been a highly contentious subject. The NAO has stated that it “has an ongoing programme of work across government to examine how government is organising itself to deliver a successful exit from the EU” (2017, p. 5). The briefings were prepared in slide pack format, with analysis laid out in short, bullet-point style, rather than long-form narrative. In 2017–2018, nine of the NAO’s 65 published reports related to preparedness for leaving the EU (National Audit Office, 2018b).

A second development has been in the wider focus of audits in terms of the organizations and subjects covered. As has been set out in detail elsewhere, delivery of public services has ceased to be the sole responsibility of public-sector bodies (see, for example, Pollitt & Bouckaert, 2004), and UK Government is characterized by extensive outsourcing. As a result, auditors have increasingly been required to examine not just the performance of the contracting body – the department or agency – but also the contract itself. This has led to the performance of some major contracting organizations featuring in reports by the NAO, and their senior representatives appearing in front of the parliamentary Public Accounts Committee to answer questions in much the same way as civil servants. The pursuit of topicality has meant that auditors have also examined subjects even before policy has been implemented. This has required auditors to focus on the preparation for implementation, and the management of risks to delivery, in the absence of being able to comment on performance, and can be seen as a departure from audit’s primarily ex-post focus. A strong example was the NAO’s examination of the preparations for the 2012 Olympics in London, which started in 2007 and consisted of six separate reports looking at aspects of delivery including construction of the venues, planning for the Games and the costs, as well as planning for the “legacy.” In both cases – the widening of those caught within the audit gaze and the moving of the target of audit – audit attention has been on more contentious and sensitive material. An example has been ongoing work on the implementation of a new benefit – Universal Credit – which has been subject to development and debate for some years, and was rolled out from 2016 with the aim of transferring claimants to it from a number of other benefits by 2022. The subject has been highly controversial, with the Committee of Public Accounts calling management of the program “extraordinarily poor” in 2013 (Committee of Public Accounts, 2013) and a 2018 report resulting in political controversy when the responsible minister was considered to have misrepresented NAO findings (discussed in more detail in Chapter 7). In addition, examining commercial activities and individual deals, as well as the performance of private firms delivering public services, has ensured that auditors have been reporting on commercially sensitive subjects, where published reports may affect share prices, reputations, and future contracts. This has tested the boundaries of commercial confidentiality in publicly funded contracts.
Changing Nature of Relationships
A third aspect of how the changing environment has affected how the NAO conducts its work has been the changing nature of its relationships with other parties. These relationships vary in nature and purpose. For example, the NAO
serves parliament, individual elected members and the parliamentary committees, to whom it is a supplier of objective audit information. These relationships appear to have become more varied and arguably more demanding, with the NAO briefing departmental select committees more extensively than in the past, responding to individual members’ requests, and presenting to a wider range of specialist committees (e.g., the Treasury Select Committee on Brexit). At the same time, it is the appointed auditor of government bodies, usually under statute. Some of the latter relations – with government departments – have been long-running and unquestioned, whereas others – for example, with local government and the BBC – are more recent, and have required the NAO to take over from former longstanding auditors. In local government, in particular, the NAO has entered a space previously occupied by the Audit Commission, and by private audit firms, and has needed to develop a new profile. Despite the formality of the auditor–auditee relationship with officials of various backgrounds, the NAO has stated regularly that it wishes to meet the needs of those it audits. This has traditionally meant members of parliament, and remains the case, with the NAO’s strategy for 2018–2019 emphasizing that it wants all MPs to understand its role and feel able to seek out its expertise. At the same time, one of its 2017 priorities was stated as “responding to feedback from public bodies … and deepening our engagement with those we audit” (2017, p. 5). It also commented that “We have ongoing and deepening relations with finance directors across local government” (p. 12). Reference is also made to responding to an annual feedback survey of audited bodies. This has identified that bodies would value tailored cross-government outputs from the NAO (p. 18). Such developments suggest growing pressure to balance the tension between serving parliament, with the expectations of transparency and accountability, and assisting government bodies to improve performance in less public ways.

Lessons for Evaluation and Internal Audit
To conclude, this section draws out lessons from the experience of the NAO for both evaluation and internal audit. There is no doubt that, notwithstanding its position of independence, the NAO cannot, and clearly does not, ignore the contemporary environment or environments in which it operates, and that these environments have a profound impact on the way in which it undertakes its work, what work it carries out, and the nature and range of relationships which it maintains. The evidence is that auditors do not see these external forces as a threat to independence and that they are surprisingly open and responsive. What then are the lessons for those undertaking other forms of audit work and evaluation? In the United Kingdom, all three forms of activity are subject to external pressures. Internal audit – as we have seen elsewhere – has been subject to growing demands to provide effective support to management in managing public funds by developing better governance, risk management, and internal controls (Government Internal Audit Agency, 2019). Evaluation – whether commissioned from internal teams or from external contractors – is
subject to the resource constraints within the public organizations commissioning such work. A first general lesson is that traditional, tried, and tested approaches need to be adapted, however much they appear to have been successful in the past. Scrutiny activity of any kind is not set in stone and, for all their professional independence and ability to determine their own approaches, audit and evaluation are still highly shaped by the external environment. Without a clear rationale and audience (e.g., management for internal audit who consider the work of value in managing the organization), it is impossible to justify evaluative activity at a time of scarce public resources and some suspicion about the value of large-scale data collection (evidenced, for example, by restrictions on data gathering in the health sector and growing attention to ethical standards). Independence may provide the opportunity to determine the manner and timing of the changes that take place within audit offices, but it is clear that no function (either in-house or contractor) relying on public funds can resist the need to respond to the outside world. This is equally true of internal audit. As the UK Government Internal Audit Agency (established in 2015 to undertake internal audit in over 100 government bodies) states, “We take account of the wider government context, as well as the specific risks and management challenges facing the customer organisation” (Government Internal Audit Agency, 2019). A second lesson is that utility is crucial for all who evaluate the activities undertaken by others. A key part of this is the nature of the output from the work and the importance of some diversification to satisfy different needs. All producers of evaluative/audit material need to be flexible and find different ways of meeting audience needs and being useful to those at whom the work is targeted. In particular, the producers of evaluative work must recognize that different audiences have different needs, but in most cases, they will all be “time poor,” and therefore rely on authors to summarize detailed and complex material. Such synopses can be married with longer explanations, for example, for those who need to act on particular recommendations or are interested in the methodology and data used, but in general the message is that “less is more.” Peter Wilkins discusses in Chapter 8 the importance of recommendations and how the way in which they are worded can determine their utility. A third lesson is the importance of openness to the external environment. This means it is vital that evaluating organizations have the ability to scan their horizons and identify what developments are most likely to shape their world and that of the bodies they scrutinize. Hence, internal audit functions must be alive to external developments in counter-fraud work, or analytical techniques available as a result of the expansion of “big data.” Failure to be alert to changing needs and interests, or different expectations in terms of format or quality, can leave evaluative bodies vulnerable to becoming irrelevant and obsolete. This underlines the importance of relationships for all evaluative bodies, with a wide range of relevant peers, stakeholders and competitors through which it is possible to gain insight into emerging trends or identify threats and opportunities.

A fourth lesson is about the importance of credibility and of protecting and enhancing it as a matter of priority, particularly where such services are contracted out to external evaluators or where internal audit is not an in-house service. This relates to the quality of the work undertaken and the outputs produced, but also the relevance and timeliness of the work. Timeliness is essential for all evaluative work in order to meet a need and offer information. Work must be done to timescales that are considered appropriate by users rather than producers; a high-quality report is of no value if opportunities to contribute to policy development or influence the decisions to be taken by management are missed. All providers of evaluative material need to be mindful of topicality. Fifth, the cost-effectiveness of all forms of audit and evaluation is also important. There is clear evidence that evaluation commissioners within the public sector are under pressure and have smaller budgets for such work, sometimes to the point of being entirely unrealistic. Many government bodies do not have the resources to commission evaluations, which may reduce the quantity of work available, or if they have tight budgets, may make for a more prescriptive approach from commissioners. Or it may put pressure on more expensive providers to reduce their costs or their “offer.” Reduced budgets for internal audit may reduce the size of programs, and may require internal auditors in particular to justify their work in clear terms of the return from the investment in particular topics or be able to demonstrate how the most risky areas are being covered. In-house evaluators are often competing for internal resources with other functions. Both need to show value to management, users, clients, and commissioners, and demonstrate that the costs imposed on the organization are merited by the value secured from the work. Finally, all producers of evaluative information need to recognize that there is now a more competitive environment for analysis, knowledge, and insight. This is particularly pertinent for evaluators competing for work, and may be relevant for internal audit where work is externally sourced and competed for. This may have implications for the nature of the methods applied and may require all providers of information to be innovative in approach and keep abreast of changing professional practices. At the same time, stretched resources among those seeking advice may make them more receptive and increase the chance that evaluative bodies of all kinds will be listened to and have influence. The question then is whether, in this more testing environment, those receiving the advice have the opportunity to use it.

References
Committee of Public Accounts (2007). Holding Government to Account: 150 years of the Committee of Public Accounts. Retrieved from www.parliament.uk/documents/commons-committees/public-accounts/pac-history-booklet-pdf-version-p1.pdf.
Committee of Public Accounts (2013). Universal Credit: Early Progress. 30th Report of Session 2013–14, House of Commons.
Dewar, D., & Funnell, W. (2016). A History of British National Audit. Oxford, UK: Oxford University Press.

Eatwell, R., & Goodwin, M. (2018). National Populism: The Revolt Against Liberal Democracy. London: Pelican.
Government Internal Audit Agency (2019). Corporate Plan 2019–22. Retrieved from www.gov.uk.
Hodge, M. (2016). Called to Account: How Corporate Bad Behaviour and Government Waste Combines to Cost us Millions. Boston, MA: Little Brown.
Institute for Government (2014). Numbers Game: the latest Civil Service staff numbers. Retrieved from www.instituteforgovernment.org.uk/blog/numbers-game-latest-civil-servicestaff-numbers.
King, A., & Crewe, I. (2013). The Blunders of our Governments. London: Oneworld.
Lonsdale, J. (2013). Context and Accountability: Factors Shaping Performance Audit. In Pollitt, C. (Ed.), Context in Public Policy and Management: The Missing Link? Cheltenham, UK: Edward Elgar.
Morin, D. (2008). Auditor General’s Universe Revisited. Managerial Auditing Journal, 28(7): 697–720.
Morin, D., & Hazgui, M. (2016). We are Much More Than Watchdogs: The Dual Identity of Auditors at the National Audit Office. Journal of Accounting and Organisational Change, 12(4): 568–589.
National Audit Office (2013). NAO Strategy 2014–15 to 2016–17. National Audit Office.
National Audit Office (NAO) (2014). NAO Strategy 2015–16 to 2017–18. National Audit Office.
National Audit Office (NAO) (2015). NAO Strategy 2016–17 to 2018–19. National Audit Office.
National Audit Office (NAO) (2016). NAO Strategy 2017–18 to 2019–20. National Audit Office.
National Audit Office (NAO) (2017). NAO Strategy 2018–19 to 2020–21. National Audit Office.
National Audit Office (NAO) (2018a). Financial Sustainability of Local Authorities. National Audit Office.
National Audit Office (NAO) (2018b). National Audit Office Annual Report and Accounts 2017–18. National Audit Office.
Pollitt, C., Girre, X., Lonsdale, J., Mui, R., & Summa, H. (1999). Performance or Compliance? Performance Audit and Public Management in Five Countries. Oxford, UK: Oxford University Press.
Pollitt, C. (Ed.) (2013). Context in Public Policy and Management: The Missing Link? Cheltenham, UK: Edward Elgar.
Pollitt, C., & Bouckaert, G. (2004). Public Management Reform: A Comparative Analysis. New Public Management, Governance and the Neo-Weberian State. Oxford, UK: Oxford University Press.
Sharman, L. (2001). Holding to Account: The Review of Audit and Accountability for Central Government. HM Treasury. Retrieved from www.public-audit-forum.org.uk/wp-content/uploads/2015/05/Holding-to-Account.pdf.

10 Understanding the Practice of Embedded Evaluation: Opportunities and Challenges
Christian van Stolk and Tom Ling

From a traditional viewpoint, evaluation may be seen as something that is done to an intervention, policy, service, or program. It implies a degree of distance and independence in allowing evaluators to undertake an episodic assessment of the efficiency or effectiveness of an intervention. The purpose of these “traditional” evaluations is to determine how inputs connect to activities and their outputs and outcomes. This approach is often used to hold those involved in the intervention to account. As shown in Chapter 2, this has been a very strong feature of the performance audit of effectiveness in particular (Dozois, Langlois, & Blanchet-Cohen, 2010, p. 6). However, the reality of delivering or implementing an intervention is often messy. Plans may change as the intervention develops. The intervention may take place in a complex environment with significant uncertainty. Information collected on the intervention may be contradictory. Traditional evaluation approaches and methods may not easily cope with this reality and as such may not be particularly helpful in such contexts. For instance, they may speak to the widespread failure of a set of interventions to achieve their stated goals without examining how and why the goals were changed. A relatively new school of thought, mostly grounded in the work of Quinn Patton (2008, 2010) on developmental evaluation, suggests a break with such traditional evaluation concepts in favor of an approach focused on learning, co-production, and adaptation. In this chapter, we look at the concept of embedded evaluation, which we see as distinct from traditional and developmental evaluation. After defining what we mean by embedded evaluation, we examine some of the opportunities and challenges of this approach. The concept of embedded evaluation implies a degree of connectivity whereby the evaluation activities are part of the ecosystem involved in the delivery and monitoring of an intervention. This may mean that this system moves between assessment and change in the delivery of the intervention (Child Help Partnership Charity, 2018). The evaluator can be part of the project team that is putting an intervention in place, or the evaluation activities may be undertaken by those leading the implementation or delivery of the intervention.
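The input–activity–output–outcome chain that such “traditional” evaluations trace is, in essence, a simple logic model. As a purely illustrative sketch (not drawn from this chapter; the intervention and entries below are hypothetical), such a chain might be represented as follows:

# Illustrative only: a minimal representation of a traditional logic model,
# tracing how inputs connect to activities, outputs, and outcomes.
# The example intervention and its entries are hypothetical.
from dataclasses import dataclass, field
from typing import List

@dataclass
class LogicModel:
    inputs: List[str] = field(default_factory=list)      # resources committed
    activities: List[str] = field(default_factory=list)  # what the program does
    outputs: List[str] = field(default_factory=list)     # direct products
    outcomes: List[str] = field(default_factory=list)    # intended changes

girls_education = LogicModel(
    inputs=["funding", "teachers", "local partners"],
    activities=["build classrooms", "train teachers"],
    outputs=["classrooms opened", "teachers trained"],
    outcomes=["higher enrolment of girls", "improved attainment"],
)

# An episodic, traditional evaluation asks how far each link in this chain held.
for stage in ("inputs", "activities", "outputs", "outcomes"):
    print(stage, "->", getattr(girls_education, stage))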


Embedded Evaluation Defined
An embedded evaluation in its most basic sense means a high degree of connectivity between the evaluator and the project team that manages the intervention. The evaluation activities are an integral part of the delivery and monitoring of an intervention. The embedded evaluation, similar to other types of evaluations described earlier, tends to be participatory and timely. It is focused on real-time learning that aims to improve and adapt the intervention while looking at the broader environment in which the intervention takes place. The intervention in turn has to be adaptable, typically with no obvious beginning and end-points. Otherwise, the main purpose of an embedded evaluation does not exist. The evaluation approach is similar to the real-time evaluation (RTE) but is adapted from existing and established frameworks to reflect the realities of being embedded in such complex situations. In this sense, they are overarching principles rather than rigid criteria. For instance, in a complex intervention, the evaluation may focus particularly on learning and as such may focus less on establishing the summative effect of an intervention. In practice, embedded evaluations will always reflect their specific contexts, such as health, humanitarian relief, or innovation. Unlike in developmental evaluation, the intervention and its goals can be either simple or complex. In this way, the term as we use it in this chapter is distinct from the concept of developmental evaluation, where there is an assumption that the intervention subject to evaluation is complex and emergent. Stakeholder participation is important in all the types of evaluation discussed above. This implies a degree of stakeholder scrutiny but also improving the capacity of those affected by and involved in the intervention. A key feature of an embedded evaluation is also that it has a degree of capacity building for implementing organizations and their capacity to engage with evaluative activities. This may mean that a wider group of people affected by the intervention can undertake evaluation-type activities, can hold those managing the intervention to account, or can form judgments on the acceptability from their point of view, the feasibility of fit with the existing institutions, and suitability or value for money of the intervention. It may also mean that those managing the intervention may take on more evaluation activities. This could involve data collection, workshops, and so on. Finally, embedded evaluation also implies a continuous and repetitive process of assessment. As long as the intervention exists, there is an expectation that evaluation activities take place that facilitate learning and aim to improve the intervention and its outcomes. There may be various purposes for undertaking an embedded evaluation. The evaluation may need to be timely, allow for the participation of wider stakeholders, have multiple feedback loops, promote learning within the intervention, and help shape the intervention. For embedded evaluation to make sense, the underlying intervention will also need to be “adaptive,” in order to make changes or improvements in the intervention as learning takes place and information is collected.

Links to Other Forms of Evaluation
An embedded evaluation is a type of formative evaluation but it may also provide evidence of progress toward intended goals and highlight unintended outcomes. A formative evaluation is designed to help shape an intervention. It collects information as the intervention takes place and can provide useful inputs that can inform the further design and modification of the intervention. It typically tries to understand what has been achieved, how the process has worked, and takes the context into account (Health Foundation, 2015). This enables the evaluator to see how wider factors have contributed to the outcomes of an intervention. For instance, in stabilization interventions focused on providing education opportunities for girls in a conflict zone, one may want to look at a wider range of contextual factors: the overall security of the region; military stabilization interventions running concurrently; support from the local community; the local educational infrastructure; and the capacity of actors on the ground to provide effective schooling (see, e.g., van Stolk, Ling, Reding, & Bassford, 2011). That said, some of the information collected as part of an embedded evaluation may be useful for a summative evaluation as well. A summative evaluation looks more specifically at whether the stated goals have been achieved at the end of the intervention. Embedded evaluation overlaps with a number of other types of evaluation. If we look at the emphasis placed on timeliness and participation, the embedded evaluation is similar to a real-time evaluation. Embedded evaluation also focuses on wider stakeholder participation and intermediate outcomes, but with a particular emphasis on understanding and using the tacit knowledge and lived experiences of implementers and service users. Real-time evaluation aims to provide immediate feedback to the intervention in a relatively short time-frame that may help to modify the intervention or the intermediate outcomes that have been achieved. It is often (but not exclusively) used in an emergency situation or in development contexts, often to bridge the gaps between monitoring and ex-post evaluations. Typically, RTEs also have a strong element of stakeholder participation that can be used to place the intervention into context, to understand the impact on stakeholders or to promote stakeholder engagement in the intervention (Polastro, 2005). In humanitarian interventions, RTEs try to get closer to two groups often omitted in ex-post evaluation: the field staff and beneficiaries (Herson & Mitchell, 2006). RTEs also tend to adapt existing evaluation criteria and frameworks. In humanitarian interventions, the evaluation criteria of the Development Assistance Committee (DAC) of the Organisation for Economic Co-operation and Development (OECD) are commonly used. RTEs do not apply a rigid set of criteria but use an evaluation framework in a given context. This means that certain criteria and questions may be focused on more. This in turn also spurs debate about reforming these frameworks and making them more suitable for real-world situations (Pasanen, 2018).

Another type of evaluation that provides timely and ongoing feedback is a rapid-cycle evaluation. They have a distinct purpose and are often used in health settings. These are also known as single-loop evaluations (Health Foundation, 2015). The goals of these interventions are often relatively fixed. So, the main area of interest is to understand how specific aspects of the intervention lead to a set of goals. In a healthcare setting, one can think about the goal of patient safety and reflect on a range of simple or complex interventions that aim to improve that goal. An application could be that this range of interventions is monitored and scored to understand how they can contribute to patient safety. The evaluation would then provide ongoing feedback to the interventions. A further type of evaluation is a developmental evaluation. Developmental evaluation supports real-time learning in complex and emergent situations (Dozois et al., 2010). This type of evaluation is often used in innovation settings where the goals of the intervention may be hard to define and it can be logically expected that the intervention may need to adapt over time. In this sense, we speak of two feedback loops. One feedback loop may refer to the assumptions and logic underlying the intervention, while the other loop may focus on adjusting and adapting the outputs and outcomes of an intervention (Patton, 2008). A developmental evaluation is typically relevant when examining a complex situation where there is high uncertainty. An example could be a new innovation where both the type of innovation and its intended use may be unclear. This occurs often in the field of innovation where an innovator designs a product, service, or an intervention for a particular purpose, but upon review a wider set of purposes can be identified and the feedback loops then lead to the re-design of the product, service, or intervention. This type of evaluation by its nature is exploratory and facilitates real-time learning among the stakeholders. It is accepted that there is a degree of fluidity which may lead to the adaptation of the intervention and goals over time (Patton, 2010). Embedded evaluation therefore draws upon a number of evaluation approaches. Like RTE and rapid-cycle evaluation, it is conducted alongside the intervention and with an understanding of the timelines facing decision-makers. Like rapid evaluation, it is often concerned to capture the tacit knowledge and lived experiences of implementing teams and service users. Like formative evaluation, it is informed by questions that are relevant to implementing teams and service users and it aims to support learning (even if assessing progress and performance is a part of this).

Links to Audit
As described in Chapters 2 and 3, both performance auditors and internal auditors follow audit standards and processes that set out objectives and criteria and determine how well they have been met in a linear process. As is the case for ex-post evaluation, these linear processes adjust to changing objectives by resetting the starting point. The questions posed in the conduct of embedded evaluations reflect many of the same challenges faced by auditors: being timely in their work, using different approaches while maintaining independence of mind, and dealing with complex and ever-changing environments. Auditors are more constrained than evaluators, as can be seen from the comparison between audit and evaluation in Chapter 5. Auditors, too, have experimented with providing different products, as was discussed in Chapter 9. National audit offices, for example, provide testimony based on their experience and work, and produce studies, advisory reports, and in some cases evaluations, but within an audit context. As an example, the National Audit Office of the UK prepared a report which provided parliament with an explanation of the measures taken to maintain the financial stability of the United Kingdom banking system in December 2009. The report was clear that it did not assess value for money since the time elapsed to make such an assessment was too short. However, given its mandate, knowledge, and experience in overseeing government operations, the auditors were available to provide parliament with timely and important information on government actions that had been taken. Audit standards require maintenance of independence, usually interpreted as distance from the subject of an audit. However, there are examples where the objective of an audit requires the auditors to be much closer to a program while maintaining independence. For example, in the 1990s, the Office of the Auditor General of Canada began a series of audits of technology projects under development in order to identify risks relative to best practices in project management and identify lessons learned and pitfalls to be avoided (OAG, 2006, para. 3.5). These products had features of a formative evaluation under audit standards. Internal audit, like internal evaluation, is better positioned to innovate with real-time and embedded evaluations and audits. As discussed in Chapter 3, internal audit works under professional audit standards and guidance which also have provisions for doing consulting work. The standards provide the flexibility to do more real-time and embedded work. For example, an internal audit of the adequacy of internal controls in the conversion of a number of legacy systems into a single banking system included the attendance by the auditors at all the key meetings leading to a go-no-go decision. The audit was requested by management, which wanted the input of the auditors before proceeding because of the risks associated with such a conversion (Barrados, 2019). This audit example comes closer to some of the features of an embedded evaluation, as allowed for by the additional flexibilities in internal audit.


Challenges and Opportunities
In discussing the challenges and opportunities, we examine some of the key requirements of an embedded evaluation.
Embedded Evaluation Has to Reflect the Timelines and Decision-Making Cycles of the Implementation
This aspect implies that an evaluation should be considered at the outset of the intervention. This may be difficult in certain situations, particularly in times of crisis. A 2013 volume looked at how evaluation was accommodated or not accommodated in emergency legislation in response to the terrorist attacks in the United States and the United Kingdom (UK) in the first decade of the 2000s (van Stolk & Fazekas, 2013). The question here was whether traditional evaluation approaches were fit for purpose in times of crisis. The conclusion was that it was difficult to accommodate evaluation in this context, though the UK appointed an independent reviewer who would report to the UK parliament on the use of emergency powers with a view to adjusting some of these powers if needed. The use of RTEs in humanitarian relief efforts suggests that evaluation processes can be accommodated in interventions at times of crisis. However, it speaks to the capacity of the organization involved in the intervention to plan for an evaluation and understand what an evaluation can and cannot achieve in a specific context. In other words, the purpose of an evaluation in a humanitarian context becomes more one of learning as the intervention unfolds and creating feedback loops to improve the intervention, rather than making a definitive assessment of the effectiveness of the intervention. In terms of performance audit, the National Audit Office, the supreme audit institution of the UK, wrote a series of interesting reports on policy responses in the aftermath of the financial crisis in 2009. This more real-time approach was a departure from the traditional performance audit approach geared to producing a value for money conclusion, which presented some risks to the Audit Office. Some of these risks included engaging in a more real-time discussion around the effectiveness of policy, as it was still being formulated and implemented. Over time, the NAO has managed to mitigate this risk by eliminating the traditional “value for money” conclusion from some of its reports, and by focusing explicitly on the implementation of policy rather than policy-making in its reports. It has also expanded its range of outputs to include more real-time pieces of evaluation and investigation. In summary, there are some key considerations around the access to the intervention or whether evaluation can be accommodated in a timely fashion, the capacity to undertake embedded evaluation in an organization, and the purpose of an evaluation or whether the purpose of the evaluation is aligned with the context in which the intervention takes place (see also Dozois et al., 2010, p. 18). In terms of capacity, embedded evaluation may also require a set of skills beyond traditional evaluation skills such as communication, relationship management,
strategic thinking, facilitation, and listening skills, among others (Dozois et al., 2010, p. 24). However, others may argue that such skills should be part of the evaluator toolkit in any case. Aligning the evaluation approach to the context in embedded evaluation may also require the mapping of a learning process alongside establishing an evaluation framework at the outset of the intervention.
Embedded Evaluation Has to be Participatory
The first issue relates to two interwoven questions. Who is the evaluator? Who participates in the evaluation and how? In terms of the first question, the traditional perception would be that the evaluator has to be external to the intervention to be independent. However, when we think about the skill set of those involved in an embedded evaluation, as mentioned in the previous paragraph, evaluation skills are clearly important but other skills may be more important (see also Patton, 2008). If learning is a true purpose of the evaluation, then that changes the shape of the evaluation process as well. Considerations around the effectiveness of the intervention and holding those managing the intervention to account may be secondary goals. This raises the question of how important independence in an embedded evaluation is. It is our view that such independence of the evaluator remains important, and this supposition relates to the second question of who participates in the evaluation and how. The perception among stakeholders that the process of learning and evaluation may not be open, transparent or impartial will likely undermine the embedded evaluation approach. Wider stakeholder management becomes crucially important in an embedded approach, and is an “art” in and of itself. This also speaks to managing and documenting group dynamics and building a wider capacity for evaluative thinking among stakeholders (Patton, 2008). The latter should create a sense of ownership, improve the understanding of findings and increase the likelihood that the findings will be used.
The Intervention Has to be Adaptable and Promote Learning in the Light of New Findings
A lot has been written about the rigidity of interventions in the evaluation literature. On the one hand, those evaluating the outcomes can insist on rigidity as part of the accountability process. In this way, interventions may not be allowed to deviate from the original plan and those implementing the interventions will be held to account on the basis of the business case. This can be difficult for complex or complicated interventions that take place in an environment characterized by significant uncertainty. Evaluation approaches themselves may not be helpful here. Theory of change approaches that produce logic models may be slightly abstract, overly linear, and sequential. They may pay too little attention to unintended consequences (van Stolk, Ling, Reding, & Bassford, 2011).

Finally, they may find it difficult to accommodate feedback loops in dynamic environments (Coffey International Development, 2009). As such, these approaches may be useful in establishing the logic of an intervention and allowing for a baseline assessment. However, they may offer too little flexibility in specific contexts. We also observed that they are hardly used in very complex interventions with a high degree of uncertainty (van Stolk et al., 2011). There may be a variety of reasons for this. Evaluation may not be a priority in these situations. However, it may also reflect that traditional theory of change approaches are not deemed that useful or easy to use in these contexts. There are a number of ways that one can guard against an approach that may be overly rigid. Most of these speak to participatory approaches and a degree of embeddedness. Examples are developmental evaluation, as discussed earlier, and contribution analysis. Contribution stories, as coined by John Mayne, seek to build a consensus among stakeholders in an intervention on the stated objectives, activities and expectations of those involved in stabilization interventions and those affected by the intervention (Mayne, 2008). A participatory approach is inherent in this approach. In addition, multiple feedback loops are present that allow for learning to take place and the stories to evolve. This also speaks to a real-time approach where an evaluator observes the intervention as it unfolds and a level of embeddedness of the evaluator in the wider ecosystem of the intervention. In this way, the evaluation approach as well as the intervention are ultimately adaptable and can interact with the context. The evaluation may also want to comment on this context or on the ecosystems in which the intervention sits. The latter can be particularly important in understanding why an intervention succeeds in one context and not another, or in commenting on the transferability and scaling up of an intervention. Often, the key is to recognize contexts in which such approaches can be used or to do significant upfront planning to create a context in which wider stakeholders accept that both the intervention and evaluation approach may need to adapt as the intervention unfolds and learning takes place.
Evaluation Has to be Continuous
This seems another truism in formative evaluation. However, many traditional evaluation approaches take a very staged approach to evaluation. Many European Commission programs will have ex-ante, interim, and ex-post evaluations. These are governed by guidelines that may stipulate after exactly how many years a specific type of evaluation needs to take place. The guidelines may be specific about the evaluation criteria that need to be used and even to some extent what types of questions the evaluation needs to answer. In this case, the evaluation approach is very much geared toward accountability. The main risk is that, given the context of the program, the traditional evaluation approach outlined cannot provide unambiguous answers to the original questions asked. In certain cases, the evaluation may be set up to fail. The embedded evaluation may give some flexibility in terms of the approach.

In an embedded evaluation, the aim is to promote more continuous or repetitive learning and feedback as the intervention unfolds. It may well be the case that there is no obvious before and after for the intervention, but rather a process of adaptation driven by an underlying set of goals. This is typically shared with, or involves, the stakeholders, as explained earlier. We would expect at least one feedback loop whereby the evaluation assesses whether the intervention is fit for purpose to achieve the goals or objectives of the intervention. A further feedback loop could be to review the goals and objectives of an intervention, as is typically done in a developmental evaluation. The challenges involved with an embedded evaluation appear typically at the start and the end. The evaluation and learning frameworks need to be designed at the same time as the intervention. This requires a different way of thinking about evaluation among the stakeholders. They need to see the usefulness of a more continuous approach and indeed embrace the feedback loop to the intervention. As noted in the preceding paragraphs, the evaluator may also require a different set of skills compared to a more traditional approach. This speaks to having the capacity and resources to undertake an evaluation of this type. An embedded evaluation may also require additional allocation of funding to evaluation, beyond the 2–5% of total funding for an intervention that is typically allocated to a traditional evaluation. The main opportunity in using an embedded evaluation is to improve and shape the intervention as it unfolds. Part of this is to create a virtuous learning environment and to have learning as a major objective for the evaluation. Finally, the timeliness of the embedded evaluation may also work against it. If the evaluation concludes at the end of an intervention, or at a given point before the effects of the intervention are felt, it may be that the evaluation can comment little on the impact of an intervention or the sustainability of the impact, especially if the community that was created as part of the evaluation disbands and the evaluative capacity is lost.
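As a purely illustrative sketch (not drawn from this chapter), the two feedback loops described above can be caricatured in code: an inner loop that adapts the intervention against its current goals, and an outer loop that periodically revisits the goals themselves. All names, figures, and adaptation rules below are hypothetical.

# Illustrative sketch of a double feedback loop in an embedded evaluation.
# The evidence, goals, and adaptation rules are invented for the example.

def collect_evidence(intervention):
    # Stand-in for real-time data collection (monitoring data, workshops, surveys).
    return {"uptake": intervention["reach"] * 0.6}

def embedded_evaluation_cycle(intervention, goals, cycles=6):
    for cycle in range(1, cycles + 1):
        evidence = collect_evidence(intervention)

        # Inner loop: adapt the intervention if it is not fit for purpose
        # against the current goals.
        if evidence["uptake"] < goals["target_uptake"]:
            intervention["reach"] += 10  # e.g., widen delivery channels

        # Outer loop (as in developmental evaluation): every few cycles,
        # revisit whether the goal itself still makes sense in the light of learning.
        if cycle % 3 == 0:
            goals["target_uptake"] = min(goals["target_uptake"],
                                         round(evidence["uptake"] * 1.2))

    return intervention, goals

print(embedded_evaluation_cycle({"reach": 50}, {"target_uptake": 40}))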

Two Short Case Studies
The two short case studies discussed below explore the issues of timeliness, participation, project adaptability, and evaluation continuity identified above as important dimensions in the design of embedded evaluation. In particular, they show that these general characteristics can have different implications for the role of the evaluator, as the design of evaluations in each case is necessarily tailored to the individual intervention, and the participatory aspects of the evaluation will be handled differently and with different challenges. We might compare this with the design of a randomized control trial (RCT), for example, which must conform to certain steps and standards, and whose standards depend upon being independent of the views of participants. The two case studies are drawn from recent work conducted by RAND Europe. In each case, the work was commissioned as an independent evaluation
The Practice of Embedded Evaluation   141 but one where both the implementation team and the evaluation team recognized the value in working closely together. The way in which the evaluation was commissioned recognized that the evaluation should reflect the needs of decisionmakers and should contribute to meeting these needs. At the same time it was understood that the independence and rigor of data collection and analysis were fundamental to the value of the contribution that the evaluation might make. The specific issues they relate to are increasingly important and common: interventions and programs to support improvement, and for transformation in public services. Both involve interventions into complex systems with the conscious and deliberate aim of changing the way those improvement and innovation systems work. Changing how a system works, we suggest, involves different questions than changing how a good or service is delivered (a theme explored in Ling, 2014). They are therefore especially relevant to this discussion and, as will become clear, require an embedded evaluation approach.

Improvement: The Case of the Q Improvement Lab

Q Lab is an initiative funded by the Health Foundation, a health research funder in the UK, with the (defensible) ambition to make a profound and lasting contribution to improving health and care. It is important to understand the context of the creation of the Q Lab approach, which emerged from the Q community, an initiative led by the Health Foundation and National Health Service (NHS) Improvement that brings together a community of people with experience and expertise in improving health and care. The Lab was intended to provide a niche or protected space within which a range of stakeholders can engage creatively – and sometimes abrasively – in order to generate new insights and approaches that can then be tested and spread into the wider health and care system. The approach involves addressing just one issue at a time, sequentially, and the first topic – peer support – was completed in the summer of 2018 following a little over 12 months of work. RAND Europe provided an evaluation which was explicitly requested by the Health Foundation to be an embedded evaluation.

In order to understand why an embedded approach was seen to be especially relevant it is important to understand a little about the nature of what was being evaluated. Q Lab aims to test an important proposition: that improvement work in health and care can be enriched and deepened by a conscious and systematic inclusion of the ideas and experiences of those (including carers) with direct lived experience of a health and care issue, along with the experiences of advocates and professionals in the same field. Through sensitive and creative processes these experiences can be given practical expression in a set of proposals which both respect the authenticity of insights "from below" and can be seeded into a wider health and care system in ways that can potentially transform that system.

Schot and Steinmueller (2016) have developed a relevant approach to explore the idea of "transformative innovation." This suggests an effort to achieve a step change in the way that a problem or need is addressed. Although it focuses on

innovation, it is also helpful in addressing an innovative effort to deliver a potentially transformative change in how the service itself is delivered. It involves starting with a deep understanding of the problem rather than with the existing way of doing things. Transforming a part of the improvement system is therefore characterized by an orientation to impact, rather than process, and involves engaging a wide range of end-users who are prepared to challenge, if necessary, the traditional way of doing things and question vested interests. The range and depth of those participating in the approach is therefore likely to be greater. It also widens the range of methods used to deliver change by comparison with more traditional quality improvement and could be expected to include a wider range of stakeholders engaged in challenging the existing way of doing things. The concept of transformational innovation is therefore highly relevant to a wide range of efforts to transform how we achieve public goals, not only in health care but also in very different areas such as environmental policy or addressing social inequalities.

At the time of writing, the Q Lab in 2018–2019 has entered its second stage. The first pilot year of Q Lab (Liberati, Sim, Pollard, & Ling, 2018) provided a promising legacy of practical tools and lessons that can feed into future efforts to fulfill this ambitious mission. One of the most important responses to these lessons has been for the Q Lab team to collaborate with service users and advocates so that detailed knowledge of how services are currently experienced, and of what improvements are feasible and acceptable, is consistently present. In addition, the pilot year resulted in a growing body of experience in how to creatively engage a variety of groups and orient discussions toward potential improvements and innovations (Liberati et al., 2018).

The Q Lab is a quintessential example of the kind of complex improvement system where links between cause and effect are difficult to make (van Winkelen, 2016). At the heart of the approach is a belief that by convening, stimulating, empowering, and mobilizing a wider range of stakeholders than is common in "traditional" quality improvement, the narratives and stories that people bring to the task begin to transform and cohere into new narratives of change – and ones that are hopefully oriented toward achieving real and lasting change in the health and care system. The evaluation should be interested not only in whether this happened but – to support learning and the development of the Q Lab approach – how this happened and how it could be improved.

However, there is a challenge for an evaluator. How the Q Lab influences the wider health and care system (the ultimate aim of the initiative) will depend on which of many potential pathways are available through which the Q Lab model might either fail or fly. The evaluation needs to be able to identify such anticipated causal chains to impact and assess whether, and how far, these are effective. Frank Geels (2011) helpfully suggests four possible consequences for such consciously transformative approaches, and these could be tested in a creative evaluation. First, the innovative idea might be exposed too early to the wider pressures of the health and care system. Those with system leverage modify the approach so that it may be translated and accommodated (or watered down).
It may still contribute to wider learning and see some adaptation in the health and care system. Second, the improvement approach might be more developed and more resistant to

external pressures, and have a more "symbiotic" relationship with the health and care system. In this case, it becomes a helpful "add-on" to local systems without changing their basic architecture. However, it might over time trigger further change that creates more radical shifts. Third, there might not only be a well-developed solution emerging (in this case from Q Lab activities) but also pressures and fissures in the wider system that create a window of opportunity. In practice, this window is likely to include patient demands, political pressures, funding availability, and cultural contestation. Finally, major pressures in the system might result in a de-alignment among existing activities and a major search for alternatives. In this case, the Q Lab approach is likely to be only one of a number of proposed "solutions" to this problem but, if it emerges as the preferred solution, it would have a system-transforming effect.

An embedded evaluation needs to not only understand if and how innovative ideas are being nurtured but also how these ideas are being consciously developed to deliver change in a context which might be characterized by any one of Geels' four possible contexts. We suggest that an evaluation needs to understand how the Q Lab understands the context within which it is operating and how it mobilizes action to address this (and this requires, we suggest, careful attention to the mental models, metaphors, and sense-making of those involved). This process of surfacing the narratives and shared metaphors through the evaluation can also have the effect of helping to consolidate and agree these stories, or to precipitate disagreement and conflict.

Consequently, and based on this particular case, we identify a set of principles to be adhered to in developing an evaluation approach to Q Lab. The evaluator participates and contributes to the decision-making of the implementation team but has a primary responsibility throughout to the rigor and independence of the evaluation. The evaluation, according to the principles of embedded evaluation, should:

•  Be highly participative
•  Be based on a deep understanding of the developing narratives and mental models of participants
•  Explore the liminal nature of the experience within the Lab, and at the boundary between the Lab and the systems it seeks to influence
•  Developmentally explore assumptions with Q Lab team, participants and wider stakeholders
•  Reflect how complex, difficult, and power-dependent is the move from Lab to real world
•  Draw on principles of complexity and heterogeneity in "system re-design"

Transforming Primary Care in London

Our second case study is about transformation in primary care. In 2017, RAND Europe, in partnership with Ernst and Young, was invited to deliver an evaluation of a program designed to transform primary care in London. The origins of the

program lie in NHS England's Call to Action in 2013, with the stated aim to transform primary care and address pressures on general practice (NHS, 2013). In early 2015, NHS England and London Clinical Commissioning Groups (CCGs), local groups that purchase services, agreed to jointly fund a number of collaborative programs, which were badged as the Healthy London Partnership. Reflecting both national mandates and London-wide concerns, they were intended to support the implementation of the Five Year Forward View and Better Health for London (both published in October 2014). These documents set out the strategic direction for health services delivery in England and London respectively.

One of these programs was the Transforming Primary Care (TPC) program. The TPC program was specifically designed to support the delivery of the Transforming Primary Care in London: Strategic Commissioning Framework (SCF) (published March 2015, NHS, 2015). This document set out a vision and set of standards for primary care that were to be delivered consistently across the capital. Later, the TPC program was adapted to also support the delivery of the General Practice Forward View (published in April 2016). The General Practice Forward View described a future for General Practice in England, which included extended (i.e., much larger) group practices. It committed significant resources (an extra £2.4 billion a year) to support general practice through (among other things) workforce development, improved infrastructure and technology, and the re-design of services. These are given a specific London dimension in the TPC program, which aims to deliver a new vision for primary care in London through work streams jointly agreed in late 2016 by CCGs and NHS England:

•  Access – Delivering extended and improved access across London, including pre-bookable and same-day access to a GP, 8am–8pm, seven days per week.
•  Infrastructure – Supporting investment in both Estates and Technology through the Estate and Technology Transformation Fund (ETTF), including the implementation of patient online, Virtual GP, and online consultations.
•  Resilience/Provider development – Providing support to general practice and at-scale providers and adapting ways of working in order to improve resilience and sustainability and move toward new models of care.
•  Workforce – Increasing the number of doctors in general practice in London by 703, as well as pursuing other recruitment and retention strategies, including additional clinical pharmacists, mental health therapists, and practice nurse development.

The evaluation team was commissioned to provide a formative evaluation of the TPC program, generating findings to help the program adapt to changing circumstances and evidence about the success or otherwise of efforts to date. From the outset, and reflecting willingness on the part of program stakeholders to learn and adapt, the evaluation was designed to meet the needs of the program and its

stakeholders, to maximize the chance that findings and recommendations would be both relevant and feasible. For example, during the inception phase (June–August 2017), the evaluation team interviewed stakeholders, reviewed project documentation, and conducted a workshop with interested parties to discuss the theory of change underlying the program. Thereafter there were regular formal and informal interactions with the TPC program team to discuss the strategic direction of the evaluation. The evaluation team also collaborated with key stakeholders to support data collection in order to maximize response rates and user engagement with the evaluation. Following the inception phase, further data collection began in September 2017, to be completed in June 2018, with the final report delivered in October 2018.

The evaluation focused on assessing the program's contribution to the transformation of primary care. It was timely, and decision-makers appeared to be engaged and interested in the results, with a view to using the findings to support a decision about the future direction of the program. As such it should be seen as a successfully "embedded but independent" evaluation. However, there are limitations that should be identified.

First, the timing of the evaluation was designed to provide feedback to decision-makers in sufficient time to allow them to react and, if considered appropriate, adapt the program. However, this also meant that the data collection took place at a time when the program was establishing a capacity to deliver lasting change but before evidence of change in patient experience and outcomes, or improvements to system efficiency, could be perceived. There is a danger that too much focus on "timely" evaluations may have the unwelcome consequence that we know too little about the long-term effectiveness of such programs.

Second, it is fully understood by all stakeholders that the program sits alongside and interacts with other national and local efforts to achieve the goals of primary care development (by STPs, CCGs, and general practice organizations themselves). Indeed, for those leading such a program, we often observed a juggling act to balance the competing and changing set of national requirements. Therefore precisely measuring the independent effect of the TPC program, as distinct from the other activities and programs and mandates within which it is nested, is impossible. By understanding what had been done, and with what consequences for different stakeholders, we were able to arrive at judgments concerning the added value of a pan-London approach, but this comprised triangulating quantitative and qualitative evidence "in the round" and suggesting that the balance of evidence supported (we thought convincingly) a pan-London role.

Discussion

Efforts to improve health care and health innovation systems are complex because they involve multiple aims and reflect multiple interests of patients, professionals, citizens, and corporations, and these interact in often unpredictable ways that change over time; this complexity was often referred to by study participants. However, we also note that health systems are not only complex but also heterogeneous (or messy) and this has important implications for evaluation

(i.e., the triad of simple, complicated, and complex can be an unhelpful way of defining interventions if heterogeneity is left out).

Conclusion

Alongside viewing health innovation as a complex and sometimes messy system, another relevant frame of thinking about innovation that has been developing in the past decade is transformative innovation policy (Schot & Steinmueller, 2016). Transformative approaches are intended to address deep underlying challenges through radically reconfiguring services and behaviors to find new ways to address major challenges such as narrowing the gap between what is technically possible and what is delivered in practice through health care systems. This is not necessarily about speed but it is about overcoming deeply rooted path dependencies arising from institutional inertia, entrenched behaviors, and power imbalances. In this context, evaluations must attend to:

•  The power of end-users to shape services and technologies
•  The potential gap between the outputs of the current system and the outcomes or impacts needed (i.e., more efficiently produced outputs may be unhelpful)
•  Understanding how behavior is changed in sustainable ways
•  The multiple causes of impacts
•  How power dynamics condition change

This relates to our key themes concerning embedded evaluation in several important ways. First, the importance of learning, co-production, and adaptation raised by Quinn Patton and others is complicated by the kinds of evaluations found in our two case studies. Whose learning should be nurtured, and with whom should co-production of evaluative evidence be organized? An easy assumption is that this should be the leaders and managers of the intervention/program and their implementation partners. However, if the ambition is truly transformational, it would make little sense to exclude the learning of end-users and not consider their ability to shape services and technologies. If producing the same outputs more efficiently would not of itself produce the intended transformative impact, what else is needed to identify a successful transformative approach? If there are multiple causes that could result in the intended impact, how confident can we be that the program being evaluated is the right way to deliver these impacts? And where does this place the evaluator in relation to the power dynamics conditioning change?

There are no easy answers to these questions for embedded evaluators. However, we think the answers lie, if anywhere, with adopting an appropriately modest vision of the evaluators' role and protecting the technical standards of evaluative inputs. The role of the evaluator is not to heroically solve all these problems. Indeed, a world which was determined only by evaluations would be a dull and defective one. Rather, in an embedded approach the evaluation should

primarily be about learning and robust evidence that is collected and analyzed without fear or favor.

References

Barrados, M. (2019). Discussion with Alterna Savings Credit Union. Ottawa, Canada, January 2019.
Child Help Partnership Charity (2018). Embedded Evaluation. Retrieved from www.childhelppartnership.org/for-professionals/learning-center-for-professionals/embedded-evaluation/ [Accessed August 2018].
Coffey International Development (2009). The Cost of a Good Deed: Project Monitoring in Post-Conflict Environments and the Application of the Logframe. Reading: Coffey International Limited.
Dozois, D., Langlois, M., & Blanchet-Cohen, N. (2010). DE201: A Practitioner's Guide to Developmental Evaluation. Québec, Montreal, Victoria: The J.W. McConnell Family Foundation and the International Institute for Child Rights and Development. Retrieved from http://vibrantcanada.ca/files/development_evaluation_201_en.pdf.
Geels, F.W. (2011). The Multi-Level Perspective on Sustainability Transitions: Responses to Seven Criticisms. Environmental Innovation and Societal Transitions, 1(1): 24–40.
Health Foundation (2015). Evaluation: Commonly Asked Questions About How to Approach Evaluation of Quality Improvement in Health Care. London: Health Foundation. Retrieved from www.health.org.uk/sites/health/files/EvaluationWhatToConsider.pdf.
Herson, M., & Mitchell, J. (2006). Real-Time Evaluation: Where Does Its Value Lie? London, UK: ODI. Retrieved from https://odihpn.org/magazine/real-time-evaluation-where-does-its-value-lie/.
Liberati, E., Sim, M.P.Y., Pollard, J., & Ling, T. (2018). Independent Evaluation of the Q Improvement Lab. Retrieved from https://preview.rand.org/pubs/research_reports/RR2632.html.
Ling, T. (2014). Achieving Equality at Scale Through System Transformation; Evaluating System Change. In K. Forss & M. Marra (Eds.), Speaking Justice to Power: Ethical and Methodological Challenges for Evaluators (Comparative Policy Evaluation, Vol. 21). Piscataway, NJ: Transaction Press.
Mayne, J. (2008). Contribution Analysis: An Approach to Exploring Cause and Effect. ILAC Brief 16. Retrieved from www.cgiar-ilac.org/content/contribution-analysis-approach-exploring-cause-and-effect.
National Health Service (NHS) (2013). London Call to Action. Retrieved from www.england.nhs.uk/london/wp-content/uploads/sites/8/2013/12/london-call-to-action.pdf.
National Health Service (NHS) (2015). Strategic Commissioning Framework. Retrieved from www.england.nhs.uk/london/wp-content/uploads/sites/8/2015/03/lndn-prim-care-doc.pdf.
Office of the Auditor General of Canada (2006). Chapter 3: Large Information Technology Projects. In Report to Parliament, November 2006. Retrieved from www.oag-bvg.gc.ca/internet/English/parl_oag_200611_03_e_14971.html#ch3hd3a.
Pasanen, T. (2018). 2018: Time to Update the DAC Evaluation Criteria? London: ODI. Retrieved from www.odi.org/comment/10594-2018-time-update-dac-evaluation-criteria.
Patton, M.Q. (2008). Utilization-Focused Evaluation (4th ed.). Thousand Oaks, CA: Sage Publications.
Patton, M.Q. (2010). Developmental Evaluation: Applying Complexity Concepts to Enhance Innovation and Use. New York, NY: Guilford Press.
Polastro, R. (2005). Real Time Evaluations: Contributing to System-Wide Learning and Accountability. London: ODI. Retrieved from https://odihpn.org/magazine/real-time-evaluations-contributing-to-system-wide-learning-and-accountability/.
Schot, J., & Steinmueller, E.W. (2016). Framing Innovation Policy for Transformative Change: Innovation Policy 3.0. Science Policy Research Unit. Retrieved from www.johanschot.com/wordpress/wp-content/uploads/2016/09/Framing-Innovation-Policy-for-Transformative-Change-Innovation-Policy-3.0-2016.pdf.
van Stolk, C., & Fazekas, M. (2013). How Evaluation Is Accommodated in Emergency Policy Making: A Comparison of Post-9/11 Emergency Legislation in the United Kingdom and the United States. In J.-E. Furubo, R.C. Rist, & S. Speer (Eds.), Evaluation and Turbulent Times: Reflections on a Discipline in Disarray (Comparative Policy Evaluation, Vol. 20, Chapter 9, pp. 161–177). New Brunswick, NJ: Transaction Publishers.
van Stolk, C., Ling, T., Reding, A., & Bassford, M. (2011). Monitoring and Evaluation in Stabilisation Interventions: Reviewing the State of the Art and Suggesting Ways Forward. Santa Monica, CA: RAND Corporation. Retrieved from www.rand.org/pubs/technical_reports/TR962.html.
van Winkelen, C. (2016). Using Developmental Evaluation Methods With Communities of Practice. The Learning Organization, 23(2/3): 141–155. Retrieved from https://doi.org/10.1108/TLO-08-2015-0047.

Part III

Practices Working Together

11 Conducting Evaluation in An Audit Agency

Stephanie Shipman

Traditionally, performance audits and program evaluations differ in their purpose, scope, methods, standards, and relationship to program staff. In response to increased interest by the US Congress in the government’s achievement of results, the chief congressional oversight agency, the US Government Accountability Office (GAO), expanded its activities to include evaluations as well as audits in the 1970s. To support a full spectrum of congressional oversight activities, the GAO expanded its staff expertise and modified its policies and procedures to ensure that each study uses appropriate methods and that analyses and reporting are fact-based, non-partisan, non-ideological, fair, and balanced. This chapter aims to explain how the GAO accomplishes that.

Differences Between Performance Audits and Program Evaluations

Performance audits typically examine agency operations to assess whether resources are spent efficiently, effectively, and according to law. (Financial audits, not addressed in this chapter, operate under largely similar policies and procedures.) Performance audits also typically assess program activities' compliance with criteria or standards based in law and regulation, agency or government-wide policy, or contract or grant terms. To ensure independence and objectivity – themes considered in earlier chapters – audits are conducted by a unit independent from the one managing the program, with limited involvement with program staff.

In contrast, program evaluations focus on how well programs or policies are achieving their intended economic, health and safety, or environmental outcomes. While the program's intended outcomes may be outlined in law or regulation, they may also represent congressional or other stakeholder expectations based on the program design or logic – not requirements. Evaluations aim to assess the extent to which desired changes in outcomes have occurred and whether those changes can be attributed to program activities, or, by exploring the relationships between outcomes, program activities, and context, to learn why a program may or may not be achieving its intended outcomes.

Evaluation studies can be designed to answer a range of questions about programs to assist decision-making by program managers and policy-makers. Evaluations of federal programs are typically requested to provide external accountability for the use of public resources or to learn how to improve performance – or both. Evaluations of new programs might test the feasibility or effectiveness of an approach and be requested to inform decisions about program continuation or expansion. Evaluations of existing programs might assess how well the program is managed, whether resources are targeted to areas of greatest need, or whether the program as designed is, indeed, effective in resolving a problem or filling a need. Policy-makers could use the results to reallocate resources to other uses or revise the program's design.

Program evaluation as practiced in the US deploys a variety of methods and procedures, reflecting different ways of informing program management and policy-making. In the early years, the program evaluation field focused primarily on assessing the effectiveness of experimental programs designed to change social behavior such as increasing employment or reducing crime. Evaluators deployed classic experimental research designs in net impact evaluations to compare program outcomes with an estimate of what would have happened in the absence of the program. These research designs often involved randomly assigning program applicants either to a treatment group or to a nonparticipating control (or comparison) group in order to isolate program effects from the influences of other factors on the desired outcomes.

In response to reports that some studies that failed to demonstrate changes in outcomes had experienced problems in implementing the new programs, evaluators began conducting implementation (or process) evaluations to assess the extent to which a new program was operating as it was intended. Such studies can potentially identify the reasons why a program is not achieving its intended objectives. Implementation evaluations resemble performance audits in assessing the extent to which program activities conform to statutory and regulatory requirements, program design, and professional standards or customer expectations.

As a form of applied social science research, the evaluation field relies on a variety of social science methods drawn from a range of social science disciplines (primarily psychology, sociology, economics, political science, education, and public health). Designing and executing a net impact evaluation that permits attribution of outcomes to a policy or program requires scientific expertise in measurement and analysis methods to guard against drawing unjustified causal conclusions. Whereas auditing uses licensing and accreditation of individuals and institutions to ensure staff have appropriate expertise, evaluation in the US does not have individual licensing or accreditation but relies instead on evaluators' demonstrated expertise in social science principles and methods, and application of general professional guidelines.

The American Evaluation Association has established principles intended to guide the professional practice of evaluators, organized around the themes of systematic inquiry, competence, integrity and honesty, respect for people, and responsibility for general and public welfare (AEA, 2004). While auditing has

standard fact-checking procedures to ensure information reliability, quality assurance in evaluation relies primarily on the evaluator's expertise or competence, methodological transparency in reporting, as well as expert peer review.

Agency staff may conduct evaluations of the quality of a program's operations or implementation to identify program improvements, but evaluations of program outcomes or effectiveness – like audits – are typically conducted by external parties to ensure the evaluation's independence and objectivity. Some federal agencies have specialized research and evaluation offices, independent of program offices, that conduct or commission studies. The choice of whether to employ an external evaluator – a research institute, an independent consulting firm, or an independent oversight agency such as the GAO – may be based on the complexity of the subject, the availability of agency staff expertise or resources, or how important the evaluator's independence from program management is to the credibility of the report.

Program stakeholders typically play a more consultative role in the design of evaluations than they do in audits. Whether commissioned by the agency itself or an external party, the evaluator normally consults with the evaluation client to clarify the purpose and scope of the evaluation and gain agreement on the choice of evaluative criteria. This is done to ensure that the criteria are acceptable and credible to the clients, leading them to be more likely to accept and act on the evaluation's conclusions and recommendations. The same conditions of independence apply to evaluators and auditors, including having control over the scope, methods, and criteria of the review; full access to agency data; and control over the findings, conclusions, and recommendations.

History of the GAO's Adoption of Evaluation

The GAO was established by the 1921 Budget and Accounting Act as an independent agency within the federal legislative branch, led by the Comptroller General, to aid congressional oversight of federal financial management and to investigate broadly how federal dollars are spent. In the 1960s several new federal programs were created as part of the "War on Poverty" to improve employment and social welfare among the poor. These policies, based on social science research and theory, were construed as "social experiments" accompanied by applied research studies to assess the success of those experiments. In 1974 the Congress broadened the GAO's role to conduct evaluations of the economy, efficiency, and effectiveness of federal programs and agency evaluation activities.

In 1978 the Congress established an independent Office of the Inspector General (IG) in each of the major federal agencies to conduct audits and investigations of agency programs and operations, and make recommendations to the agency head and the Congress to promote agency economy, efficiency, and effectiveness, and prevent waste, fraud, and abuse. The IGs are to be appointed by the president without regard to political affiliation and solely on the basis of integrity and demonstrated ability in accounting, auditing, or related professions. Working independently but in coordination with the GAO, the IG offices conduct

audits and investigations in compliance with the Government Auditing Standards issued by the Comptroller General (GAO, 2011).

The GAO primarily initiates reviews at the request of congressional committees or in response to a statutory mandate, although the Comptroller General has final authority over what work the agency undertakes. In the 1970s, as Congress began to pose questions to the GAO about the effectiveness of government programs and policies, the GAO expanded its analytic capabilities by hiring a large number of social scientists and establishing a new Institute for Program Evaluation (later, the Program Evaluation and Methodology Division). The institute's charge was to conduct program evaluations and develop the capacity of GAO staff to conduct such studies through training and developing new agency guidance. By the mid-1990s the GAO had integrated assessments of program outcomes into much of its work and, as part of downsizing in response to significant budget cuts, closed the division and integrated the evaluation staff into the remaining divisions. Currently, the GAO maintains a centralized unit, the Applied Research and Methods team, with staff expertise in a wide variety of social and physical science research methods to assist studies throughout the agency.

Over time the GAO has modified its policies and procedures to reflect its diverse accountability mission. The GAO currently conducts a very wide spectrum of studies. It evaluates how well government programs and policies are working, audits agency operations to determine whether agency funds are being spent efficiently and effectively, investigates allegations of illegal and improper activities, conducts policy analyses, and issues legal decisions and opinions concerning, for example, agency contract decisions. Currently, the GAO does not categorize its studies as either program evaluations or performance audits but categorizes the study's research questions as evaluative or descriptive. All studies that use any evaluative criteria to make judgments about "how well a program is working" are considered performance audits; those that do not use evaluative criteria are considered descriptive studies. Research questions that aim to assess program effectiveness are required to use an appropriate net impact evaluation design and analysis to isolate the program's effects from other influences on the program's intended outcomes. Importantly, the generally-accepted Government Auditing Standards for performance audits – independence, competence, planning, risk-assessment, quality control, documentation, supervision, and reporting – continue to apply to all GAO performance audits. Management attention is focused on each study's design to ensure that study methods are appropriate to the research question and properly executed.

GAO Policy and Procedures Support Evaluations

Scope of Work

Over time both internal GAO policy and US Government Auditing Standards were revised to expand the scope of work deemed appropriate for a government

audit and accountability agency. Where some public audit organizations may avoid assessing public policies for fear of appearing partisan, evaluators are expected to provide an opinion on whether a policy is working well or not, and how the policy's effectiveness might be improved. In consultation with the national public accounting community, US generally-accepted Government Auditing Standards, commonly referred to as the "Yellow Book," include program evaluation as a form of performance audit (GAO, 2011).

The GAO does not have separate policies for the conduct of evaluations and audits; rather, all studies making evaluative claims operate under the same policies and procedures for designing a study, monitoring its progress, and reviewing product quality. The level of senior management attention to a study is calibrated based on a risk-assessment of the study's complexity, cost, and controversy. Projects designated low-risk deploy the same quality assurance procedures as those designated high-risk, but their oversight is delegated to division-level managers.

In an organization-wide review of GAO policies during the 1990s, senior managers identified seven dimensions of quality that apply to both the process and products of the agency's increasingly diverse work. Key among them was "context sophistication," i.e., a thorough understanding of the social, political, and policy context surrounding a government program, its goals and beneficiaries, and primary stakeholders (both internal and external). A thorough understanding of the policy context is important for identifying appropriate research questions and concerns, selecting credible information sources and evaluative criteria, and considering the implications of study results.

A key difference between evaluations and audits is in the choice of research question and evaluative criteria used to assess "how well" a program is operating. Research questions should be clear, objective, reflect the stated goals of the program, and permit objective measurement. Each congressional request is scrutinized by GAO management for its suitability and, if needed, referred to the requestor for further discussion (GAO, 2017a). For example, while audits generally accept the program design and requirements as a given, it is appropriate – and expected – for an evaluation to assess whether the current program design is well-matched to current or expected future conditions. In the design phase, the evaluator is expected to review statutes, program documents, prior GAO work, and the relevant policy literature to identify key policy issues, previous findings, and the potential decisions to which the evaluation results may contribute. This information helps agency managers assess potential risks involved in conducting the study.

Assessment Criteria

Scriven (1991) defined evaluation as the "determination of the merit, worth, or value of things" (p. vii). Weiss (1998) defined it more specifically as the "systematic assessment of the operation and/or the outcomes of a program or policy, compared to a set of explicit or implicit standards, as a means of contributing to the

improvement of the program or policy" (p. 4). Evaluation criteria or standards specify the expectations for the required or desired state regarding the program being evaluated. To support objective assessment, criteria must be directly observable and measurable events, actions, or characteristics that provide evidence that those expectations have been met.

Evaluation criteria are drawn from a wider set of sources than criteria for compliance audits. Evaluation criteria may be drawn from law and regulation, legislative discussions of proposed legislation, federal government policy or agency standards, contract or grant terms, professional standards and principles, social science or economic principles, benchmarked performance standards, previous program performance levels, or best (or good) practices identified through research or expert opinion (Shipman, 2012).

The selection of evaluation criteria is critical to the success and credibility of the evaluation. Some programs' authorizing legislation explicitly states multiple goals or purposes for a program – some of which may be at odds with each other – reflecting the diverse concerns of legislators. In particular, evaluations often look for both intended and unintended effects precisely because opponents of a policy feared it might have undesirable effects. Context sophistication requires that the evaluator be aware of and on the lookout for these issues.

The GAO generally does not make summative evaluations of a program's value or worth because (a) a summative assessment requires a political decision to prioritize some components of performance (and criteria) over others, and (b) most program assessments are designed for a particular policy purpose, and thus the choice of criteria applies to the specific situation (Shipman, 2012). For example, consideration of whether or not to increase a program's funding might warrant an assessment of unmet need and assurance that current program resources cannot be realistically reallocated to meet that need, rather than a comprehensive assessment of the program's relevance, efficiency, and effectiveness.

The GAO acknowledges the critical role of the thoughtful selection of evaluation criteria through investing significant time and management attention to their consideration and through requiring explicit discussion and a written agreement with the client. To ensure a fair and valid assessment, the evaluation client and the agency being evaluated should accept the validity of the criteria against which the program will be judged. Moreover, these parties' rejection of the criteria's validity will likely lead them to reject and fail to act on any eventual recommendations. GAO policies require the evaluation team to discuss the evaluation's statement of work with the client and provide them with a copy in writing at the end of the study design phase. GAO policies also require the study purpose and design to be discussed in a formal entry conference with the agency (GAO, 2004). The entry conference also provides an opportunity for the agency to introduce information about the program's context or the relevance and quality of proposed information sources.

The GAO also considers program monitoring and evaluation to be a key component of prudent agency management, a component of an agency's internal

controls required to ensure the proper and efficient use of public resources. Thus, assessing the quality of agency evaluations is considered an appropriate GAO task. The GAO is often required to do this type of work and draws on social science research methods as criteria for reviewing the quality of evaluation studies and on established professional guidance for the management and reporting of evaluations (for example, AEA, 2013).

Methods

To develop evidence on a broader range of program outcomes than are usually considered in audits, the GAO has expanded its range of methods and databases. To measure the perceived benefits of programs, changes in the behavior of program participants, or changes in economic, social, or environmental conditions, the GAO conducts surveys, structured interviews, and focus groups, and relies on existing surveys of businesses or the population. For example, it has conducted surveys of public employees to obtain their perceptions of barriers to achieving desired program outcomes (GAO, 2014). Analysts make use of statistical analyses and modeling as well as syntheses of prior research and evaluations to draw evaluative conclusions. GAO analysts also apply social science and economic research principles and methods in reviewing the quality and credibility of research conducted by agencies and others in order to summarize the available research on a topic or to evaluate the quality of an agency's own evaluation program.

For example, a GAO study of a federal program that supports activities outside the school day to help improve student outcomes in high-poverty or low-performing elementary schools used different methods to address distinct questions about the program (GAO, 2017b). First, staff conducted a 50-state survey of state program coordinators to learn how funds were awarded and used by states and localities. Second, staff reviewed a selection of state program evaluation studies and academic studies on student outcomes to identify what was known about the effectiveness of these programs. In addition, staff reviewed federal program documents and interviewed officials in four states to learn how performance information was used in managing the program. To assess program effectiveness, GAO social science research specialists conducted a comprehensive literature search and reviewed the identified studies for relevance to the topic, time-frame, and sample, and for the strength of the research design or statistical methods used to account for other plausible influences on program outcomes. Conclusions about the effectiveness of specific types of funded activities were drawn only from the small number of relevant, high-quality evaluations identified, and the modest amount of high-quality research available was duly noted.

The GAO obtains expertise in these varied research areas through both hiring and training. GAO employs a wide range of professional staff: social scientists, accountants, economists, computer scientists, and specialists in fields ranging from communications to transportation. Many of these specialists are currently

located in a central unit, the Applied Research and Methods team, whose staff assists projects throughout the agency, as needed, to meet the audit standard of having appropriate expertise on the audit team. At different times research specialists have been located centrally in the agency or in components within divisions; most are currently located in a central unit. The centralized arrangement helps support the standard application of research methods across the agency through direct supervision and, for example, producing brief guidance papers on the use of methods such as structured interviews, surveys, focus groups, case studies, and content analysis.

To help staff meet auditing requirements for continuing professional education, the GAO both funds staff attendance at professional conferences and workshops and maintains an extensive Learning Center of its own. Many of the agency's research specialists conduct brief courses or workshops on methods in the internal Learning Center. These courses introduce generalist analysts to specialized methods, explain under what circumstances they are appropriate, and describe what techniques are used to ensure they produce adequate, sufficient evidence to meet auditing standards for independence and objectivity. These techniques generally involve explicit documentation of data collection and analysis steps to permit verification by an independent analyst. Deploying trainers who regularly work on GAO projects increases the training's credibility and relevance.

Professional Standards and Quality Assurance

While the evaluation field in the US does not license evaluators, the primary US professional association, the American Evaluation Association, has issued guidelines for evaluation engagement, conduct, and reporting, as noted above. Recognizing that evaluations often employ data collection and analysis methods from disciplines other than auditing, the "Yellow Book" clarifies that studies conducted by government auditors must also conform to the professional standards or guidelines that pertain to the methods used in such studies, such as program evaluation and economics.

As a government audit agency, the GAO continues to rely on fact-checking each publication's facts and figures to ensure accuracy. Validation of analyses typically relies on review by another GAO analyst with relevant expertise who is external to the project team. Quality assurance is also obtained through concurrent supervisory review of planning, data collection, and analysis, as well as the continuing professional education requirements typical of audit organizations. As the complexity of GAO's work increased, a matrix form of project review was developed. At the beginning of a project, stakeholders within the agency are identified who have relevant expertise or knowledge to contribute, including lawyers, methodologists, and managers who have conducted related work. These stakeholders review and advise the team on the project's design, its progress at key decision points, the analysis and conclusions, and the final product. For projects designated high-risk, draft reports are reviewed by the head of the agency, the Comptroller General.

External expert reviews may be sought for unusually complex analyses or where the organization lacks expertise internally. External advisory panels representing a diverse group of stakeholders may be brought in to review the design and reporting for studies of controversial topics to help ensure an objective and balanced treatment of the issues. In addition, to identify issues for congressional consideration on especially controversial or complex topics, the GAO holds forums to obtain the perspectives of a wide range of experts. These products undergo the same oversight policies and procedures to ensure balanced and objective reporting and the proper attribution of experts' opinions.

Relationship with the Evaluated Agency

Audit organizations closely protect the auditor's independence from management of the audited program, ensuring that the individuals on the audit team have no personal threats to their independence and that agency management has no undue influence over the scope of the audit, data collection, or the audit findings, conclusions, or reporting. On questions of potential agency impropriety, the audit team may take on an oppositional role to the agency and program staff. This is antithetical to the evaluation discipline, which more usually takes on a discovery or learning role.

Evaluations may be perceived as threatening by program staff if they aim to judge the worth or effectiveness of the program – and, by extension, its staff. When commissioned by an external party such as the Congress to provide accountability for the use of federal funds, evaluations may appear threatening to program managers. In such cases, it is crucial to protect the evaluators' independence so that unsatisfactory results can be honestly and openly reported. However, the evaluator's relationship with the agency managers and staff depends very much on the purpose of the evaluation and how its results are expected to be used. For individual projects, evaluations are likely to be commissioned to assess the quality of management and success of the program, in order to inform a decision to continue funding the grantee. In this case, the relationship may be more adversarial. On the other hand, if the purpose of an evaluation of an existing program is to assess the program's progress toward goals or continued relevance to current conditions, it may be intended to inform decisions about potential program management or design adjustments. In this learning context, the evaluator–agency manager relationship may be more collegial.

Evaluators value early and continuing communication with program managers and staff both to improve their understanding of the program and its context, and to improve the credibility of the evaluation to program staff. Research has shown that explaining the evaluation purpose and design to program staff and keeping them apprised of interim results helps facilitate agency use of the evaluation results and recommendations (GAO, 2014). Thus, evaluators aim to consult with program staff on the issues, design, criteria, and data sources, but retain complete control over the analysis, and the reported conclusions and recommendations. The GAO maintains this same stance: consulting

agency managers and staff during design of the study, sharing and discussing findings in a formal exit conference, and providing the agency with the opportunity to comment on the draft report and recommendations before publication (GAO, 2004). Because of its public accountability responsibility, the GAO reports publicly on all studies. Even when addressing security-sensitive topics, an unclassified version of the full report is generally made public.

Conclusion

An audit organization can successfully expand its repertoire to include program evaluation by seeking expert guidance, hiring staff with appropriate expertise, providing staff with training and guidance on appropriate methods, and encouraging managers and analysts to consider diverse design options. Over time, as both the organization and its clients gain experience, managers will gain confidence in selecting the right method for the right question.

References

American Evaluation Association (AEA) (2004). Guiding Principles for Evaluators. Fairhaven, MA. Retrieved from www.eval.org/p/cm/ld/fid=51.
American Evaluation Association (2013). An Evaluation Roadmap for a More Effective Government. Fairhaven, MA. Retrieved from www.eval.org/evaluationroadmap.
Scriven, M. (1991). Evaluation Thesaurus. Thousand Oaks, CA: Sage.
Shipman, S. (2012). The Role of Context in Valuing. In G. Julnes (Ed.), Promoting Valuation in the Public Interest: Informing Policies for Judging Value in Evaluation. New Directions for Evaluation, 133: 53–63.
US Government Accountability Office (GAO) (2004). GAO's Agency Protocols. GAO-05-35G. Washington, DC. Retrieved from www.gao.gov/products/GAO-05-35G.
US Government Accountability Office (GAO) (2011). Government Auditing Standards. GAO-12-331G. Washington, DC. Note: The 2018 revision is effective for performance audits beginning on or after 1 July 2019. Retrieved from www.gao.gov/yellowbook/overview.
US Government Accountability Office (GAO) (2014). Program Evaluation: Some Agencies Reported That Networking, Hiring, and Involving Program Staff Help Build Capacity. GAO-15-25. Washington, DC. Retrieved from www.gao.gov/products/GAO-15-25.
US Government Accountability Office (GAO) (2017a). GAO's Congressional Protocols. GAO-17-767G. Washington, DC. Retrieved from www.gao.gov/products/GAO-17-767G.
US Government Accountability Office (GAO) (2017b). K-12 Education: Education Needs to Improve Oversight of its 21st Century Program. GAO-17-400. Washington, DC. Retrieved from www.gao.gov/products/GAO-17-400.
Weiss, C.H. (1998). Evaluation: Methods for Studying Programs and Policies (2nd ed.). Upper Saddle River, NJ: Prentice-Hall.

12 Two Sides of the Same Coin
The UNESCO Example

Susanne Frueh

The United Nations Educational, Scientific and Cultural Organization (UNESCO) was created following World War II with a mandate to contribute to peace and security by promoting collaboration among nations through education, science and culture in order to further universal respect for justice, for the rule of law and for the human rights and fundamental freedoms which are affirmed for the peoples of the world, without distinction of race, sex, language or religion, by the Charter of the United Nations. Most of UNESCO's work is normative or standard-setting, complemented by activities on the ground. Headquartered in Paris, the organization has 53 field offices and over 2,100 staff.

UNESCO's third line of defense, independent assurance, is situated in the Internal Oversight Service (IOS) and entails three oversight functions: internal audit, investigations, and evaluation. Headed by a staff member at senior director level, IOS reports directly to the Director-General of UNESCO. Its functional independence is assured by its full authority to determine its work program, issue reports, and submit them to UNESCO's executive board, and by a single, non-renewable six-year term for the director.

UNESCO also has an external auditor appointed by its executive board for a term of six years. The external auditor's mandate consists of expressing an opinion each year on, and issuing a report about, the financial statements of the organization. Articles 12.4 and 12.5 of UNESCO's Financial Regulations state that the External Auditor may make observations with respect to the efficiency of the financial procedures, the accounting system, the internal financial controls and, in general, the administration and management of the Organization. The External Auditor is completely independent and solely responsible for the conduct of the audit. While not explicitly part of the external auditor's mandate, the external auditor may, at the request of UNESCO or the executive board, perform performance audits. Over the years the extent to which performance audits are undertaken has varied.

The internal audit and evaluation functions were separate until 2000, when it was decided, as part of an overall management reform, to combine them. At the time of the merger, the three-staff Evaluation Office was co-located with management (Strategic Planning) and thus did not have the same degree of independence as the internal audit function. With the creation of the Internal Oversight Service (IOS), the independence of evaluation was strengthened, with a direct reporting line through the Inspector General (a director from 2009) to the Director-General of the organization, in line with the independence of the other oversight functions. This combined model currently exists in close to half of UN organizations, while the remaining organizations have separate offices for the two functions. In 2014 UNESCO further strengthened the independence of IOS by setting a one-term mandate limit for its director.

Different Models for Location of Oversight Functions Exist

Co-location in the 11 UN organizations following this model has led to different modalities and synergies depending on various factors (see Table 12.1). The combined location is in part due to a recommendation made by the UN's Joint Inspection Unit (JIU) in 2006 following a review of the functioning of internal oversight in the UN system. The JIU had argued that this would provide greater flexibility and responsiveness, less overlap and better coordination, significant economies of scale, and enhanced professionalism. The direct reporting line would free the internal oversight unit from control or undue influence from managers within the organization, increasing its independence and credibility.

In all organizations, a senior staff member with competence in either audit or evaluation heads the two functions, reporting to the director of internal oversight. The director in most organizations reports to the executive head, thus to the highest level in the organization. Some co-located offices merely share the location, functioning mostly independently from each other with separate work plans, reporting mechanisms, and audiences, while others have chosen a more integrated approach, mixing audit and evaluation staff to deliver joint products. Organizational size, but also culture and the degree of investment in oversight, appear to be key determinants for setting up a combined oversight function. Currently only one large UN organization, the UN Secretariat, has a combined oversight office – the Office of Internal Oversight Services (OIOS) – while in all other large UN organizations the Evaluation Office is a stand-alone office with a direct reporting line to senior management or directly to the governing board, as is the case for the UN Development Programme.

Over the years, there has been a lively debate in the UN system on the pros and cons of combining the functions. Some argue that the disciplines are very different from each other and have different methodologies, ways of working, and audiences, and that co-location could weaken performance.

Table 12.1  The Positioning of Evaluation Offices in the UN System

Positioning of the Evaluation Office                              Small entity   Medium entity   Large entity   Total   %
Stand-alone                                                       2              3               6              11      44
Co-located with a management function                             2              1               0              3       12
Co-located in a combined oversight office with internal audit     5              4               2              11      44
Total                                                             9              8               8              25      100

Source: JIU, 2014 (updated 2018).1

The evaluators, in particular, were wary of being mistaken for auditors, and the efforts of the respective professional UN networks, the UN Representatives of Internal Audit Services (UNRIAS) and the UN Evaluation Group (UNEG), have not led to a better understanding and collaboration between the disciplines, although at agency level there have been varying degrees of collaboration based on co-location or individuals. Based on direct observations by the author, some of this stemmed from a lack of understanding of audit within the UN evaluation community, and some from the perception that management received auditors differently than evaluators (i.e., with more fear). The professional self-perception of evaluators was that their work was more strategic and focused on improving program impact, while the auditors were mainly compliance-focused and merely addressed what did not work rather than highlighting good performance and results. Auditors in turn have criticized the lack of professional certification and rigor of their evaluation colleagues. A 2014 JIU report on the UN's evaluation functions sought to assess whether co-location had an impact on the maturity and performance of the evaluation function, based on the hypothesis that combined or co-located functions result in weaker evaluation function maturity. Most co-located functions were assessed at a below-average level of maturity, thus seemingly confirming the hypothesis that evaluation functions thrive more when they are stand-alone. However, the JIU report cautioned that neither size nor co-location precluded the development of a high-level function, as there were exceptions in both cases. What matters is the management of, and the investment in, the evaluation function, and this can vary under any of the models. The report found that the key issue or fear associated with co-location appears to be the perceived loss of identity and visibility of the evaluation function. It will therefore be interesting to see whether the two small-sized organizations that have recently opted for co-location, and that both had strong evaluation functions as assessed by the JIU in 2014, will be able to retain their level of maturity, identity, and visibility. Much, in the view of the author, depends on the leadership of the functions.

An experienced director (originating from either field) and an organization with a strong evaluation culture and a supportive governing board are key ingredients for success. Leadership and tone at the top are critical and are shared concerns of both functions. UNESCO was one of two UN entities assessed at a high level of maturity despite co-location. In order to understand why UNESCO fared better than most of the co-located functions, it is important to understand the evolution of the functions in UNESCO.

Oversight Within the UNESCO Context

UNESCO, as a medium-size entity in the UN system, decided to consolidate its oversight functions in 2000 as part of an overall management reform. The model chosen was an Inspector General model, in order to ensure appropriate independence and a single direct reporting line to the executive head (see Figure 12.1). At the time of the merger into the Internal Oversight Service (IOS), evaluation was located in the program and policy service and did not interact much with the audit function; audit was a stand-alone office reporting to the Director-General. The mandate of IOS includes evaluation, internal audit and investigations, thus presenting a classic oversight office in the UN system. The IOS advisory role is a crosscutting function that all three functions provide to varying degrees (e.g., evaluation may advise on the robustness of logical frameworks and results reporting while audit may advise on the appropriateness of fee setting).

The Skill Sets of the Director and the Staff Are Aligned with the Professional Disciplines

The IOS is headed at a senior director level. The terms of reference for the position require expertise in one of the three main areas of the office, namely internal audit, evaluation or investigation.

Figure 12.1  The Two Functions in UNESCO's Context. (Organizational chart of the Internal Oversight Services: the Director oversees the Head of Evaluation, with 4 Principal Evaluation Specialists and 1 Associate Evaluation Specialist; the Head of Internal Audit, with 3 Principal Auditors, 2 Auditors and 3 Associate Auditors; and the Principal Investigator, with 1 Associate Investigator.)

The IOS Director reports directly to the Director-General of UNESCO and is an observer at meetings of UNESCO's senior management team (SMT). In all co-located functions in the United Nations, this position is currently held by a director with an audit background and, in most cases, a relevant audit certification, with the exception of UNESCO, where the director has an evaluation background. The choice of the director's background may well influence the direction of a co-located service, as in the case of the GAO described in Chapter 11. Often, a director with an audit background will demand more rigor from evaluators, better process documentation, and a greater focus on efficiency, while a director with an evaluation background will aim at audits reaching beyond compliance and providing performance and results information. In some organizations, directors with an audit background have used the inclusion of evaluation to develop a joint team approach. Regarding the professional qualifications and background of the staff, the ten auditors in UNESCO all have an educational background in a discipline related to management and business administration. All but one junior staff member hold professional certification as Certified Internal Auditors (CIA). The six evaluators and the director all hold master's degrees or above in the social sciences but have neither academic degrees in evaluation nor certification as evaluators. This dichotomy appears to be similar in other UN organizations and reflects a preference of management and administration graduates for the audit profession and of social science graduates for the evaluation profession. The issue of professional certification is one of the two key differences for the time being: while norms and standards for evaluation were established in 2006, professional certification or academic degrees in evaluation currently remain optional.

What Did Co-Location Mean Within the UNESCO Context?

The combination into one office under a senior manager in 2000 meant sharing an annual budget, a common front office and producing a joint annual report for UNESCO's executive board. It also established direct reporting to the executive head and to the executive board on IOS work, and representation as an observer on UNESCO's senior management team. The dual reporting to the executive head and to the board also provided a strong degree of independence to the IOS, arguably one of the most independent services in the UN system at the time. The office was initially set up with two sections; a third section was added in 2009 to cover increasing investigation needs. The combination of functions under one roof did not result in an integrated work plan, as annual work plans were developed separately in line with established professional practices (see Box 12.1): the work plan for internal audit was risk-based, while the work plan for evaluation sought to meet the strategic performance information needs of UNESCO's board and management.

Box 12.1  Professional Practice Areas

Internal Audit
Audits assess selected operations of Headquarters, Field Offices and information technology systems and make recommendations to improve the Organization's administration, management control and programme delivery.

Evaluation
Evaluations assess the relevance, efficiency, effectiveness, impact and sustainability of programmes, projects and operations as well as their coherence, connectedness and coverage.

Investigation
Investigations assess allegations of misconduct and irregularities (e.g., fraud, abuse of assets, or harassment). It is the sole entity responsible for investigating misconduct.

Advisory role
Advisory services are provided to senior management upon request, ranging from organizational advice to operational guidance.

The role of the IOS, nevertheless, was to combine a compliance and assurance function (i.e., adherence to rules, principles, and procedures in audit and investigation) with a change management function (conducting significant audits as well as evaluations), prioritized to help the organization mitigate its risk of not delivering effective results. With this shift, internal oversight was increasingly perceived as a trusted partner of management rather than as an entity policing compliance. This change certainly produced positive effects and perceptions for all three functions.

Conscious Co-Location Offers the Opportunity to Address the Two Sides of the Coin or the Entire Results Pyramid More Systematically

Figure 12.2 illustrates the division of labor in the context of UNESCO by using a results pyramid. Internal auditors focus on the lower part of the results pyramid and review the inputs (human and financial), activities and outputs of business units, programmatic sectors, country offices and, less often, programs. In doing so, internal auditors will also look at efficiency, effectiveness, and economy (performance audit dimensions) as appropriate. Evaluators in turn look mostly at the top part of the pyramid, with greater emphasis on outputs, outcomes and eventual impact. This "division of labor" enables IOS to combine the findings of audits and evaluations to provide a more comprehensive picture of performance.

Figure 12.2  Evaluation and Audit – Combined Oversight Functions in UNESCO. (Results pyramid: audit covers the programmatic and administrative dimensions of the lower part of the pyramid, from inputs ($, human resources) through activities to outputs; evaluation covers the upper part, from outputs through outcomes to impact.)

Opportunities and Challenges of Co-Location

The proximity or co-location of the two services and good working relationships between the teams have led to greater cohesiveness between the two functions. Both functions are independent from management and form part of the UN's third line of defense. Audit and evaluation colleagues have a first-hand opportunity to learn from each other and better understand their relative strengths. This understanding led to joint initiatives between the services to better leverage the work of the office and respond to the information needs of management and the governing board. The three case studies explored later in this chapter illustrate the types of joint initiatives being undertaken. While each section plans and executes separately, it is at the level of the director of IOS that decisions are taken on whether a request coming from management or the board lends itself more to an audit or to an evaluation approach. Depending on the information needs, audit or evaluation expertise may also be added to an assignment (e.g., an evaluator reviews relevance and effectiveness issues as part of an audit, or an auditor reviews efficiency issues as part of an evaluation), ranging from a simple professional exchange between the two sections to adding a staff member or expert to the team. This gives great flexibility and allows the director to assess which discipline is better equipped to address a request, or whether there is indeed a need for a combined exercise. The same applies to scope, where a joint exercise can ensure that the scope of both teams is well defined and, where indicated, complementary.

In this context, it has often become clear that clients do not necessarily distinguish between the individual services, which is an opportunity but also a challenge given the different methodologies and professional practices followed by the two disciplines. In UNESCO, as elsewhere in the UN system, the audit section follows the standards of the Institute of Internal Auditors while the evaluation section follows the UN Evaluation Group (UNEG) Norms and Standards. While both disciplines follow similar pathways, it is mostly the differences in methodology, stakeholder engagement, reporting, and follow-up that can confuse clients, while potentially creating opportunities for joint teamwork to strengthen each other's work. Recent evolutions in the audit field have moved audit methodologies and stakeholder engagement closer toward evaluation approaches. For instance, internal audits increasingly include survey work or set up focus groups or reference groups to inform their work. Given the higher complexity of evaluation methods, co-location has benefits for hands-on learning from "across the hall." Another evolution in internal audit is the increased use of performance auditing, a trend that started with the Supreme Audit Institutions at national level. The evolution toward performance auditing took longer to reach audit offices in international organizations but is now increasingly incorporated as a technique to evaluate the economy, efficiency, and effectiveness of operations. For instance, a recent audit of official travel in UNESCO looked at compliance with existing policies but also assessed the economy (value for money), efficiency, and effectiveness of established policies and internal controls. For a number of years Supreme Audit Institutions (SAIs) and the public sector have been moving from mostly compliance-oriented audit (as in the example above) toward performance audit or value for money. In a number of UN organizations, UNESCO included, external auditors have added performance audits to their standard task of auditing the financial statements. There has been a mixed response from the UN's management, with some embracing this while others pointed to the potential overlap with internal mechanisms. The Joint Inspection Unit took a critical view of this development and expressed concerns about mandate creep. Governing bodies, on the other hand, were keen to receive additional independent performance assessments to inform them on more than the accuracy of the financial statements. This interest mirrors that of parliaments in the countries these SAIs come from. This in turn has put pressure on internal audit to address performance dimensions in its audits and to increasingly reflect the three "E's" (economy, efficiency, and effectiveness). This mostly entailed adding performance elements more systematically to compliance-focused audit work. In this regard, the division of labor depicted in Figure 12.2 was respected: rather than risking duplication with the evaluation section, the audit section has focused its performance audit approach on the economy, efficiency, and effectiveness of business entities and business processes. In practical terms, IOS has been experimenting with five different types of complementary collaborative models, three of which are described in further detail as case studies (see Table 12.2).

Table 12.2  Types of Collaboration in Four Case Studies

Formative
Collaboration focus: Audit and evaluation methods and tools.
Nature of collaboration: On-going. Methodological support by the evaluation section on survey methods, focus group work, etc.; support on efficiency questions, systems checks and data mining by the audit section.

Joint (Case Study 1)
Collaboration focus: Complementary compliance and performance assessment of UNESCO's San Jose Field Office.
Nature of collaboration: Planned. Joint planning and undertaking of field missions and joint reporting. Resulted in one report called "IOS Assessment of Field Office X".

Parallel (Case Study 2)
Collaboration focus: UNESCO's standard-setting work of the Culture Sector (evaluation) and how the Secretariat supports the working methods of the Culture Conventions (audit).
Nature of collaboration: Planned. Started as an evaluation assignment; the client requested in addition an audit to cover the working methods. Some joint interviews but otherwise mainly parallel work. Resulted in one audit report and four evaluation reports; joint reporting to UNESCO's relevant governing bodies.

Integrated (Case Study 3)
Collaboration focus: Evaluation of UNESCO's role in education in emergencies and protracted crises, and audit of the framework and capacity for support to crisis and transition response.
Nature of collaboration: Ad hoc. Started as an evaluation and then involved an audit team to look at business processes. Resulted in two separate reports, with the evaluation report incorporating efficiency findings from the audit.

Separate
Collaboration focus: Audit complementing and following up on an evaluation of the World Water Assessment Programme that identified management issues.
Nature of collaboration: Sequential. Requested by the evaluation team leader, who came across management issues during the evaluation. Agreement with IOS management to undertake a timely audit. The work of the audit team was informed by insights from the evaluation team. Two reports.

Case Study 1: Joint Audit and Evaluation of a UNESCO Field Office (2012)

The overall scope of the exercise covered the strategic position of the office; program and project management; contracting and procurement; financial compliance, including cost recovery, treasury controls, budget management and accounting; human resource management; and publications and travel.

The two components of the joint exercise, audit and evaluation, were carried out respectively by two auditors and one evaluator from UNESCO's Internal Oversight Service. The audit component assessed the functioning of the office, including its internal controls, program implementation, reporting and compliance with UNESCO rules and procedures. The audit was performed in accordance with the International Standards for the Professional Practice of Internal Auditing. The evaluation component, conducted in accordance with established UNEG norms and standards for evaluations, examined the programmatic effectiveness and achievements of the office. Within the context of the review, program effectiveness was broadly defined as taking into account the following elements:

•  The effects generated by UNESCO activities, and the extent to which expected results have been achieved;
•  The resources and implementation capacities, as well as the implementation and delivery modalities in place to generate relevant outputs and contribute to the processes of change (e.g., awareness-raising, capacity development and policy development);
•  The strategic focus, alignment with country and stakeholder priorities, and adequacy of implementation and delivery modalities in maximizing the impact of scarce UNESCO resources and capacities (see Table 12.3 for a summary of the methodology of the audit and evaluation).

Table 12.3  Case Study 1 – Joint Study Methodology

Evaluation: Interviews with programme staff, including international and local staff/consultants; interviews with key stakeholders, such as members of National Commissions, representatives of non-governmental organizations, governmental agencies and ministries, representatives of donor organizations, and representatives of other UN organizations present in the Cluster countries; review of documents and data, such as governing board documentation, internal financial and performance data and results reporting for both regular and extra-budgetary funding, and project-specific documentation on activities and outputs for all the active sectors in the San Jose Office.

Audit: Risk assessment conducted during the planning phase of the audit and substantive testing of a sample of projects and programme activities, contracts, travel and financial transactions. In doing this, the auditors examined relevant programme and transactional documentation and interviewed UNESCO personnel, both in Headquarters Sectors and Services and in the San Jose Office, as well as representatives of the National Commission for Costa Rica.

Opportunities

A combined exercise is seen as more efficient by field offices, whose work is affected during any audit or evaluation; by combining the two, the time requirements and transaction costs for the office are reduced. Auditors and evaluators each have comparative advantages that played out quite well in the combined field assessments. Auditors have a systematic methodology for looking at risks and controls, while evaluators are more skilled and experienced at conducting interviews, looking critically behind the official documentation and reporting, and understanding the overall "performance and impact story." Joint meetings with key stakeholders also saved partners' time and allowed both teams to explore issues of interest to one or both. Thus, the teams received the same information at the same time and could reflect on what they heard at end-of-day joint team meetings. This worked well in a number of the combined field assessments, although one evaluator pointed out that it depended on the individuals and their willingness to engage. The title of the joint summary report reflects that the report was a hybrid, combining a traditional compliance-focused audit with an evaluative performance perspective, and thus closer to what could be called a performance audit. The report met the information needs of management, which is a critical factor in seeing recommendations fully implemented. At a time of limited resources, adding a performance dimension to traditionally compliance-focused audits is cost-effective and adds value, as it allows assessing whether an office is doing "the right things" rather than only whether it is doing things "the right way."

Challenges

The time in the field given to the evaluator was shorter than for the two-person audit team, which stayed for two weeks. The evaluation terms of reference therefore rightly reflected a reduced scope for the evaluation, but it was still challenging to cover it during the short period of time on the ground. The resulting "assessment" falls short of an evaluation report, as the evaluation results were integrated into the format of a standard audit report. Fully reflecting the professional standards of both disciplines in a joint product is difficult, if not impossible. The different report-writing styles can also be an issue, as the two disciplines express themselves differently and define terms such as "efficiency" and "effectiveness" differently. For this reason, other joint missions to field offices opted to produce separate reports.

Case Study 2: Sequential Exercise on UNESCO's Culture Conventions (2012–2013)

The evaluations of the standard-setting work related to four of the UNESCO culture conventions focused on assisting the governing bodies and the secretariat in the implementation of the conventions, as well as on the impact of some conventions at the level of legislation/policy (degree of integration) in the State Parties that had signed on to the conventions. Furthermore, the evaluation sought to determine the difference made by the standard-setting work (relevance and effectiveness), especially the support activities provided by UNESCO to parties to the conventions. During the planning phase of the evaluation in 2012, UNESCO's Culture Sector raised the adequacy and efficiency of the working methods as a chief concern and requested coverage of these issues through an audit. The audit section, accepting that the working methods constituted a key risk for achieving the expected results of the convention work, subsequently included the audit in its 2013 work plan. The audit added two additional conventions in order to cover the entire gamut of cultural conventions. The IOS audit of the working methods of six conventions in the field of culture was undertaken in order to assess the adequacy and efficiency of the working methods of UNESCO's standard-setting work. The audit was performed in 2013 in accordance with the International Standards for the Professional Practice of Internal Auditing. The scope of the review included the working methods of the convention secretariats, the funding arrangements and the meetings of the governing bodies. The methodology of the audit included data and information gathering through a review of convention texts, operational guidelines, rules of procedure, as well as prior studies and reviews, and key informant interviews. In addition, the audit reviewed funding and governance structures in a number of similar UN conventions hosted outside UNESCO for benchmarking purposes (see Table 12.4 for a summary of the methodology of the audit and evaluation). While the two teams worked in parallel, some key stakeholder interviews were undertaken jointly and the two team leaders reviewed and commented on each other's work. Both felt it was beneficial to have the other team leader review and comment on the drafts, as they had been part of the overall review process and participated in some of the same meetings. This helped to test some of the conclusions and in particular the recommendations (see Table 12.5).

Table 12.4  Case Study 2 – Parallel Study Methodology

Evaluation: Four separate evaluations building on a theory of change reconstruction for each convention; a "nested" methodological design, which included purposive sampling and data collection at the different levels of the causal chain from ratification to implementation, as a basis for acquiring credible data at all three levels. Data collection methods included a desk study, phone/Skype interviews, a survey, and in-person interviews in a few selected countries. The method was adapted to the scope of each evaluation. External technical expertise complemented the work of the IOS evaluation team.

Audit: Data and information gathering through a review of convention texts, operational guidelines, rules of procedure, as well as prior studies and reviews, and key informant interviews.

Table 12.5  Evaluations and Audit of the Standard-Setting Work of the Culture Sector – Scope and Coverage. (Matrix showing, for each of the six culture conventions covered – the 1954 Convention (Protection of Cultural Property in the Event of Armed Conflict), the 1970 Convention (Means of Prohibiting and Preventing the Illicit Import, Export and Transfer of Ownership of Cultural Property), the 1972 Convention (Protection of the World Cultural and Natural Heritage), the 2001 Convention (Protection of the Underwater Cultural Heritage), the 2003 Convention (Safeguarding of the Intangible Cultural Heritage), and the 2005 Convention (Protection and Promotion of the Diversity of Cultural Expressions) – the coverage of four issues: working methods (audit), ratification (evaluation), policy-level implementation (evaluation), and implementation. Coverage categories: not covered, evaluation partially covered, evaluation fully covered, audit fully covered.)

Table 12.5 illustrates how the two exercises focused on different aspects, with the audit covering the working methods for all six conventions while the evaluation addressed ratification, policy-level implementation, and implementation of the conventions to varying degrees.

Opportunities

By combining the two exercises, IOS managed to present a very holistic and integrated picture of what needs to be done. In a world of complexity and interconnections, looking at only one aspect of the standard-setting work would have been less useful. Given the complementarity of the work, a deliberate decision was taken by IOS to present the resulting report outcomes (four evaluation reports, one evaluation synthesis report, and one audit report) in one presentation to the governing bodies in January 2015. The information meeting for UNESCO member states was organized by the Culture Sector and included presentations by the Assistant Director-General for Culture and the director of IOS, which highlighted the importance the Culture Sector gave to the IOS reports and their recommendations. Over the next two years, the reports were presented to the various convention meetings, and for each meeting the IOS presentation was tailored to the audience. At times the presentation by IOS would focus more on the audit of the working methods, highlighting that the increasing workload at the convention secretariats with decreasing funding was unsustainable, that governing bodies could be made more efficient, and that there was a need to review the cost structure of obtaining advisory services from outside providers, and recommending a common service platform across convention secretariats. At other times the presentation would focus mainly on the results of the evaluation of the standard-setting work and on crosscutting issues, highlighting the need for better integration of the provisions of the conventions in national policies, strategies, and legislation, the need to strengthen inter-sectoral work at the national level, and the need for improved monitoring at national and global levels. As a result, higher-than-usual visibility was given to all reports, and the complementarity of the reports created a broader base for discussion among the State Parties on how to improve the efficiency and effectiveness of their work. It was notable that the meetings perceived the reports as "one product" and that there was little differentiation between the audit results and the evaluation results, as both were considered together.

Challenges

The joint presentations, however, also presented a perception challenge – albeit one that is more important to the professional auditors and evaluators than to member states. The challenge consists of the possible blurring of the two disciplines in the minds of the clients, whether management or the State Parties.

In this instance, this was not significant, as the governing bodies simultaneously received performance information on (i) the management of the convention secretariats (the audit) and (ii) the effectiveness of the work of the conventions (the evaluation). Nevertheless, the lack of understanding of the two disciplines and their different approaches and working methods may lead to unrealistic demands and to disappointment when future work does not meet the expectations and information needs of the governing body. Linking the audit results and the evaluation results was not seamless: the two exercises were not coordinated and not executed at the same time, and could have benefited from stronger collaboration between the teams, as was done for the third case study. This was partially mitigated by having one presentation on the outcome of both exercises to the governing body. Another challenge concerned the "ownership" of the recommendations, as they were addressed to two different "client" groups: the audit recommendations were addressed to the UNESCO secretariat (and thus had to be accepted and implemented by the secretariat) while the evaluation recommendations were mostly addressed to State Parties and to the secretariat. Follow-up was therefore fragmented. Given the absence of a standard management response for the State Parties, the implementation of the recommendations depended much on the leadership in the various conventions. From the IOS perspective, the collaboration on this important area of UNESCO was a success: it highlighted the value-added of internal audit studying governance aspects and business processes and complemented the work of the evaluation, which provided results information. The recommendations of both reports were eventually fully embraced within UNESCO and by the various governing bodies.

Case Study 3: Parallel but Integrated Exercise on UNESCO's Role in Education in Emergencies and Protracted Crises (2016)

This evaluation started as a stand-alone exercise in early 2016 to assess (1) the relevance and added value of UNESCO's work in education in emergencies (strategic positioning), and (2) the efficiency and effectiveness of UNESCO's participation in international emergency education coordination mechanisms. The evaluation was conducted in two phases, with the first phase focusing on the production of four case studies (Afghanistan, South Sudan, Nepal, and the Syrian response). One of the case studies (Afghanistan) was an in-depth impact evaluation (assessing the impact of literacy training on Afghan police), while the other three case studies were based on country visits, document review, and key stakeholder interviews and involved a less rigorous approach. The evaluation team was led by an IOS principal evaluator and included an external expert on emergency education as well as another IOS evaluator.

Table 12.6  Case Study 3 – Integrated Study Methodology

Evaluation: Two-phase approach: (1) data collection and analysis for the four case studies; desk studies of relevant documents and country visits for interviews with UNESCO staff and external partners and stakeholders, including representatives of coordination mechanisms in the field of education; (2) mapping of UNESCO's work in crisis-affected countries and territories, including a portfolio analysis of 10 crisis-affected countries; a review of the Organization's participation in international coordination mechanisms in the field of education, which included interviews with representatives of organizations in New York and Geneva; and an analysis of UNESCO's strategic documents related to EiE work.

Audit: One-phase approach: internal and external interviews, discussions, document and data reviews, systems data collection and analysis, sampling of processes, review of HR data, review of financial data, comparison between planned and actual implementation, and a review of the practices of other UN agencies.

Some months into the evaluation process, the evaluation team leader was faced with increasing questions about the efficiency of UNESCO's response in emergency situations and whether the business processes of UNESCO were a major impediment to achieving results on the ground. Understanding administrative and procedural bottlenecks, however, requires an internal understanding of business processes and the ability to mine data from in-house IT systems. Outsourcing to external consultants was not an option, as they would not have been familiar with internal systems. The team leader therefore suggested adding an audit component with the clear task of assessing the overall organization-wide framework and capacities to support the response to crisis and transition situations, and of identifying risks of delays in each process related to the support to crisis and transition response (fund mobilization, fund release, and fund disbursement). Audit staff time was allocated to this ongoing assignment, and the two teams started a semi-parallel, semi-integrated process. The audit tasks were added to the evaluation terms of reference and scope, but the audit also developed its own approach in line with international standards. In terms of methodology, it is important to note that, while different, the methodologies used by the two teams were complementary (see Table 12.6 for a summary of the methodology used).

Opportunities

The joint work significantly strengthened the quality of both reports and their use and impact. Clients received a clear business process and efficiency analysis from the audit team that resulted in a number of action-oriented process and policy recommendations.

At the same time, the importance of the organization's work and some of its results were well documented, as were the challenges of working in difficult environments. The absence of a corporate policy on UNESCO's work in emergencies was highlighted in both reports and led to further thinking on how to better position UNESCO in this area. Although this was potentially an area of overlap, it did not matter, as the recommendations were separate and thus did not duplicate the other report. Major opportunities also arose in the post-report period. Joint communications and reporting to senior management and the executive board gave both products high visibility. Evaluation and audit team members attended the meetings of a task force that was set up while both exercises were still in progress, in recognition of the urgency of addressing the emerging findings. The task force discussed how to fast-track procedures for the crisis and transition response. Analysis from the audit and the evaluation thus aimed to feed in real time into the work of these working groups, and vice versa. The audit results were also better disseminated than usual, as they were reflected in the suite of communications materials developed by the Evaluation Office as a standard part of its evaluation dissemination strategy. Team members confirmed that the experience had been enriching, as they learned from each other and learned to appreciate each other's skills and inputs.

Challenges

Due to the unplanned nature of the engagement, the teams evolved at a different pace. A missed opportunity was that, given the late engagement, there was no joint objective setting and planning from the outset to better define the scope and divide the work. This created some challenges and led to the two teams working mostly in parallel rather than, as intended, in a combined way. Since the audit team was not part of the field missions, they very much depended on the insights provided by the evaluation team. Joint discussions and meetings helped to a certain extent to bridge that gap, and the field findings regarding delays and bottlenecks confirmed the audit findings at headquarters. In Paris the teams also joined up for interviews, with each team asking the questions relevant to it. Given the different sequencing of the case studies, the different pace of the audit team was not a major obstacle. Difficulties arose at the report drafting stage. The initial understanding had been that the work of the audit team would be incorporated as a separate chapter into the evaluation report. However, when the combined report was submitted, the different reporting styles did not match up. Furthermore, there was some duplication in the audit chapter regarding the absence of an institutional strategy, which had also been identified as a key weakness by the evaluators and should have been raised earlier in the combined report. Based on the different report styles and key messages, the IOS Director decided on a different reporting strategy: (1) a separate audit report reflecting IIA Standards with recommendations targeted at management, and (2) an evaluation report that incorporated and referred to the relevant findings of the audit team but adjusted their presentation to the rest of the report. While this created some frustration for the teams, who had aimed at a joint report, the separation clearly worked better.

An integrated version would also have somewhat complicated the follow-up to the report, as follow-up is currently undertaken differently by the two sections: the audit section uses TeamMate, audit software that records all audit steps including follow-up, while the Evaluation Office uses a manual Excel spreadsheet process. The fact that there were two reports also facilitated use. Regarding the corporate processes, the auditors advised a task force that was set up immediately to address administrative bottlenecks. The evaluation recommended the development of a deliberate corporate strategy to prioritize education in emergencies. The two sets of recommendations required different actions, actors, and internal ownership. By splitting the reports, the implementation of the recommendations was more straightforward.

Overall Lessons

Best Outcomes Are Achieved When Collaboration is Well Planned

Two of the case studies reflect ad-hoc collaboration that had not been planned. This impacted both exercises to different degrees and led to differences in pace, expectations, pressure from other ongoing work, and reporting products. Most of this can be avoided if evaluators and auditors do joint annual planning at the beginning of the year to discuss any opportunities for a collaborative engagement in line with Figure 12.2 above. For UNESCO, such joint planning discussions have now become a standard part of the annual work planning process. While the two disciplines have different planning processes, the risk-based approach taken by audit can be combined with the more opportunities-based approach taken by evaluation. Joint planning of oversight functions that are not under the same direction should also become the norm rather than the exception. In most if not all United Nations entities, the functions complain of limited resources and limited ability to address the risks and demands for their services. Greater collaboration and joint planning can help address some gaps while also reducing the number of reports and inputs reaching management.

Joint Work Broadens and Strengthens the Methodological Toolbox of Both Teams and Builds Skills

Auditors and evaluators have different skill sets, and the methods used in the two disciplines reflect this. At the same time, the audit profession has been evolving, and surveys, focus groups, and other such methods are increasingly used by audit professionals. For any type of collaboration, the teams must discuss at the planning stage the most appropriate methodology to be used to answer the key questions. This may well result in an audit team member undertaking work that benefits the evaluation questions, and vice versa. While the two fields are getting closer methodologically, it should also be acknowledged that certain audit techniques (e.g., limited sampling, checklists, compliance lists) would seem of limited use for evaluations.

A further discussion on what constitutes appropriate sample sizes, key informant interviewees, etc. is needed.

Auditors Can Help Evaluators to Better Tackle Efficiency Questions

The Evaluation Office recognized its limited ability to address efficiency. Many evaluations, and this is not only the case for UNESCO, fail to fully address efficiency. This is in part because evaluators often do not have the time or capacity to analyze efficiency factors and to dissect business and work processes. Joint work is therefore essential for evaluation reports to fully address efficiency. However, resource limitations will not permit this in a systematic manner. Evaluators therefore need to upgrade their skills profile or knowledge to better understand how best to address efficiency. Given the need for in-depth knowledge of data mining in corporate systems and of corporate processes, opportunities for training or for incorporating specialists (including audit colleagues) on evaluation teams should be considered for future evaluations. Alternatively, efficiency as a criterion may well not be included in an evaluation, in particular if there is no capacity to address it. In such a case, if teams come across efficiency concerns, they could recommend a separate audit. This was done in another case in UNESCO (see Table 12.2 on types of collaboration). Key, in our view, is to ensure dialog between the two functions so that at any stage of an evaluation or audit the teams can consider how to strengthen each other's work.

Stakeholder Engagement Strengthens the Acceptance and Use of Findings

Traditionally the two disciplines use different ways to engage stakeholders. Evaluators tend to reach out to broader groups of stakeholders (in the case of UNESCO this also includes member state representatives and stakeholders at country level) and increasingly set up consultative processes or groups that accompany the evaluation from the planning phase to the post-reporting phase. The use of key stakeholder groups or reference groups enriches the exercise and leads to greater management acceptance and ownership of results, as was the case in Case Study 2. The more collaborative approach of evaluations is therefore a great asset to be used and further explored by audits.

Report Writing is Best Given to One Person in Order to Ensure Readability and Coherence

The differences between the two disciplines are also reflected in the style of reporting. Therefore, simply incorporating the audit report into the evaluation report is not advisable, nor should it have been contemplated.

Earlier joint efforts (see Case Study 1) had concluded that it was best for the evaluator to draft the reports based on additional inputs from the audit team member. This is a practice that works well, in particular if the combined reports are performance evaluations and/or performance audits.

Joint Work and Reporting Enhances the Impact and Use of the Reports by Recipients

In an international organization like UNESCO, where the secretariat reports to a governing body in charge of organizational oversight and policy setting, the work of a combined oversight service needs to be strategically positioned. Combined work can lead to stronger and sharper results. It will help counter the perception often heard that there are too many oversight reports – combining reports or cross-analyzing findings and presenting these to governing boards maximizes reporting and impact.

Targeted Communications Strategies are Important Tools for Follow-Up and Organizational Knowledge Management

The dissemination of internal audit reports remains limited: within organizations they are usually addressed only to the audited entity, and there is no real effort to communicate results to broader constituencies or to feed them into organizational knowledge management. This is an area where evaluation practices have evolved considerably. There is a communications strategy for every evaluation, and once the report has been finalized, different communications products are developed (e.g., newsletters, summary reports, presentations). Given limited resources, a joint communications function would benefit both offices. Audit needs to move out of its purely internal reporting mode and explore better ways of communicating its results.

Two Coins or Two-Sides-Of-One-Coin?

The UNESCO example shows how the two disciplines can benefit from working with each other, to the benefit of better reports and enhanced use and credibility. The two-sides-of-one-coin examples given in the case studies clearly demonstrate the value-added of joint efforts. Joint work and reporting enhances the impact and use of reports by recipients. In an international organization like UNESCO, where the secretariat reports to governing bodies in charge of organizational oversight and policy setting, the work of a combined oversight service needs to be strategically positioned. Combined work can lead to stronger and sharper results. It will help counter the perception often heard that there are too many oversight reports – combining reports or cross-analyzing findings maximizes the value of the reports and their impact.

The degree of integration required, however, should not be standardized. We found that the various types of collaboration summarized in Table 12.2 work well and should be chosen depending on the context and the key issues to be addressed. However, it would be unrealistic to expect this to happen for every engagement, and each opportunity for collaboration needs to be carefully planned, assessed and executed. Given the resource constraints of many oversight services, the two-sides-of-one-coin approach is worth institutionalizing in planning processes, as it can address different issues and questions in a more comprehensive manner. Evaluation and audit services would be well advised to increasingly consider their work as two-sides-of-one-coin even in organizations where they are not co-located. The example provided in Chapter 13 on the collaboration of evaluation and internal audit in UNDP to assess organizational effectiveness confirms that there is an appetite for this, with the benefit of articulating a common, authoritative and stronger voice for internal oversight.

Notes

1 Since the publication of the JIU report, two additional small-size organizations have decided to merge the functions, while one large entity separated the functions. One co-located large entity moved the function to a stand-alone arrangement. One new organization joined the UN system, bringing with it an established combined function.

References

Joint Inspection Unit of the United Nations System. (2006). Oversight Lacunae in the United Nations System (p. 9). Retrieved from www.unjiu.org/content/reports.
Joint Inspection Unit of the United Nations System. (2010). The Audit Function in the UN System (JIU/REP/2010/5). Retrieved from www.unjiu.org/content/reports.

13 Lessons Learned from the Assessment of UNDP's Institutional Effectiveness Jointly Conducted by the Independent Evaluation Office and the Office of Audit and Investigation of UNDP

Indran Naidoo and Ana Soares

Introduction

The United Nations Development Programme (UNDP) is an intergovernmental organization on the ground in about 170 countries and territories, working to eradicate poverty while protecting the planet by helping countries develop policies, skills, partnerships, and institutions so they can sustain their progress. In an effort to adapt the organization to a new context, deliver higher-quality programs and achieve better results, UNDP made institutional effectiveness a central focus of its Strategic Plan for the period 2014–2017. In 2015 the Independent Evaluation Office (IEO) and the Office of Audit and Investigations (OAI) of UNDP agreed to embark on an innovative approach to oversight and jointly assess the institutional effectiveness of UNDP. A joint assessment implied working across traditional professional boundaries, developing synergies of approaches, rationalizing resources, and producing a coherent message to UNDP management and the executive board on the critical subject of institutional effectiveness. With a team of 30 professionals, the IEO conducts independent evaluations at the country and corporate levels. OAI is a larger unit, with auditors in different regions, responsible for conducting internal audits, including performance audits, in addition to investigating allegations of fraud and corruption. The IEO and OAI are both independent units in the performance of their duties, but OAI reports to the UNDP Administrator, while the IEO reports to the Executive Board of Member States. The direct reporting line to UNDP's governing body provides the IEO with a high level of independence in the international oversight architecture. Given that evaluation in essence makes a judgment about past performance, as a precursor to making salient recommendations, the entrenchment of independence at all levels is vital. This includes structural, operational, financial and behavioral independence.

A special feature of the UNDP oversight architecture that also supports a more collaborative approach is the Audit and Evaluation Advisory Committee (AEAC), to which both oversight offices report directly on their work. The AEAC reports to the executive board and to the UNDP Administrator. Successor to the Audit Advisory Committee, the AEAC, whose mandate was extended in 2016 to also include the evaluation function, comprises eight independent experts, external to UNDP, appointed by the UNDP Administrator to advise him in fulfilling his oversight responsibilities. The IEO also benefits from the advice of the International Evaluation Advisory Panel (IEAP), which has a direct reporting line to the IEO Director. The IEAP is an eminent group of experts and scholars appointed by the IEO Director to provide periodic advice on evaluation strategies, plans, methodologies, evaluation reports and other deliverables.

The Joint Undertaking

Historically, OAI and IEO have worked according to their own plans – like most international audit and evaluation offices – with limited operational interaction. The joint effort to assess UNDP's institutional effectiveness was a valuable learning experience that resulted in important reflection and increased interaction, and supported a practice of joint oversight work. Both offices, when possible, conduct missions together and exchange information, contributing to each other's reports. But the only joint report produced so far has been the 2017 Joint Assessment of the Institutional Effectiveness of UNDP, and there is no provision for another exercise to produce a joint report. The Joint Assessment was challenging at multiple stages, not the least of which was the apprehension of UNDP management about the exercise itself. Nevertheless, the directors of both offices, IEO and OAI, were convinced this joint approach would produce a stronger report and generate relevant lessons for the oversight offices and the organization. The directors' strong leadership and commitment led to the joint report being concluded and validated, and successfully used for the ongoing reform initiatives of the organization. This chapter documents some of the challenges faced and the lessons gleaned by both oversight offices and the organization. It points to issues that offices need to consider when having audit and evaluation work together to produce a joint report. The five key challenges discussed in this chapter are:

1  Naming the "beast," decision rights, communication channels, and protocols.
2  Leveling understanding to assess evaluability and to define questions.
3  Data collection methods, triangulation and agreeing on what constitutes credible evidence.
4  Report writing and dealing with sensitive language to ensure independence, credibility and utility.
5  The fear factor versus the enhanced credibility ripples.


Naming the "Beast," Decision Rights, Communication Channels, and Protocols

The first challenge that generated intense discussions between the two offices was what to name the "beast" – as we were calling it. It could not be called an evaluation, as this would not be acceptable to auditors. The term "review" was also suggested in the incipient stages of the exercise, but it conveyed the impression that the exercise would be somewhat less rigorous and could potentially dilute the impact of the final statements. After much discussion and negotiation between the two offices, with evaluators and auditors citing literature about professional standards to support their views, it was found that, for such a study, the academic literature is limited and varied. There is an ongoing debate on what constitutes a review, an evaluation, a performance audit and an assessment (Chelimsky, 1991; Pollitt et al., 1999; Bemelmans-Videc, 2003; Mayne, 2006). It was at this stage that the matter was escalated to the directors of both offices for a decision, and the term "Joint Assessment" was accepted by all. The time that it took to reach agreement on the name of the exercise reflects a much more significant challenge that needed to be addressed: decision rights, communication channels, and protocols. The gelling of the team required a building of consensus, as it proved very difficult for trained professionals, who have a vested interest in their professional norms and standards, to begin to negotiate them in an actual joint assessment context. Though building a team consensus took time, in the end there was mutual respect and understanding, with the view that it was fine to disagree agreeably on matters both parties considered to be issues of principle and issues of practice. The practical human element meant that members at times felt as if they were competing against each other and not part of the same team. There were highly competent senior evaluators and auditors on the team but, in the absence of a single leader with decision powers, which would have been ideal, most serious disagreements ended up having to be resolved at the level of the directorship, which resulted in some delays. Fortunately, it was all resolved due to the commitment of the directors, given the expectation of the board to see the exercise through. Not all decision-making processes and protocols were properly defined at the beginning of the joint assessment, and others simply did not work well or were not followed for various reasons. This also led to delays and necessary compromises at different levels. Based on our experience, it was concluded that decision-making processes in joint exercises need to be clearly agreed and written down before work starts, and scrupulously followed by all parties during implementation. Issues such as who has the final say on report content, how different perspectives will be dealt with, and what the rules are for determining what will, or will not, be accepted as evidence must be clear.

Key Lesson Learned Management from both sides of joint exercises ought to invest in distinguishing the different and complementary roles to be played by audit and evaluation, as well as sorting out decision rights, communication channels, and protocols for scoping, data collection, analysis, triangulation, structure of the report, report writing, peer-reviewing, and quality assurance. Shared decision-making does not always work effectively in joint exercises. A single team leader with sufficient authority and credibility among the auditors and evaluators is needed to ensure timely decisions, break ties, close discussions, and make sure deadlines are kept and quality products delivered.

Developing Common Understandings to Assess Evaluability and Define Key Questions The second significant challenge that arose was the need to get to a common understanding about the difference in principles of audit and evaluation as well as a difference in key terms, such as institutional effectiveness and Result Based Management (RBM). This was essential to properly design an assessment which both offices were comfortable with and which clarified the scope and assessment questions. Institutional effectiveness in UNDP, supporting the vision and development outcomes of the strategic plan, is associated with three key interrelated management results. First, higher-quality programs through RBM; second, UNDP is more open, agile and adaptable to harness knowledge, solutions and expertise; and third, improved management of human and financial resources. OAI had already done an RBM audit, which could contribute to the assessment of the quality of the program through RBM. However, the evaluators had concerns about the coverage of the audit results and therefore its findings had to be used with discernment. The evaluators were of the view that the assessment of program quality required the integration of the first pillar with the other two pillars that addressed knowledge management and resource management. It was not logical to assess each pillar separately without understanding how one affects the other. In the joint discussion about what constituted the pillars, and how they are linked together, it emerged that there were in fact very different understandings of RBM. Audit understood RBM to be much more aligned to reporting tools and practices for compliance, whereas evaluation views it as a strategy to both manage results and promote learning for institutional effectiveness. Evaluators questioned whether proper linkages between the pillars would in fact lead to an overall change in a much more critical way. An assessment concept note was developed describing the subject, rationale, purpose, objectives of the assessment, scope, methodology, and other basic aspects of the exercise, especially the questions to be asked. In the note’s development, the auditors wanted to focus on normative questions, while evaluators

also wanted to address descriptive and cause-and-effect questions. These established professional inclinations made it difficult to quickly reach a consensus on scope and depth. Given the delays in the process, it was agreed that the focus would be on the first pillar of RBM, but evaluators would consider how the other pillars, especially knowledge management and resources, were affecting the first and vice versa. This led to key findings and conclusions indicating that UNDP continued to associate RBM more with compliance-driven practices to satisfy reporting requirements, with a limited focus on actual learning from the evidence to enhance decision-making and improve performance. Key Lesson Learned Developing a shared repository of operational concepts is critical in a joint exercise to create a commonly informed approach and methodology, thereby avoiding the inevitable contestation that arises when two disciplinary professions need to work together.

Data Collection Methods, Triangulation, and Agreeing on What Constitutes Credible Evidence The third key challenge faced was to agree on data collection methods, strategies for triangulation, and getting to a reasonable understanding of what would constitute credible evidence. This was a particular challenge that led to learning for both auditors and evaluators, and, through the process, contributed to the credibility of the outcome of the joint exercise. Evaluators learned from the auditors’ early approach of surveying staff. Evaluators at IEO generally favored using surveys more often to close gaps in triangulation later in the evaluation process, after desk review and data collection had been done. It used surveys to validate or refute initial perceptions from these sources. However, OAI used early surveys, crafted for different types of staff responsibility, as their main source of data collection. Having the surveys early in the process allowed the team to cover a lot of ground and better define the scope of the assessment to be more strategic. It also provided for focus only on additional data collection that still needed clarity, allowing more time for the why and the how and the so what questions. Nevertheless, the early surveys were only used to consult staff. Evaluators still needed to broaden consultation and reach out to the field to engage not only with staff but also partners and beneficiaries, something auditors did not do so often. Auditors focused more on data collection at headquarters and internally with staff. The OAI component lacked the financial resources in this exercise for country case studies. It should be recognized that the assessment was additional to the normal full work plan of OAI and IEO. Therefore, decisions had to be made to limit scope while ensuring coverage and depth for credibility, in consultation with the advisors for

both offices. To ensure adequate coverage, it was agreed to travel to the five regional hubs and bring together staff from ten countries from each region for two days of in-depth focus groups, followed by consultation with partners and beneficiaries in the five countries where the focus groups were held. These interactions and consultations with multiple stakeholders proved particularly relevant for a full comprehension of the different contexts. They also allowed the team to close the triangulation of evidence and, interestingly enough, to refute a few preliminary findings that had been based only on headquarters' consultation. Above all, these interactions were key to expanding the qualitative analysis for a better understanding of the reasons why things happened the way they did and what factors contributed to or hindered the processes and outcomes. The result was a more robust assessment. Practice traditions of the two disciplines have put different emphases on types of evidence (Mayne, 2006). The evaluators placed an emphasis on the principle of triangulation, insisting on ample consultation with multiple stakeholders and visits for observations. The auditors put more weight on a desk review of documents, with limited interaction with stakeholders, mainly just at headquarters, because of their focus on analyzing compliance, risk management and governance (Pollitt et al., 1999). The best designed, most relevant programs cannot be delivered if the institutional governance, risk management and systems are not aligned to strategic objectives and if there is incompetence or a lack of integrity. For the evaluators, it is critical to fully consult with country offices, beneficiaries, and partners for adequate triangulation of evidence. Evaluation goes beyond the focus on institutional capability. It focuses on the effects and results – their relevance, effectiveness, and sustainability in achieving or contributing to outcomes. Audit examines the normative questions with a systematic, disciplined approach to assess and improve the effectiveness of compliance, risk management, control, and governance processes (Wisler, 1999). Its focus is on all management systems, processes and practices to assess and provide assurance on compliance, appropriate resource use and organizational competence, and integrity. Its reports are intended to tell management that systems and procedures are in compliance with the relevant policies, or to identify weaknesses that need to be addressed (Mayne, 2009). Evaluation can complement auditing by examining the how, the why and the so what, identifying factors influencing a particular outcome and what lessons can be extracted given particular contexts. It is the systematic collection and analysis of evidence from various sources (triangulation) about the outcomes of an institution, its partnerships and its programs and projects. Whereas compliance is the "result" for audit, it is the start of an accountability conversation for evaluators, who also assess the learning component of why and how things were or were not accomplished (Lehtonen, 2005). Even if compliance is achieved, it does not necessarily lead to the intended outcomes and impacts. Evaluators do not presuppose that compliance leads to results, given that compliance can be mechanistic and insufficient to diagnose deeper causal

factors that may lead to a result and affect its sustainability and impact. The measures that must be taken to comply may at times even be incorrect or insufficient to promote the desired results. This was the case in UNDP, with its excessive quality assurance and RBM focus on compliance, instead of paying sufficient attention to learning and knowledge management – not just to report success, but to improve institutional effectiveness. Key Lesson Learned Auditing and evaluation have very different interests and approaches to methodology but can complement and strengthen each other's analysis and contributions, particularly regarding results-based management. The enhanced triangulation of methods, extended consultation, and improved understanding of the context promoted by this joint approach led to a more robust assessment.

Report Writing and Dealing with Sensitive Language to Ensure Independence, Credibility, and Utility The fourth key challenge was writing the report and dealing with sensitive language to ensure independence, credibility, and utility of the assessment. Both independence and credibility are strongly linked to the concepts of legitimacy and quality of an assessment, and are key to the use of the conclusions and recommendations in policy-making. There are many levels of independence; one key level for evaluators is a reporting line outside the potential censorship of the evaluand, though, on its own, this does not guarantee independence (Picciotto, 2013). The debate regarding independence continues internationally, and is also linked to the question of evaluation models used; the contracting model, or use of consultants, has long gone unchallenged and been conveyed as a means to guarantee independence. It is for this reason that many offices outsource the work, though IEO has been strongly critical of this presumed guarantee that paid consultants are independent (Naidoo, 2012). The IEO of UNDP follows an evaluator-led model, with professionals working to the director, who has specific powers and a direct reporting line to the Executive Board rather than the Administrator of UNDP, making it possible to talk truth to power. Looked at from the structural perspective of reporting lines, in UNDP the IEO reports to the governance body of the organization, the executive board, whereas OAI reports to the administrator. The work of both units is overseen by the Audit and Evaluation Advisory Committee (AEAC), and each office has its own quality assurance mechanisms and structures to guarantee professionalism and adherence to the respective norms and standards. They both thus enjoy credibility and, importantly, access to the subject of investigation for assessment, which is UNDP as "evaluand." The direct reporting of IEO to the executive board characterizes its particular necessity and obligations to be perceived as unbiased for

purposes of accessing stakeholders who have a very high sensitivity to the question of the independence of an evaluating unit. This has helped IEO achieve and maintain the credibility needed by emphasizing the fact that it is not under the administrator, and is consistent with the United Nations Evaluation Group (UNEG) norms and standards. Audit in UNDP is not independent in the same way as the IEO, given the reporting line. However, while there is a difference at this level, one can conclude that for practical purposes – and given the special reporting relationship of the directors to the AEAC – both functions enjoy credibility, due to their compliance with some or other independence provisions. It allows both offices to work without fear or favor, and reports emanating from them are signed in name by the directors without veto from UNDP management. Practically, there is a more robust critique that comes from the IEO, which must continuously justify its independent stance, given the challenges it faces since most of its work is at the country level, very transparent and often contested. The change of name, from Evaluation Office to Independent Evaluation Office, helped, but most critically the 2016 Evaluation Policy (UNDP, 2016) embedded the independence element. The office has been renowned for its independent and credible stance, and work. It dedicated one of its biannual UNDP International Conferences on National Evaluation Capacities to the subject of Independence, Credibility and Use of Evaluations in 2013 (UNDP, 2014). It is also looked upon as an international yardstick for this element, even at the United Nations Evaluation Group level (UNEG, 2016, 2010). In terms of this work, there was less anxiety about strong evaluative language – sometimes perceived as damaging to the organization – among the evaluators than among the auditors, whose repository of language is more coded to internal and financial control. In essence, the fact that the assessment was conducted jointly and signed off by both directors was appreciated by the board. They commented at various public sessions that they appreciated the robustness and frankness of the report, and that the leadership shown in such joint ventures pointed to the strength of the organization and mitigated the normal experiential biases that both auditors and evaluators have. Another feature that contributed to the credibility of the study, but also to some additional challenges, was the setting up of external reviews by both offices. For the quality assurance process conducted by external reviewers in joint exercises such as this, a set of professionals able to serve both offices should be found, to ensure a timely and consistent level and depth of involvement in the review process. The audit and evaluation external quality assurers came in at different moments rather than simultaneously, performed very differently in reviewing the joint assessment draft reports, and at times even disagreed with each other, challenging the revisions that the other reviewers had suggested and causing further delays. In the end, both offices allocated very skillful and highly regarded quality assurers, including a member of IEO's International Evaluation Panel with many years of audit and evaluation experience. They certainly elevated the quality of

the report. However, the management of the assurers was not ideal. They should have been selected by both offices in common agreement, in order not to give the impression they were biased toward one office or another. They should have reviewed the report together, and their contributions should have been reviewed by the full team, with changes to the report agreed upon all at once, to prevent unnecessary disagreements and multiple changes to the report. Key Lesson Learned In view of the different perspectives, methods, and even results from the audit and evaluation approaches, the report structure of a joint exercise should have common sections but can profit most from also having separate evaluation and audit sections, findings, conclusions, and recommendations, combined with a common quality assurance process conducted by professionals with experience in both audit and evaluation, endorsed by both offices from the beginning.

The Fear Factor Versus the Enhanced Credibility Ripples The joint assessment, initially called "the joint audit and evaluation" by the rest of UNDP, an even scarier name for the "beast," caused significant fear among managers, especially at headquarters and regional bureaus and hubs. They saw in this exercise the potential for additional financial cuts at an already very difficult financial time. Understandably, since there are always predicted and unpredicted implications to evaluations, where evaluation is valued, there are ripple effects (Patton, 2008). There was significant pressure against the exercise. Management first asked that the exercise be postponed, since an internal mid-term review had been recently conducted. When this request was denied on the grounds that an independent assessment was needed, management then lobbied to have the scope reduced, claiming some initiatives were too recent to be assessed. As the assessment moved ahead, the team continued to face pushback, including refusal of interviews. Until the very last minute, when the report was first discussed with the executive board, management refused to attend the discussion. When the report was formally presented to the executive board in 2017, management still refused to transparently present a complete management response to the recommendations. Pushback against evaluations is generally normal, but it had never reached this level. This was the first and only time the organization refused to properly engage and present a timely management response. Despite that, the assessment was evidence rich, highly acclaimed by the Executive Board of Member States and others inside and outside the organization, and received record downloads. In time, management produced a positive response to the recommendations and was swift to implement changes. The new administration moved quickly to validate the assessment and committed to implement the recommendations of the report. The administrator in discussions

with the executive board cited the assessment positively and assured the board that it would be taken into the strategic plan. The executive board asked for an update of the assessment in the new cycle of its interactions with IEO. The new UNDP Strategic Plan accepted all the findings, conclusions, and recommendations of the report. It gave special focus to the recommendations on building a solid organizational results culture, results-based budgeting, program quality, and enhanced value for money. Fear was associated with many aspects, including the risk of losing resources, but above all with the possibility that highlighting certain failures would damage future changes and the reputation of UNDP. Against this backdrop, the report made a strong statement for transparency in the UN. The unflinching approach by both directors not to be distracted and to remain true to the evidence of the comprehensive assessment enhanced the credibility of these oversight offices. It was an encouraging shift within the UN system, calling on leadership to promote a results culture that encourages critical reflection on successes and failures as having value for organizational learning to improve effectiveness and stimulate innovation. Key Lesson Learned Despite the fear factor and the other challenges noted, having audit and evaluation conduct this assessment together and present one joint report attracted much attention and enhanced both the credibility of the outcome and trust in the rigor of the results. It was above all a strong statement for transparency, accountability, and learning in the UN.

Concluding Remarks for Further Consideration While there were challenges, this experience of conducting a joint assessment with elements from audit and evaluation was a unique learning opportunity for UNDP. It helped evaluators to look more closely at quantitative analysis and normative questions, and helped auditors to look at performance from the cause-and-effect angle of people and results. Audit and evaluation complementing each other can better feed the oversight, accountability, and learning engines of an organization to ensure effectiveness. What followed the joint exercise was an avalanche of questions about the advantages and disadvantages of joint audit and evaluation exercises. The offices have yet to reflect fully on these questions, but for now it is worth considering that it is imperative for audit and evaluation to work together in some way, at the very least to better coordinate work plans, ensure synergies, and avoid duplication of effort and waste of resources. While evaluators and auditors have their unique ways of working, there are areas where they can work collectively, and this has already begun at UNDP with joint audit and evaluation missions, especially at the country level. After all, "audit and evaluation are part of a continuum" (Barret, 2001). The autonomy of the audit and evaluation functions is both historical in nature and institutional in form, but it is clear that the nature of the responsibilities of international organizations like UNDP is changing with the new Sustainable

Development Goals and a recognition that the status quo of working in silos is no longer appropriate (Naidoo & Soares, 2017). The question is not whether to integrate these functions, but rather how – something that has been asked for many years. "The marriage may be arranged, and the partners may not be enamored of one another, but pressures to make it work are strong. Divorce does not even seem to be an option; pressures are building to constrain spending more effectively, manage public resources better, select appropriate policy tools, and then implement successfully." (Rist, 1993) This chapter hopes to have added a few aspects for consideration on how to make this integration more effective.

References Barret, P. (2001). Evaluation and Performance Auditing: Sharing the Common Ground, presented at the Australasian Evaluation Society International Conference. Bemelmans-Videc, M.L. (2003). Auditing and Evaluating Collaborative Government: The Role of Supreme Audit Institutions. In A. Gray, B. Jenkins, F. Leeuw, & J. Mayne (Eds.), Collaboration in Public Services: The Challenge for Evaluation. New Brunswick, NJ: Transaction Publishers. Chelimsky, E. (1991). Comparing and Contrasting Auditing and Evaluation: Some Notes on their Relationship. In A. Friedberg, B. Geist, N. Mizrahi, & I. Sharkansky (Eds.), State Audit and Accountability: A Book of Readings. Jerusalem: State Audit Office. Lehtonen, M. (2005). Accounting (f)or Learning? Evaluation, 11(2): 169–188. Mayne, J. (2006). Audit and Evaluation in Public Management: Challenges, Reforms and Different Roles. Canadian Journal of Program Evaluation, 21(1): 11–45. Naidoo, I. (2012). Evaluations for Development Effectiveness: Ensuring Credibility Through Independence, Transparency and Utility? Guest lecture, IPDET, 2012. Retrieved from www.ipdet.org/files/2012%20Guest%20Lectures/Naidoo_Guest_Lecture.pdf. Naidoo, I., & Soares, A.R. (2017). Incorporating the Sustainable Development Goals in National Evaluation Capacity Development. In R.D. van den Berg, I. Naidoo, & S.D. Tamondong (Eds.), Evaluation for Agenda 2030: Providing Evidence on Progress and Sustainability. Exeter, UK: IDEAS. Patton, M.Q. (2008). Utilization-Focused Evaluation. Thousand Oaks, CA: Sage. Picciotto, R. (2013). Evaluation Independence in Organizations. Journal of Multidisciplinary Evaluation (JMDE), 9(20): 18–32. Pollitt, C., Girre, X., Lonsdale, J., Mul, R., Summa, H., & Waerness, M. (1999). Performance or Compliance? Performance Audit and Public Management in Five Countries. Oxford, UK: Oxford University Press. Rist, R. (1993). Foreword. In A. Gray, B. Jenkins, & B. Segsworth (Eds.), Budgeting, Auditing & Evaluation: Functions & Integration in Seven Governments. New Brunswick, NJ: Transaction Publishers. UNDP (2014). Solutions Related to Challenges of Independence, Credibility and Use of Evaluation. Proceedings from the Third International Conference on National Evaluation

Capacities. New York, NY: UNDP. Retrieved from http://web.undp.org/evaluation/documents/NEC/NEC-proceedings-2013.pdf. UNDP (2016). The UNDP Evaluation Policy. DP/2016/23, adopted 19 July 2016. Retrieved from http://web.undp.org/evaluation/documents/policy/2016/Evaluation_policy_EN_2016.pdf. UNEG (2010). Good Practice Guidelines for Follow-up to Evaluations. United Nations Evaluation Group. Retrieved from www.uneval.org/document/detail/610. UNEG (2016). Norms and Standards for Evaluation. United Nations Evaluation Group. Retrieved from www.uneval.org/document/detail/1914. Wisler, C. (1999). Evaluation and Auditing: Prospects for Convergence. New Directions for Evaluation, 71: 1–71.

14 Reflections on Opportunities and Challenges in Evaluation in the Development Banks Arne Paulson

Introduction In this chapter I try to explain to readers who may be unfamiliar with Multilateral Development Banks (MDBs) what their purpose is, what some of the main concerns are in those institutions regarding audit and evaluation, and, in the case of the Inter-American Development Bank (IDB), to offer a more detailed look at the evaluation function as it operated while I was there (1980–2007).1 The chapter closes with a few final thoughts and suggestions for MDBs to enhance the role of borrowers in developing their own evaluation culture.

The Role of MDBs In order to bring some order to the socio-economic and political chaos that prevailed in Western Europe and other independent countries after World War II, a number of international institutions were created in a variety of fields. In the financial realm, this included the creation of the Bretton Woods institutions – the International Monetary Fund (IMF) and the International Bank for Reconstruction and Development (the IBRD or World Bank). Both institutions were based on the principle that pooled financial resources and commitments by member governments created strength and stability that was greater than any one member could muster on its own. The IMF was designed to support the short-term balance of payments and currency stability issues, while the World Bank was designed to support longer-term investment projects aimed at improving socio-economic conditions in borrower countries. In 1944, the World Bank thus became the world’s first multilateral development bank. Focusing initially on infrastructure, the World Bank made its first loan (to France) in 1947, then in the 1950s and 1960s, expanded its geographical scope to include Latin America, Asia, and newly independent countries of Africa. And it expanded the scope of its activities to include not just physical infrastructure, but also development projects with a greater immediate impact on people, including investments in health, education, urban and rural development. But the resources of the World Bank were not sufficient to meet all the needs of the growing number of developing countries clamoring for long-term financing

Opportunities and Challenges   195 of public investment projects, and the years that followed saw the creation of Regional MDBs, including the Inter-American Development Bank (IDB) in 1959, the African Development Bank (AfDB) in 1964, the Asian Development Bank (AsDB) in 1966, and the European Bank for Reconstruction and Development (EBRD) in 1991. Sub-regional development banks, such as the Caribbean Development Bank (1970), were established to assist states (e.g., newly independent Caribbean island nations) that were too small to justify the costs of membership in the larger regional banks. And still other MDBs, such as the Central American Bank for Economic Integration (1960), the European ­Investment Bank (1959), the Andean Development Corporation (1970), the Islamic Development Bank (1975), the North American Development Bank (1994), and more recently the New Development Bank (2015) and the Asian Infrastructure Investment Bank (2016) were created to strengthen sub-regional economic co-operation among their members, largely independent of the influence of non-regional countries. By 2016, total annual lending (disbursements) by these institutions totaled about $65.8 billion (UN Inter-Agency Task force on Financing for Development). Like the World Bank, these other multilateral banks were able to capitalize on the pooled commitments of their member countries, which typically included financially stronger countries such as the United States, Canada, European countries (and eventually Japan and China) in addition to the “borrowing” members. This allowed those institutions to sell bonds at low-interest rates on international markets to support their lending to developing country members. In addition to maintaining a “regional focus” that reflected regional priorities, the purpose of the regional MDBs was to raise additional long-term (and low-cost) capital for worthwhile socio-economic development projects – above and beyond what their borrowers could hope to obtain from the World Bank or on their own account. Over time, while the mandate of the MDBs continued to focus on promoting social and economic development in their member countries, their tools evolved from financing infrastructure projects to a broader menu of products and services, to better respond to client demand. For example, most MDBs began to offer technical and financial assistance not directly tied to investment projects, but rather in the form of new lending tools such as policy-based loans (designed to disburse funds quickly in accordance with agreed changes to fundamental macro-economic and sector policies) and conditional cash transfer operations (designed to disburse against mutually agreed social objectives). The economic justification was only one of several parameters considered by MDBs in approving loans; other considerations included geographic balance and country size. A constant feature in the operations financed by all of the MDBs, however, was the need to demonstrate that the resources put at their disposal were being used effectively (i.e., that objectives were being achieved) and efficiently (i.e., that the objectives were achieved at a reasonable cost). The latter has become increasingly important as donor governments are asked by their own constituents to “do more with less,” and consequently expect the MDBs to do the same.


Audit Versus Evaluation Audit From the very beginning, MDB shareholders were concerned with assuring that their loans were used in accordance with strict accounting principles and would not, in any way, compromise the financial strength and reputations of the institutions themselves. For that reason, strict procedures were put in place and monitored by MDB staff throughout the implementation process for each regular project (nominally projected to be completed within four years of loan effectiveness). However, most investment projects financed by IDB took longer to complete, about five and a half years. This does not include so-called "fast disbursing" loans such as policy-based loans or emergency loans made in response to natural disasters. These procedures relate to:





• Procurement: international competitive bidding was required for contracts involving foreign exchange; national competitive bidding was required for contracts involving only local currency.
• Disbursement: bank funds were released only against paid invoices for eligible goods and services within approved contract components.
• Pari-passu2: for investment projects, MDB policies typically prohibited financing 100% of project costs; borrowers were required to provide the rest in order to assure country ownership. MDBs consequently closely monitored project implementation to assure that borrower, as well as MDB, resources were provided in the agreed proportion – avoiding a situation where MDB resources were fully expended while the borrower's share, needed to fully complete the project, fell behind (a minimal illustration of this check follows the list). Eventually, however, as lending tools became more flexible and less tied to the achievement of physical infrastructure objectives, these requirements were relaxed.
• Inspection visits: in addition to overseeing loan disbursements for eligible expenses, MDB technical specialists in charge of supervising project implementation were required to inspect the physical progress of project components on a regular basis and submit semi-annual progress reports. At the IDB, disbursements and inspections were assigned to specialists permanently stationed in the IDB Country Office located in each borrowing member country.
• Financial audits: internal audit offices in the MDBs review procurement awards, disbursements, financial, and inspection reports, to spot-check compliance with internal bank procedures. On a selective basis, these units also undertake more detailed "inspection visits" which may include a deeper look into bank and borrower records, including financial statements, as well as interviews with staff in the field.
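To make the pari-passu check referenced above concrete, here is a minimal sketch in Python. The function name, tolerance, agreed share, and disbursement figures are all hypothetical and are not drawn from any MDB system; the sketch simply flags a project when the MDB's share of cumulative disbursements runs ahead of the agreed financing proportion.

```python
def pari_passu_alert(mdb_disbursed, borrower_disbursed, agreed_mdb_share, tolerance=0.05):
    """Flag a project when the MDB's share of cumulative disbursements
    exceeds the agreed financing proportion by more than the tolerance."""
    total = mdb_disbursed + borrower_disbursed
    if total == 0:
        return False  # nothing disbursed yet, nothing to check
    actual_mdb_share = mdb_disbursed / total
    return actual_mdb_share > agreed_mdb_share + tolerance

# Hypothetical project: 70% MDB financing agreed; USD 8m MDB vs 2m counterpart disbursed.
print(pari_passu_alert(8_000_000, 2_000_000, agreed_mdb_share=0.70))  # True -> counterpart lagging
```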

Evaluation For similar reasons, evaluation of the expected socio-economic effects of projects also became important in MDBs. Since the demand for foreign exchange

for financing development projects in borrower countries vastly exceeded the supply available through MDBs, some acceptable method had to be found for rationing those resources. But how could one objectively compare the relative value of a project in country A with a similar (or even totally different) project in country B? One thing that the charters of all the MDBs have in common is that loans are to be made for the socio-economic benefit of their member countries. One answer, therefore, was to estimate the economic benefits expected to accrue from each proposed project and demonstrate that they achieved a minimally acceptable economic rate of return. At the World Bank, that rate was set at 10%, reflecting the perceived average marginal return on investment in its borrowing member countries; at the IDB, which was set up to deal with relatively more affluent countries in Latin America, the minimum rate of return required of its projects was set at 12%. In the early days of the MDBs, when most projects involved infrastructure investments of one type or another, cost-benefit analysis offered a way to establish the expected economic rate of return. As lending operations grew to incorporate projects in the social sectors (e.g., health or education), where benefits are more difficult to quantify, least-cost analysis (at the established discount rates) began to be used to rank project importance. Eventually other types of economic analysis (willingness-to-pay, hedonic pricing, etc.) began to be applied. More recently, as projects have become more complex, involving different actors, across multiple sectors, in different geographic areas, and over different time periods, the focus has been more on achieving project outcomes in terms of broader social objectives, such as improved learning, reductions in emissions, greater formal employment for women, etc., rather than on rates of return. In all cases, however, the aim of the MDBs has been to demonstrate that the loans they provide are being used to achieve worthwhile socio-economic objectives in their borrowing member countries. Ex-Ante, Ongoing, and Ex-Post Evaluation Ex-Ante Evaluation (Up to Board Approval) As explained in the previous section, ex-ante evaluation of the expected economic effects of projects proposed for financing has always been a basic precept of MDBs, not only to help define priorities in the allocation of resources, but also to establish their own credibility as development institutions. Each project proposed for financing therefore had to include a rigorous economic analysis, based on a methodology appropriate for the type of project under consideration. That analysis, together with that of other elements (technical design, financial and institutional analysis, etc.) crucial for the success of the project, is then subjected to a formal internal review and approval process at several levels, before the project can be approved for financing. At the IDB, at least two reviews of each project are conducted at the internal operational level, and two additional reviews at the level of management, before it is sent to the board of directors for ultimate approval.
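As an illustration of the hurdle-rate logic described above – not the banks' actual appraisal methodology, which rests on full cost-benefit analysis – the short sketch below computes the economic rate of return of an invented stream of net benefits and checks it against the 12% threshold mentioned for the IDB. The cash-flow profile and function names are assumptions made for the example.

```python
def npv(rate, cash_flows):
    """Net present value of a stream of annual net benefits (year 0 first)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def economic_rate_of_return(cash_flows, lo=-0.99, hi=10.0, tol=1e-6):
    """Find the discount rate at which NPV is zero, by bisection
    (valid for a conventional profile: outlay first, net benefits after)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, cash_flows) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical project: 100 invested up front, net benefits of 30 for six years.
flows = [-100] + [30] * 6
err = economic_rate_of_return(flows)
print(f"ERR = {err:.1%}; passes 12% hurdle: {err >= 0.12}")  # roughly 20%, so it passes
```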

198   A. Paulson Ongoing Evaluation (Aka Project Monitoring During Execution) Investment projects approved for financing by MDBs typically take at least five years to be completed, from the time of loan approval. A review of Project Completion Reports prepared by the IDB in the decade from 1981–1991 found that 80% of projects took longer to complete than the standard four-year execution plan specified in the loan contract, and in one-third of those cases projects took more than six years to complete (OEO, 1992 p. 15). In that lapse of time, many things can influence the basic assumptions on which the original design of the project was based: background economic conditions may be different; government priorities may change; unexpected technical issues may arise; adverse weather or force majeure may cause implementation delays or changes in project design; etc. In such cases, it is imperative that timely actions be taken to make necessary adjustments in order to keep projects on track to meet their objectives. MDBs have consequently established monitoring systems to alert project teams when problems arise in the execution of specific projects (e.g., additional soils testing required in a road construction project, change of location of some schools in a rural education project). These monitoring systems can also be used to assess performance issues across economic sectors or geographic areas, highlighting general problems that require additional management attention (e.g., inadequate government budgetary resources to finance local counterpart expenses which delay several projects in the same country). And, at the global level, monitoring results can be used as one measure of overall MDB performance (e.g., X% of projects under execution are expected to achieve their original objectives). The IDB’s project monitoring system currently reports on nearly 600 projects in execution. In 2017, satisfactory execution progress was validated for about 75% of those projects, while 12% were put on “alert” for additional supervisory attention and the remainder are considered “problem” projects that required reformulation. Reformulation is the term attached to projects whose outcomes, rather than only project components (outputs) need to be changed, requiring changes in the loan contract. See www.iadb.org/en/projects. Ex-Post Evaluation (After Project Completion And/Or Final Disbursement of the Loan) After final disbursement, the MDB specialist in charge of supervising project execution typically prepares a Project Completion Report (PCR) describing the original project objectives, any execution problems encountered, changes made to the project during execution, lessons learned, and a final assessment of project achievements and costs3. PCRs are circulated internally and serve to inform MDB staff of pitfalls to avoid, and good practices to follow, in working on similar new operations. They are also used as one of the primary sources of information about the project for subsequent evaluations performed by the

Opportunities and Challenges   199 MDB’s internal evaluation offices in reports to management, as well as independent evaluations prepared for the board of directors. Accountability Versus Lesson Learning In principle, ex-post evaluation has two objectives: accountability (i.e., a retrospective analysis of what went right or wrong in the completed operation) and lesson learning (i.e., a forward-looking statement of things that should be replicated or avoided in future similar operations). While both objectives can be achieved in the same evaluation exercise, in practice there is often a conflict between these two objectives. This tension stems from a number of factors. For example, as evaluation findings and recommendations are typically disclosed to the public, there may be a tendency for evaluation stakeholders to be less than candid about things that went wrong, than about things that went right. On the other hand, evaluators (as well as auditors) have sometimes been criticized for selecting projects for evaluation that are known to have had problems, since it is often easier to identify the negative impact of factors that impeded project success, than it is to demonstrate how things that did not go wrong had a positive impact (an opposing view is taken by Bohni, Turksema, & Van der Knaap, 2015). Also, in certain circumstances, lesson learning may only be possible after a full impact evaluation has been undertaken, to understand why a project worked or did not – and there could be significant information and budget constraints limiting an agency’s ability to conduct this type of evaluation. Even where lesson learning by evaluation offices seemed relatively straightforward, however, internalizing and incorporating those lessons into the design of new operations has proved difficult. One result is that the same lessons are learned repeatedly, but not successfully adopted by operational staff in charge of project preparation and design. A number of reasons have been given for this reluctance: evaluators were accused of having insufficient technical grounding in the specific fields involved; projects evaluated had been approved years before and it was said that “we don’t do things that way anymore”; lessons learned were either too specific to the project evaluated to be replicated in other projects, or too general to be useful. More frustratingly, it was often indicated that it was “too late” to change the design of a new operation nearing approval for financing.

Audit, Evaluation, Lesson Learning, and Results Reporting at the IDB Audit at the IDB At the project level, technical specialists in the IDB’s Country Offices review the expenditures of project executing agencies to confirm that procurement has been done in accordance with IDB policy and contractual requirements and, if confirmed, verify that loan resources may be used to cover the corresponding

expenses. Fiduciary specialists from the IDB's Financial Management and Procurement Services Division also spot-check procurement, accounting, and auditing practices, particularly where issues seem to be affecting the overall progress of the project or even the operations of the executing agency as a whole (if more than one project is being executed by the same agency). Annually, IDB-approved external auditing firms review project documentation to ensure that procurement and financial management procedures were followed correctly. The Office of the Executive Auditor (AUG), the internal audit function of the bank, reporting to both the IDB President and the Board of Executive Directors, on the other hand, focuses on how the bank operates at a corporate level. AUG brings a systematic approach to evaluating and improving the effectiveness of the bank's risk management, control, and governance processes. This includes providing independent and objective appraisals and audits of financial, accounting, operational, administrative, and other activities, with a view to improving the efficiency and economy of operations and the use of resources (see Inter-American Development Bank (IDB), 2018). Each year, the bank's financial statements are audited by an external firm and included in the IDB's Annual Report. Evaluation at the IDB Uniquely among the MDBs, the IDB for many years had two evaluation offices: the Office of External Review (ORE) established in 1968, and the Operations Evaluation Office (OEO) established in 1974. OEO was in charge of all ex-post evaluations of loan operations and reported primarily to bank management, while ORE reviewed policies, procedures and organizational aspects of the IDB, and reported jointly to management and the board of directors. A 1992 staff survey conducted in preparation for the bank's reorganization, however, found some confusion about the roles of the two evaluation offices, which sometimes made inconsistent recommendations, failed to provide timely feedback to improve projects under preparation (see below) and, in the view of some, overemphasized the economic aspects of evaluations (Tussie, 1995, p. 91). And, it was noted that the IDB lacked a truly Independent Evaluation Office – i.e., one that reported exclusively to the board of directors. Under the reorganization of the IDB, it was therefore decided to fold together the responsibilities of OEO and ORE under an independent body: the Office of Evaluation and Oversight (OVE). OVE currently undertakes four types of evaluations: Country Program Evaluations; Sector Evaluations; Project Evaluations; and Corporate Evaluations. It disseminates findings of these evaluations so that the recommendations can be used in the design, analysis and execution of new operations. Separation of the Audit and Evaluation Functions at the IDB Despite having some similar characteristics (need for independence, empirically based evidence, use of accepted methodologies, ethical standards), the audit

function at the IDB has always been viewed as focusing on internal efficiency ("doing things right"), while the evaluation function has been seen as focusing on effectiveness ("doing the right things"). They are viewed as two complementary functions, employing different methodologies and requiring different professional qualifications. While the evaluation function has evolved over the years (in keeping with changes in borrower demands for greater flexibility and responsiveness, the availability of new lending instruments, corporate reporting needs, etc.), the prospective synergies of merging the evaluation and audit functions have been less obvious and to date no such action has been undertaken. Lesson Learning at the IDB After the 2007 Realignment (reorganization) of the IDB, responsibility for ex-post evaluation of operations passed to OVE, but judging from recent OVE evaluations the suggestions for improvement have not significantly changed – implying that lesson learning is still not taking place effectively. OVE's evaluation report on "Assessing Firm-Support Programs in Brazil" (OVE, 2017c), for example, concludes, among other things, that targeted industrialization policies can lead to rent-seeking by private interests and that credit subsidies can lead to inefficient allocation of funding – similar to the findings that OEO published in its 1984 "Summary of the Evaluations of Global Industrial Credit Operations." Likewise, OVE's March 2017 report on IDB support of low-income housing programs (OVE, 2017b) concluded, inter alia, that any public subsidization of housing should include transparent targeting and beneficiary selection processes, and should incorporate consideration of public transport needs. These suggestions, though surely appropriate, echo findings of OEO in its 1988 evaluations of urban development projects. Since 2012, the IDB has produced a series of "Sector Framework Documents" which explicitly incorporate a section on lessons learned from the bank's experience in projects in that sector, and contain detailed guidance for project teams working on new operations (IDB, 2018). These documents are prepared by management, approved by the board of directors, and are publicly available. Project teams are asked to explain how relevant recommendations have been addressed in the project preparation process. A division within the Bank's Office of Strategic Planning and Development Effectiveness now performs a similar role to that formerly performed by OEO, focusing specifically on project logic and evaluability in terms of outcome results. Results Reporting at the IDB OVE reports are summarized thematically for the board and are made publicly available in an annual report (see OVE, 2017c). Management, for its part, consolidates information from its project monitoring system and reports on predicted results, based on current expectations of realizing targeted outputs and intermediate outcomes. These consolidated reports are

available to the public through the bank's Corporate Results Framework (IDBG, https://crf.iadb.org) which rates the development effectiveness of its operations, as well as that of the bank as a whole.
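The emphasis on project logic, evaluability, and reporting against targeted outputs and intermediate outcomes can be illustrated with a toy results-chain check. The structure, field names, and example project below are invented for illustration and do not represent the IDB's actual systems; the point is only that each intended outcome needs an indicator, a baseline, and a target before progress can be reported or evaluated.

```python
# A minimal, hypothetical results-chain record for an evaluability check.
project = {
    "inputs": ["loan proceeds", "counterpart funds"],
    "outputs": ["200 km of rural roads rehabilitated"],
    "outcomes": [
        {"statement": "travel time to markets reduced",
         "indicator": "average travel time (minutes)",
         "baseline": 90, "target": 60},
        {"statement": "agricultural incomes increased",
         "indicator": None, "baseline": None, "target": None},
    ],
}

def evaluability_gaps(project):
    """List outcome statements that lack an indicator, a baseline, or a target."""
    return [o["statement"] for o in project["outcomes"]
            if not all(o.get(k) is not None for k in ("indicator", "baseline", "target"))]

print(evaluability_gaps(project))  # ['agricultural incomes increased']
```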

Personal Reflections on the Work of the Operations Evaluation Office (OEO) OEO Reports During its 20-year existence, OEO has produced two types of evaluations: (i)  Project Performance Audit Reports (PPARs), which basically validated the findings of Project Completion Reports (PCRs) prepared by the sector specialist in charge of supervising project execution, and (ii) Operations Evaluation Reports (OERs), which included a reassessment of the economic justification of the project and involved a visit to project officials in the field. Projects selected for evaluation were chosen based on several criteria: representativeness in terms of projects financed by the IDB in that sector (e.g., road improvement, potable water, health services); country coverage (i.e., geographic area, country size); completion date (preferably within 1–3 years of final disbursement); and those whose PCRs indicated that the projects might have useful lessons for future projects (i.e., projects similar to those in the IDBs future lending pipeline). Given budgetary and staff constraints, OEO normally undertook at least one PPAR and four OERs in each sector, and aimed at covering two sectors per year. Individual PPARs and OERs were sent to management for review and comment, but the final responsibility for their findings rested with OEO. Once PPARs and OERs in a given sector were completed, a summary document was prepared and sent to upper management and the board for review and discussion. Sector summaries did not include the names of the projects or the countries involved, helping to reduce political pressures and allowing the documents to be released to the general public. Most important from a lesson-learning perspective, these evaluation summaries included specific recommendations regarding technical, institutional, and economic issues that the bank should be aware of in future similar operations. In addition OEO published an Annual Report on Operations Evaluation which highlighted the recommendations cited in sector summaries completed that year, but also included an overview of all the PCRs prepared by the bank in that year, and any borrower ex-post evaluation materials received (see the following section). These Annual Reports were also sent to upper management and the board for review, and were made available to the public. Borrower Ex-Post Evaluations (BEPs) Another unique feature of IDB loans was a contractual requirement that borrowers regularly collect and submit data for the eventual ex-post evaluation of the

Opportunities and Challenges   203 project. Depending on the institutional capacity of the borrower, a formal ­ex-post evaluation report might be required (typically 3–5 years following project completion), or simply data to support an ex-post evaluation of the operation if the IDB decided to do one, or even just a description of the data ­collection mechanism to be used. OEO reviewed all of the BEP submissions received, and offered specific suggestions for improving their content. But the office was not in a position to offer individualized training to the project executing agencies in charge of complying with this requirement, nor to support them financially. Complying with the BEP requirement essentially became an unfunded mandate: the IDB considered it to be a borrower responsibility, while the borrowers considered it something that had to be done “for the IDB.” There was effectively no borrower commitment to evaluation of the operation, even though they signed loan documents requiring them to do them. Typically, attempts to comply with the BEP requirement was done as an afterthought – when the resources of the loan had been fully disbursed and the borrower and the bank were beginning to think of a follow-on operation. By that time, however, the executing agency in charge of the project had normally been disbanded, and there were neither qualified staff nor resources to do the evaluations. In many, if not most, cases, a further complication was that the annual data required to do the analysis, had never been collected. It therefore proved impractical for the IDB to enforce the BEP requirements. In an effort to streamline the lending process, the board of directors agreed to suspend the BEP requirement for new operations, and made existing BEP requirements optional rather than mandatory. Few borrower-led ex-post evaluations of IDB-financed operations were subsequently done, although a number of countries, including Mexico and Chile (Guzman, Irarrazaval, & de los Rios, 2014), have made substantial progress in building up their own monitoring and evaluation capacity. OEO Feedback to Improve Future Operations Organizationally, OEO was part of the Office of the Controller, an independent part of upper management, not tied to the operational departments (i.e., those responsible for project design or implementation). This enabled OEO to maintain its distance from the operational departments and work as a semi-autonomous unit within management. That same independence, however, impeded the feedback of evaluation experience into the design of new operations. By the time recommendations from sector summaries were available to the operational departments, it was, as previously indicated, often too late to change the design of new operations under development. It was therefore decided that OEO should review all new operations coming before the Loan Committee for approval. The Loan Committee is the highest level of management approval before operations are sent to the board of directors. This involved sending written comments to the project teams assigned to

204   A. Paulson prepare the projects, briefing the Controller on any major points of disagreement with the project team, and on occasion, filling in for the Controller at Loan Committee meetings. In fact the Loan Committee discussed proposed operations on two separate occasions: at the project concept stage (when a brief outline of the project was presented in order to authorize further development of the proposal); and later at the project report stage (when the project was approved for presentation to the board). Even this opportunity to supply feedback to improve the design of new operations was found to be insufficient, however, and OEO was subsequently requested to additionally send comments on new project proposals to internal operational meetings that took place before documents were sent to the Loan Committee. In essence then, OEO was called upon to send written comments regarding each proposed new operation on four separate occasions. In addition to project documents, OEO was also asked to review country programming papers, policy documents, and socio-economic reports. By 1991 the office was reviewing more than 350 documents for proposed loans, technical cooperation, and small project operations annually, as well as approximately 150 documents presented for approval to the bank’s Programming Committee. All in all, this burden proved to be unsustainable for an office of its size. OEO’s Resources OEO had a complement of ten professional-level staff, supported by three office assistants and 2–3 research assistants. Given those limitations, the office did not attempt to maintain a cadre of technical specialists in each sector; instead, most professional staff were economists who had experience in particular fields (e.g., energy, environment, telecommunications, education), complemented by a few non-economists with experience in other dimensions of social development (e.g., sociology, anthropology, institutional analysis). When specialized technical expertise was required (e.g., to launch a series of evaluations in sectors in which existing staff had no expertise – such as fisheries or rural health) outside consultants were hired. Evaluations of individual projects conducted by the office typically took six months to complete, including file research, interviews with headquarters staff familiar with the project, field visits, economic analysis, and report writing. Preparation of the respective sector summary, which included information not only from the evaluations conducted by OEO itself but also relevant information from other sources (including other institutions) typically required an additional six months. In addition to staff time, the cost of the individual evaluations, including consultant fees, travel, and in-country research, typically came to about $50,000 each (in 1990 prices). A full sector evaluation (five PPARs and OERs plus the sector summary) consequently consumed approximately three staff years of time and $300,000 in travel and other expenses. The resources at OEO’s disposal were sufficient to undertake the evaluations in its annual work program, or to provide feedback on the growing number of

documents it was requested to review, but not both. This undoubtedly contributed to the decision to merge the two evaluation offices in 1993. Currently, OVE conducts evaluations and presents its findings and recommendations to the board of directors. For those recommendations endorsed by the board, management develops corresponding action plans which are then validated by OVE, and tracked with an online system. Each year, OVE reports on progress made by management in implementing its recommendations.
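The resource figures reported above imply a rough arithmetic that is worth making explicit. This is an illustrative reconstruction only, assuming roughly one staff member per individual evaluation and attributing the residual out-of-pocket costs to the sector summary; the original records do not break the totals down this way:

5 evaluations × 6 staff-months, plus 6 staff-months for the sector summary ≈ 36 staff-months ≈ 3 staff years
5 evaluations × $50,000, plus roughly $50,000 for the sector summary ≈ $300,000 (1990 prices)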

Final Thoughts

MDBs continue to be an important source of finance for developing countries. By 2015, loan disbursements by MDBs totalled nearly $67 billion and covered virtually the whole developing world. By their nature, MDBs require public confidence in their ability to deliver resources efficiently and effectively, which makes them particularly interested in assuring that their audit and evaluation mechanisms function adequately. Evaluation is sometimes thought of only in terms of ex-post evaluation, but ex-ante evaluation (project analysis) and ongoing evaluation (project monitoring during execution) are equally important since ex-post evaluation is generally only possible if building blocks in terms of data collection have already been established in the earlier stages. Logic models, which clearly spell out the steps needed to move from inputs to outputs to outcomes, and the basic assumptions underlying that transition, have helped to define what evaluators need to focus on. And, although there is frequently some tension between the twin objectives of evaluation (accountability and lesson learning), evaluation is indispensable for avoiding design errors in future programs – and in that sense, is relatively inexpensive. Although MDBs themselves have generally adopted an “evaluation culture,” much more could be done to help borrowers develop their own evaluation capacity. Many borrowers lack the technical skills and the financial resources to do evaluations, particularly after the loan provided by the MDB has been fully disbursed. But even when technical assistance is offered, borrowers frequently see evaluation as something to be done “for the MDB,” rather than for themselves, and are unwilling to make the commitment. This should be seen as a missed opportunity, since many if not most borrowing countries undertake investment programs far in excess of what is financed by the MDBs. For example, Brazil’s National Development Bank (BNDES) alone disbursed nearly US$22 billion in 2017. Building an evaluation culture in MDB borrowing countries will not be easy. Experience has shown that even when an outside agency steps in to build up an evaluation system (including the necessary institutional framework, training, and financial assistance), the effort withers when the external agency withdraws. In addition to the physical components mentioned above, building an effective evaluation culture requires “political will” by government authorities, which is not always easy to find given the conflict between short-term political goals and

long-term payoffs from evaluation. Still, the successful experiences of countries like Mexico and Chile in creating and supporting respected non-partisan evaluation bodies show that it can be done.

Notes

1 This chapter is based on more than 30 years of experience in evaluation with Multilateral Development Banks (MDBs), starting with my first job as a research assistant at the World Bank in a new division created by Robert McNamara to “find out whatever happened to World Bank projects.” I subsequently worked in the ex-ante economic analysis of projects in a variety of sectors (energy, education, health) at the Inter-American Development Bank (IDB), followed by a seven-year stint in ex-post evaluation in the Operations Evaluation Office of that institution. In the final years of my career with the IDB, as head of the Project Monitoring and Portfolio Management division, I focused on developing an improved system of monitoring projects under execution and reporting on the development effectiveness of IDB-financed operations.
2 At the IDB, the bank-financed proportion of a project was originally defined by a meticulous calculation of all of the foreign exchange costs of the operation, while borrowers were expected to pick up the local currency costs. This proved impractical, however, and standard percentages were eventually set for the relative share of total project costs that the bank would finance in each country, ranging from 60% in the case of richer countries such as Argentina or Brazil, to 90% in the case of poorer countries such as Honduras or Haiti.
3 PCRs are generally required to be completed within six months of final disbursement, but in some cases (e.g., when borrowers have been slow in providing counterpart resources to pay for certain components financed with local currency), this means that the PCR is prepared before the project is fully completed. In such cases the MDBs still follow the project and try to assure that all components covered by the loan agreement are completed, but in practice MDBs have little leverage or influence once the loan is fully disbursed.

References

BNDES (2018). Annual Integrated Report 2017. Retrieved from www.bndes.gov.br.
Bohni, S., Turksema, R., & Van der Knaap, P. (Eds.) (2015). Success in Evaluation: Focusing on the Positives, Volume 22. In the series Comparative Policy Evaluation. London: Routledge.
Guzman, M., Irarrazaval, I., & de los Rios, B. (2014). Monitoring and Evaluation System: The Case of Chile, 1990–2014. IBRD Enhancing Capacity Development Working Paper No. 29. Retrieved from http://documents.worldbank.org/curated/en/391891468024288871/pdf/936210NWP0Box30LIC00ecd0wp0290Chile.pdf.
Inter-American Development Bank (IDB) (2018). About Us. Retrieved from www.iadb.org/en/about-us.
Inter-American Development Bank (2018). Development Effectiveness Overview 2018. Retrieved from https://publications.iadb.org/en.
Inter-American Development Bank, Office of Evaluation and Oversight (OVE) (2017a). Annual Report 2016. Retrieved from https://publications.iadb.org/handle/11319/8211.
Inter-American Development Bank, Office of Evaluation and Oversight (OVE) (2017b). Comparative Project Evaluation of IDB Support to Low Income Housing Programs in Four Caribbean Countries. Retrieved from https://publications.iadb.org.

Inter-American Development Bank, Office of Evaluation and Oversight (OVE) (2017c). Project Evaluation: Assessing Firm-Support Programming in Brazil. Document # RE-489-1.
Inter-American Development Bank, Operations Evaluation Office (OEO) (1992). Annual Report on Operations Evaluation During 1991. IDB document GN-1769.
Tussie, D. (1995). The Inter-American Development Bank. Multilateral Development Banks, Volume 4. Boulder, CO: Lynne Rienner Publishers.

15 Conclusions
Jeremy Lonsdale and Maria Barrados

One of the dominant themes in academic research and public-sector practice in recent years has been the attention given to cross-disciplinary thinking and the desire to break down what are often described as “silos.” There has been a recognition that the solutions to some of the greatest problems are unlikely to come from one discipline, and that individual disciplines or lines of inquiry can be significantly enhanced and made far more effective by sharing insights from elsewhere. A sign of self-confidence has been not to expect that one profession or practice has all the answers, but in fact to assume the opposite and to reach out to others. The evidence contained in this book is that audit and evaluation have not been immune to such thinking. Many of our authors have experience of such activities. In the case of this chapter, one of the authors, a performance auditor at the UK National Audit Office (NAO), has undertaken parallel work with internal audit, seconded an internal auditor into his performance audit team to enhance the skills available and support knowledge exchange, shared findings and undertaken field visits with internal auditors, and contracted out large pieces of audit fieldwork to external evaluators. In several cases, entire performance audits have been undertaken by these external evaluators, with a performance auditor from within his team remaining as liaison to guide the project to produce an output which was suitable for the parliamentary context. Another of this chapter’s authors at the Office of the Auditor General of Canada (OAG), a performance auditor but trained in social science research, built multidisciplinary teams that included medical doctors, lawyers, business administration specialists and accountants as part of the office staff. Efforts were also made to rely on the work of internal audit in government departments. In this study we chose to examine the relationships between these three activities – performance audit, internal audit, and evaluation. While they are not the only review and oversight functions, based on our experience, they are among the most commonly used and among those with considerable potential for sharing and collaboration. Part I of this volume set out the differences and similarities, looking to clarify definitions and explore the pressures on the different practices. Part II illustrated challenges in the current environment facing one or more practices which raise important considerations for the others. And Part III

showed how different organizations have institutionalized greater collaboration and demonstrated how crossover between the practices can occur. Drawing on the work of the contributing chapter authors, we have identified a number of themes which may act as a stimulus for those considering similar approaches. Themes 1–4 suggest factors that, on the basis of the evidence we have drawn together, we consider will influence the degree of collaboration that is possible or likely. These factors are (1) the institutional setting for the work; (2) organizational culture and leadership; (3) the purpose of the oversight activity; and (4) the amount of investment in such work and the flexibility of approaches taken. Themes 5–9 are considerations that commonly need to be managed by auditors and evaluators alike, and which are therefore areas in which there is scope for those from one activity to apply the lessons from the experiences of others. These considerations are (5) timeliness and relevance; (6) professional judgment and rigor; (7) reputation management; (8) usefulness; and finally (9) quality and independence.

Influencing the Scope for Crossover

Institutional Setting is Important in Defining the Practice and Hence the Ability and Willingness to Collaborate

The importance of context for evaluation has been explored in detail in a range of settings (Pollitt, 2013). Where the organization undertaking the work sits, including whether it is public sector or commercial, the extent of its autonomy, its size, and the scale of budget available (which may influence for example the amount of work undertaken and the scope to look outwards), all have relevance. Such factors may determine whether the entity has been, or is, deterred from taking an outward perspective. Evaluators in a commercial setting may, for example, be incentivized to draw from ideas elsewhere to develop their reputation, but budget pressures and profit considerations may limit the scope for extensive data gathering or collaborative work which may require extra time and incur additional costs. Being situated in the public sector may also shape the ability and willingness to reach out and share. State audit institutions usually combine performance audit with the external audit of financial statements, which may mean they have routine relationships with the internal auditors of the bodies they audit. There may be expectations that they consult with internal audit, on whose work they may wish to rely in order to be able to draw assurance if it is relevant to their own. There are thus incentives for these distinct entities to co-operate. On the other hand, external and internal audit may find barriers to collaborating as equals since the former’s role can – and does – involve reviewing the quality of internal audit activity. In order to use this work, the external auditor must have confidence in the structures, practices, and quality of the work in line with the standards outlined in Ruta’s chapter, but it may be harder for those scrutinized by another body to switch to working with it. Similarly, Shipman

highlights that examining the quality of government agency evaluations is considered an appropriate task for the Government Accountability Office since evaluation is seen as a key element of prudent agency management on which it needs to form a judgment. Examples within our book show that there have been institutional and organizational barriers to overcome to make a success of such crossover activity. Ensuring there is a common understanding of the purpose and intended outcome for the work can be difficult, where the same terminology may have different meanings among different professions – as Naidoo and Soares have noted – and where tacit knowledge is assumed, but not necessarily present in the collaborating body. Timescales and reporting arrangements (e.g., length and tone of reports, extent to which they are written in technical or lay language, etc.) may be different, as may assumptions as to what constitutes suitable evidence for the purposes of, for example, public accountability, and what constitutes appropriate quality assurance of that evidence. Barriers have also included an absence of a long-term commitment to learning from other disciplines, and the fact that such co-operation often rests on the efforts of a small number of individuals. The rationale behind taking a crossover or collaborative approach can also vary by participant organization, as illustrated in Part III of this book. For some it can be seen as a positive step toward enhancing the quality and flexibility of their existing work (as seen in Shipman’s chapter), and being outward-looking and receptive to the approaches of others. To others, it can be seen partly as a defensive act given the importance of public bodies justifying themselves through results and when concerns about duplication and the audit burden remain. For example, Frueh highlights how an Internal Oversight Office in UNESCO merged compliance/assurance functions and a change management function as part of trying to mitigate the risk of not delivering effective results. In other settings – for example, as set out in Naidoo and Soares’s examination of the UNDP – it is clear that the changing context was considered sufficient to render the status quo of working in institutional and professional silos no longer appropriate. Finally, our chapters illustrate that crossover activity can take place within a number of different organizational configurations. Thus, we see, for example:

• one body undertaking more than one type of oversight activity, GAO (external audit and evaluation), and applying one set of quality criteria for all its work;
• merged internal audit and evaluation functions in an Internal Oversight Service (UNESCO) with several different types of complementary collaborative models.

The predominant institutional forms for performance audit are national audit offices, which are by and large dedicated to the external audit of governments and agencies. They are usually organizations with a clear focus on specified tasks, have legislated mandates, and are accountable to elected officials. The offices, their heads, and their work are usually very publicly visible and part of the

formally defined public oversight of government. Much of what they do – as Lonsdale argues – is influenced by their external environment, which determines what kind of an institution they are. Nevertheless, the chapter by Shipman – regarding the GAO – illustrates that state audit bodies can shape what they do, how they do it and how it is received. External evaluation that has been incorporated into the work of a national audit office either through dedicated units or through individual professionals shares the same visibility and contribution to oversight as performance audit. Shipman highlights that by being commissioned by Congress and undertaken by the GAO to provide accountability for the use of federal funds, an evaluation can be considered threatening by program managers in a way that evaluations such as those described by van Stolk and Ling are designed not to be. The work of internal evaluation is much less publicly visible and less part of the public oversight of government, although evaluation teams may build up strong reputations for excellence in their work and for helping to shape policy developments. Some external evaluation units or groups of academic evaluators may also establish a public profile, playing a significant role in critiquing policy interventions and helping to point the way forward for public agencies. But what makes them attractive in terms of methodological rigor and innovation may at the same time make their approaches potentially less valuable in more traditional settings. Thus, van Stolk and Ling’s chapter on embedded evaluation focuses on an approach that has limited applicability to the work of state audit bodies, in part because the institutional arrangements limit the opportunities to get so close to those subject to the oversight attention.

Organizational Culture and Leadership Shape Whether and, if so, How the Practices Share and Collaborate

If organizational culture is thought of as the values and norms, codified practice (rules and procedures), and behavior, all three aspects are important in greater sharing between different professional practices. Such collaborative or crossover activities are a matter of choice, rather than mandate, and also come with costs. As a result, it can be assumed that the tone from the top, signaling whether such work is worth pursuing and giving permission for it, is crucial. Since such work will involve prioritization of resources, there must be a clear sense of the benefits and the opportunity costs of doing it, or indeed of not doing it if the consequence is that the organization is seen as insular. Chapter 12 highlights lessons from an actual case of co-located internal audit and evaluation functions in UNESCO, where Frueh emphasizes the importance of strong leadership – from either field – including an experienced director and a supportive governing board. Similarly, Naidoo and Soares’s account of the Joint Assessment within UNDP describes how resistance to the initiative was overcome by the strong leadership and commitment of the directors, and how attention to building a team consensus across the disciplines was needed.

The openness to outside ideas and collaboration can be seen as a sign of confident leadership and a willingness to innovate. In the UK, for example, the National Audit Office went through a period in the 2000s where it subcontracted with academic evaluators to undertake performance audits, in part as a way of benchmarking standards and of seeking new approaches to assessing government performance beyond those traditionally employed by auditors. The initiative was driven from senior levels of the organization. Lower down, individuals responded differently. Some performance auditors were enthusiastic at the prospect of exploring other ideas and enjoyed the practical challenge of working with external partners with different skills and perspectives. Others saw it as a threat to their own assumptions of professional competence and relatively few wanted to be involved in such collaborative work with outside academic evaluators since it was considered to be a risk to their careers. At the Office of the Auditor General in Canada, on the other hand, evaluators were brought into the organization as consultants and permanent staff who were trained in performance audit. Many stayed in the office and had successful careers. In their discussion of the ethical component of organizational culture, Birch, Jacob, and Miller-Pelletier also highlight the importance of ethical leadership and a strong culture. The codification of norms and values in codes of conduct and rules and procedures formally sets out what is expected, which can vary by professional group and organizational setting. Ethics codes are a necessary but not a sufficient condition for ethical conduct. Also required is individual ethical/moral conduct and leadership, particularly in challenging circumstances. Boyle and Wilkins highlight one such example where the head of a national audit body took the unprecedented step in 2018 of publicly challenging a government minister over what he considered to be deliberate misrepresentations of report conclusions. This led to a form of public apology in parliament, but more significantly indicated that the head of the audit body would challenge politically motivated distortions of its work. Organizational values can encourage collaboration and crossover where such activity is interpreted as consistent with an ethical approach and one in keeping with expectations of efficient working. Birch, Jacob, and Miller-Pelletier, drawing on Menzel (2015), indicate that core values in auditing and evaluation are considered to include efficiency and effectiveness in how the work is conducted, participation, and openness and transparency. Such values encourage a tendency toward co-operation and minimizing the cost and impact of oversight work (sometimes referred to as the “burden” of audit), bringing a sense of an ethical dimension in that the aim should be to minimize the call on resources. A review of audit and accountability in the UK recommended coordination between auditors and other oversight bodies in the “interests of maximizing the benefit of their work and minimizing duplication of effort” (Sharman, 2001), a theme also highlighted by Frueh as part of the rationale for co-location of internal audit and evaluation functions within UNESCO.

The Purpose of an Audit and Evaluation Dictates the Extent of Optimal Crossover/Collaboration

Several of our cases highlight how the purpose of the task helps to dictate what form of oversight work is most appropriate. For accountability purposes, where there is an expectation of individuals being questioned publicly on the basis of agreed and documented facts, performance audits have developed in recent decades. At the opposite end of a spectrum, van Stolk and Ling’s chapter on “embedded evaluation” highlights an approach focused on learning, co-production, and adaptation, in which evaluators are involved “in the delivery and monitoring of an intervention.” Here, the evaluator can be part of the project team. In such circumstances, the objectives of the work are very different, and it seems likely that the scope for crossover and collaboration between auditors will be greater where the purpose of the task tends toward the former, rather than the latter. It is also likely that internal auditors and performance auditors will find it easier to see the benefits of collaboration where the focus is more on process, governance, and practice, common ground for both, rather than outcomes and effectiveness. On the other hand, it is instructive to examine the type and purpose of the performance audits that the UK NAO asked outside academic evaluators to conduct on its behalf when outsourcing them in the 2000s. They tended to be more cross-government and thematic (e.g., examining innovation, risk management, diversity, and the introduction of e-government), more akin to research than to accountability-focused audits. Traditionally, performance audits and program evaluations differ in their purpose, scope, methods, standards, and relationship to program staff. Performance audits typically assess program activities’ compliance with criteria or standards based in law and regulation, agency or government-wide policy, or contract or grant terms. Implementation evaluations resemble performance audits in assessing the extent to which program activities conform to statutory and regulatory requirements, program design, and professional standards or customer expectations. That they exist on a spectrum with different strengths, weaknesses, and purposes can be exemplified by how evaluation is used in performance audit in the GAO, which selects one or other depending on the questions at hand. Similarly, in the UN system how internal audit and evaluation can work together comes through management determining how best to answer the questions that derive from the purpose of the undertaking.

Making Crossover Happen Requires Flexibility, Investment, and Planning

The later chapters of this book illustrate the realities of crossover in different settings, underlining that such collaborative approaches can and do take place. They also illustrate that such work requires considerable effort on the part of practitioners, involving flexibility, investment of effort and resources, and planning, including the marshaling of different types of resources such as specialists under contract.

The merger of internal audit and evaluation functions in UNESCO in 2000, a model used in around half of UN organizations, has been followed by some debate about what is the most effective model. Organizational size and culture, and the degree of investment in oversight, appear to be key determinants for setting up a combined oversight function, but a review emphasized that what mattered most was the management of, and investment in, the evaluation function, and that this can happen under any of the models. The lesson from the UNESCO example was that the degree of integration should not be standardized, and should instead be determined by context and the issues to be examined. The importance of a flexible mindset is also highlighted in a number of cases in this book, which underlines the benefits of appreciating what other disciplines can bring to the work; for example, evaluators having some knowledge of audit standards. Avoiding inflexible interpretations and traditional assumptions about the respective roles and strengths of different activities is key to a cooperative approach. Naidoo and Soares write of the need to get to “a common understanding of the difference in principles of audit and evaluation” in order to properly design the joint assessment work. This was designed to avoid “the inevitable contestation that arises when two disciplinary professions need to work together.” In the case of UNESCO, Frueh also notes that the “professional self-perception” of evaluators was that their work was more strategic and focused on improving program impact, while they considered auditors were mainly compliance-focused and merely addressed what did not work rather than highlighting good performance and results. Auditors in turn have criticized the lack of professional certification and rigor of their evaluation colleagues. Within UNESCO and other UN bodies, as Frueh sets out, there has in fact been some drift in approach, with internal audit increasingly adopting social science methods and adding a performance audit focus to traditional compliance work. The need for a planned approach to collaboration is also highlighted by Frueh, given the different pace, expectations, pressure regarding other ongoing work, and reporting products. Joint planning discussions have subsequently become a standard part of the annual work planning process.

Common Considerations

Not Being Timely and Relevant – Different Approaches to a Common Problem

It is commonplace to talk about the pace of change and the need for organizations to respond and keep abreast of this change. The need to be up-to-date and relevant is an over-riding issue for all kinds of organizations, whether in the private sector, where not doing so leads to commercial failure, or in the public sector, where organizations struggling to meet current user needs are unlikely to survive critical outside scrutiny, and where poor performance may result in abolition or merger. Against this background, the external and political environment has become far more challenging for scrutiny bodies, which have increasingly

been drawn into topical debates, and where there is little or no sympathy for what can be seen as costly monopoly providers whose work is not seen as relevant to contemporary problems, or for overly critical agencies. Timeliness has long been a matter of concern to evaluators, and much evaluation literature reflects a state of discomfort about the extent to which apparently sound evaluation work has failed to have the desired effect, including because it was not delivered in a timely manner (Clarke, 1999, p. 180). The wide range of evaluation approaches is also a sign of the desire to make a timely and relevant contribution. The over-riding sense from Lonsdale’s chapter on the impact of the external environment on the UK’s NAO is that it too considers that striving to be timely and relevant is important for its credibility, and this has shaped why it has speeded up its delivery of reports, and influenced the type of products it has prepared using its performance audit powers. He highlights the particular significance of this challenge for bodies working closely with politicians, whose own time horizons have narrowed in recent years, and who believe they must be seen to be responding often in real-time to fast-moving events. Auditor experimentation with different products to try to provide relevant material is a common approach. While still needing to meet many of the requirements of traditional audit work such as standards of evidence and reporting, and maintaining formal professional standards, they have also sought to avoid the more linear, less flexible approaches that can characterize traditional audit, especially around timescale and reporting. Thus, Shipman explains that national audit offices like the Government Accountability Office in the United States may provide testimony based on their work, and produce advisory reports. In a very different context, the need to be relevant, timely, and adaptable in a complex and dynamic setting encourages an approach which is flexible and responsive to service users. Van Stolk and Ling argue the evaluation process needs to be able to understand and adapt to changing goals through embedding the evaluator in a change initiative, and to provide real-time input into a change process. Evaluation’s focus on learning and less prescriptive standards provide a more conducive environment for the kind of innovation advocated by van Stolk and Ling. Internal audit, like internal evaluation, is also better positioned than performance audit to innovate with real-time and collaborative audits. Internal audit works under professional audit standards and guidance which also has provisions for doing consulting work, hence providing the flexibility needed.

Reliance on Professional Judgment, Shaped by Common Need for Rigor

Chapters 2–4 describe different practices, but ultimately all involve assessment of evidence and the use of judgment to draw conclusions from the evidence. Developing an ability to do this robustly and convincingly depends on training, expertise, and experience to shape those judgments. The practices – shaped by their own traditions and histories – have separately defined what constitutes and

supports professional judgment. Strength has come where there is a recognition that different professional judgment and expertise is required to address a particular problem, as noted in chapters in Part III. All practices highlight the importance of rigor in the application of professional norms and expectations, in the use of methods, and in the analysis of data. While all practices recognize the importance of transparency around the judgments, expectations of explaining the chosen methodologies, caveats, and thought processes have in general become more developed in evaluation. Here – coming from the applied social science research tradition – there is an expectation that approaches are described in detail, with the aim of replicability of analysis. It is interesting that performance auditors – usually acting within the traditions of public accountability processes – have tended not to go down the route of extensive methodology annexes, although some have taken to including governmental responses and a commentary on them in their reports. This is despite the same risk – as Ruta discusses – that others repeating the work would arrive at different conclusions. Rather, the assumption is that readers should be expected to rely on the standing of the organization and be reassured by statements about their quality systems (also discussed by Ruta), adherence to standards, the presentation of sufficient evidence in the report, the fact that drafts are shared with those who are the subject of the audit, and in some cases the views of their independent expert reviewers. Birch, Jacob, and Miller-Pelletier highlight a significant risk to the credibility of oversight judgments, which is applicable across all forms of scrutiny, namely the failure to manage and guard against ethical dilemmas. In particular, auditors and evaluators are susceptible to conflicts of loyalty linked to pressures from competing interests and political pressures, especially at a time when many of the certainties about appropriate actions may be breaking down. The authors highlight the importance of overseers being aware that the dilemma is present and appreciating the need to act ethically. Ethically appropriate judgments are thus dependent on reflection and deliberation, clarity about the risks that are present, and loyalty to high professional standards. Such judgments require those making them to be trained to recognize and analyze the issues and act accordingly.

Challenge to Reputation – The Importance of Management Competency in Managing Risk and Associated Communication

While ethical dilemmas are a threat to the conduct of individual pieces of work and the judgments made during them, broader reputational damage of audit and evaluation entities is a threat to perceptions of the credibility of their work as a whole. Much has been written about the damage that individual incidents can have to the reputation of individual organizations, some of which have had catastrophic impacts, e.g., the collapse of Enron in the United States, identified in 2001, and the misuse and misdirection of public funds intended for government

advertising in Québec, which led to the defeat of the Liberal government in Canada in 2006. Unsurprisingly given their remits, national audit offices and internal audit teams have not been immune to the trend toward formal reputation management that has taken place in many countries. Operating in a non-partisan environment, claiming expertise, and providing the basis on which significant decisions on resource allocation and use, accountability, and career progression for those responsible may be made, means they must maintain credibility, especially where they are seen as monopoly providers. Equally, in more commercial settings, where evaluators may be competing for government work, or where internal audit may face the possibility of being outsourced or the risk of losing management confidence, the issues of credibility and managing risk to reputation are crucial. Boyle and Wilkins highlight the challenges of managing reputational risk among external audit bodies, which have clear parallels to other disciplines, especially those operating in contested spaces. The authors suggest that the factors to consider when analyzing reputational risk are the performance audit strategy, the selection and scoping of topics, quality, effectiveness, communication, and integrity. Topic selection can be important in avoiding suggestions that the program of work either avoided politically sensitive topics or pursued them with undue vigor, or accusations that there is a focus on catching agencies out or, alternatively, working too closely and quietly with them. Such accusations may lead to challenges to the mandate to undertake the work and questions about the expertise of the auditors. Boyle and Wilkins also highlight the importance of maintaining effective communications channels with parliamentarians and being clear in explaining their strategy in order to avoid being seen to be either oblivious to, or overly independent of, parliamentary concerns (and therefore potentially irrelevant in their eyes), or unduly responsive in an effort to be relevant (and thus seen as simply an arm of the legislature). An example of this is highlighted by Lonsdale, through the way in which performance audit work has been tailored to provide informed commentary on the UK’s exit from the European Union (Brexit) – to meet parliamentary expectations of relevant and timely analysis to inform scrutiny – while managing the risk of being drawn into deeply controversial political territory. Risks exist either way, and the NAO’s approach of publishing factual briefings, rather than evaluative reports, was designed to manage them.

Pointing Toward Improved Ways of Operating is at the Heart of All Three Activities

Utility is also an essential element of the role of evaluation and audit activity at a time of constrained resources, and when, as Frueh notes in the United Nations context, there has often been a perception that there is too much oversight reporting. This accords with a longstanding critique of such work, most notably discussed

in detail in Michael Power’s book The Audit Society (Power, 1997). Efforts in the last two decades to counter the argument that audit was undermining performance and discouraging attempts to innovate or take risks have led to attempts to accentuate the perceived positive benefits of audit in particular, and to undertake the work in ways which minimize the potentially negative effects. The usefulness of oversight work has a number of features. All three activities are designed to generate recommendations for guidance and improvement, as Wilkins has examined. Although he concludes that internal and external audit and evaluation have much to learn from each other here, he also suggests there is little evidence to guide the formulation of conclusions and recommendations, even though for many readers they may actually be the only sections of a performance audit report they read. Measures of the effectiveness of recommendations include whether they are accepted and implemented, the wider impacts they have, agency perceptions of their value, and their contribution to debate within the entity, in parliament, and in the media. To this end, Wilkins argues that it is important to be clear about the purpose of conclusions and recommendations and to have rigorous processes for their development. The formulation of recommendations involves a significant element of judgment alongside more technical considerations, and thought should always be given to not making any recommendations at all. While such instrumental usage of the output from oversight work may help organizations to improve their performance and heighten their influence, Birch, Jacob, and Miller-Pelletier also point to a wider sense in which auditors and evaluators may need a broader vision of their role which redefines their contribution to “social betterment.” This underlines the importance of loyalty to protecting and preserving the public’s interest, rather than loyalty to clients. Such considerations may be easier for public-sector bodies to apply.

Challenge of Maintaining Quality and Independence

Quality remains a major preoccupation for all three practices, being intimately linked to reputation and standing. Arguably, those oversight bodies which report publicly and whose work is used for the purpose of accountability are under greater pressure to meet the highest quality standards, but all providers – including those whose audiences are internal and whose work is for management eyes only – can only claim value if they are seen to meet expectations and satisfy commonly held views of merit. To attain such a position requires robust quality assurance arrangements of different forms. Shipman’s examination of the GAO highlights how the traditional fact-checking of each publication’s figures, designed to ensure accuracy, has been supplemented as the complexity of GAO’s work has increased, so that more of a matrix approach has developed. This includes expert stakeholder reviews of the study design, progress at decision points, analysis and conclusions, as well as the quality of the final output.

Additional external review has been sought in other circumstances. The OAG has an outside panel of experts advising on all its major audits, as well as hiring experts on contract as needed during the audit process. A similar adaptation of quality assurance arrangements from beyond the immediate world of audit was the UK NAO’s adoption of a more collegiate approach to quality assurance, involving internal challenge by directors who are not involved in the audit, or external panels of experts, as well as long-running use of external academic specialists to comment on the final product. In part this was influenced by the approach undertaken by external evaluation bodies and academics with whom the NAO has had longstanding relationships. Perhaps the most significant challenge for all the activities under examination is that of maintaining – and being seen to maintain – independence. For auditors, the assumption of independence is built into their standards, and for national audit offices the ability to maintain an arm’s-length position and independence from the organizations that they audit is bolstered by a series of statutory powers which guarantee access to documents, and funding arrangements which limit executive power. Internal auditors and evaluators stress the importance of maintaining an independent state of mind, but those housed within government departments and agencies or employed by them face greater risks to independence. As Naidoo and Soares note, the Independent Evaluation Office within the UNDP has been strongly critical of the “presumed guarantee that paid consultants are independent.” Efforts have been made in many settings to protect or strengthen independence through structural guarantees and reporting lines. Other safeguards have been implemented, such as Independent Audit and/or Evaluation Committees in the UN system and in the Canadian Government.

References Clarke, A. (1999). Evaluation Research: An Introduction to Principles, Methods and Practice. London: Sage. Power, M. (1997). The Audit Society: Rituals of Verification. Oxford, UK: Oxford University Press. Sharman, L. (2001). Holding to Account: The Review of Audit and Accountability in Central Government. London: HM Treasury.

Contributors

Maria Barrados has a Ph.D. in sociology and is currently Executive-in-Residence at the Sprott School of Business, Carleton University. Maria started her career in the Canadian Government as an evaluator, then moved to the Office of the Auditor General as a performance auditor, eventually becoming an Assistant Auditor General. Her last government position was as head of the Public Service Commission of Canada, an organization which included responsibility for internal audit and program evaluation. She is a member of a number of boards and advisory committees and continues to pursue her interests in public service reform, governance, performance measurement, and financial and human resource management.

Lisa Birch is Executive Director of the Center for Public Policy Analysis, Associate Professor (Laval University) and Professor (Champlain St-Lawrence College). Her expertise focuses on public policy and management, textual analysis, and the fulfillment of campaign promises, including the methodology of the Polimetre, of which she, along with the CAPP research team, is a co-founder.

Richard Boyle is Head of Research, Publishing and Corporate Relations at the Institute of Public Administration in Ireland. His major areas of specialization, in which he has published extensively, include performance measurement, monitoring and evaluation systems, the management of whole of government issues, and public service change and reform programs.

Susanne Frueh has some 30 years of work experience with seven different UN entities. Between 2009 and 2014 she was the executive secretary of the Joint Inspection Unit, a subsidiary body to the United Nations General Assembly with a system-wide mandate for evaluation, inspection, and investigations. In July 2014, Susanne took up her current position as the director of the Internal Oversight Service of the United Nations Educational, Scientific and Cultural Organization (UNESCO) in Paris, responsible for internal audit, evaluation, and investigations functions. Susanne has been a member of the UN Evaluation Group (UNEG) since 1996 and is its current chair, and is a member and former chair of the UN Representatives of Internal Audit (UNRIAS). She holds an M.Sc. in Geography from the University of South Carolina (US).

Steve Jacob is a full professor in the Department of Political Science at Laval University (Québec City, Canada) and director of the Center for Public Policy Analysis (CAPP) and PerfEval, a research laboratory on public policy performance and evaluation. Trained as a political analyst and historian, Steve Jacob conducts research dealing with the mechanisms of performance management and evaluation: professionalization, institutionalization, and capacity building in Europe and Canada, ethics in evaluation, and participatory approaches. The results of his research have been published in numerous journals. He is also the author and editor of seven books focusing on evaluation and experts’ involvement during the policy process. Steve Jacob has been an invited professor at several universities in Europe, Africa, and North America.

Tom Ling is head of evaluation at RAND Europe, focused on improving quality and efficiency in services. Before re-joining RAND, Tom was head of Impact Innovation and Evidence at Save the Children and prior to that spent ten years at RAND as director for Evaluation and Performance Audit, following four years as senior research fellow at the UK National Audit Office. Tom has published widely on evaluation, accountability, and related topics and is an honorary senior visiting research fellow at the University of Cambridge, a Professor (emeritus) at Anglia Ruskin University, and a Board member of the European Evaluation Society.

Jeremy Lonsdale is Director of Defence value for money audit at the UK’s National Audit Office, where he is responsible for programs of performance audits which are used as the basis of hearings of the House of Commons Public Accounts Committee. He has held a number of senior posts at the NAO including as Director-General, Value for Money Audit, between 2007 and 2013. Between 2014 and 2016 he was a Senior Research Leader at RAND Europe in Cambridge. Jeremy gained a PhD in Evaluation and Public Management from Brunel University in 2000. His publications include “Performance auditing: Contributing to accountability in Democratic Government” (2011) and “Making Accountability Work: Dilemmas for Evaluation and for Audit” (2007).

Maurya West Meiers is a Senior Evaluation Officer in the World Bank’s Independent Evaluation Group’s Methods Advisory team. She has 20 years of experience working as an evaluator on education, urban, fiscal decentralization, and other public-sector programs. She has served as a team leader of an M&E training program in the World Bank Institute and has trained government professionals in Africa, East Asia, South Asia, Europe, and Latin America on M&E and needs assessment topics. She holds a master’s degree in international affairs and a second master’s degree in education, both from the George Washington University.

Alex Miller-Pelletier is pursuing a master’s degree in Legal and Constitutional Studies as a Québec Research Funds scholar at the University of St Andrews

in Scotland. She graduated in 2017 from Laval University in Québec City with a bachelor’s degree in political science. In 2017–2018, she took part in the Parliamentary Internship Programme of the House of Commons of Canada. Her research interests focus on public administration, constitutionalism, and federalism.

Indran Naidoo has been Director of the Independent Evaluation Office of the United Nations Development Programme since 2012, and has overseen over 50 global, thematic, and country program evaluations, convened global conferences on national evaluation capacities and served as a Vice-Chair of the United Nations Evaluation Group. His prior career involved leadership, design, and implementation of oversight systems in South Africa at the Department of Land Affairs and the Public Service Commission, as Chief Director and Deputy Director-General. He holds a Ph.D. from the University of Witwatersrand, a master’s degree from West Virginia University, and a bachelor’s degree from the University of KwaZulu Natal.

Arne Paulson has more than 30 years of experience in evaluation with Multilateral Development Banks (MDBs), starting with his first job as a research assistant at the World Bank in a new division created by Robert McNamara to “find out whatever happened to World Bank projects.” He subsequently worked in the ex-ante economic analysis of projects in a variety of sectors (energy, education, health) at the Inter-American Development Bank (IDB), followed by a seven-year stint in ex-post evaluation in the Operations Evaluation Office of that institution. In the final years of his career with the IDB, as head of the Project Monitoring and Portfolio Management division, he focused on developing an improved system of monitoring projects under execution and reporting on the development effectiveness of IDB-financed operations.

David Rattray, FCPA, FCGA, CIA, served five years with the IIA Internal Auditing Standards Board and three years as a member of the IIA Public Sector Committee, reviewing and updating standards. David was an executive with the Office of the Auditor General of Canada (OAG), serving as an Assistant Auditor General of Canada for 16 years until 2004. In 2004 he formed his own consulting firm specializing in all aspects of internal audit, and from 2004 to 2015 he conducted dozens of IIA Quality Assurance and Improvement Program (QAIP) reviews in both the public and private sectors. From 2008 to 2014, he was a Treasury Board of Canada public member appointed to the audit and evaluation committees of the Departments of National Defence and Citizenship and Immigration Canada, and for the past three years with the Office of the Information Commissioner of Canada.

Basia G. Ruta, CPA, CA, is Founder and President of OnPoint Ruta Consulting Ltd. and Executive-in-Residence at the Sprott School of Business, Carleton University. She is also past Auditor General for local government in British Columbia, Canada, and served as Assistant Deputy Minister and Chief

Financial Officer at Environment Canada, Assistant Comptroller General, Internal Audit, Government of Canada, and Principal, Office of the Auditor General of Canada.

Stephanie Shipman retired in 2018 as Assistant Director of the Center for Evaluation Methods and Issues, US Government Accountability Office (GAO), a congressional oversight agency. At GAO, she evaluated federal programs serving families and reviewed federal agencies’ program evaluation activities and policies. In 1999, Stephanie founded Federal Evaluators, an informal network of over 1,600 evaluation officials across the government. She has consulted with the US and foreign governments on evaluation policies and practice and serves on the American Evaluation Association’s Evaluation Policy Task Force. In 2008, she received the Association’s Alma and Gunnar Myrdal Award for government evaluation. She received her doctorate in educational psychology from Columbia University.

Ana Soares, Senior Evaluation Advisor at the Independent Evaluation Office of the United Nations Development Programme, conducts country, regional, global, corporate, and programmatic evaluations. She has 22 years of experience in international development, 15 of them mostly dedicated to evaluation. Prior to this, she served as Strategic Planning, Monitoring and Evaluation Officer for UNDP in Brazil. Before joining the UN, she worked for the Center for International Development of the State University of New York. She holds a BA in Political Science and an MA in Public Administration, having taught graduate classes in Public Administration, Political Science, International Development and Evaluation.

Jos Vaessen is Advisor of Evaluation Methods at the Independent Evaluation Group at the World Bank (since 2016). He has been involved in evaluation research activities since 1998, first as an academic and consultant to bilateral and multilateral development organizations and from 2011 to 2015 as an evaluation manager at UNESCO. Jos has been the author of several internationally peer-reviewed publications, including three books on evaluation. He regularly serves on reference groups of evaluations for different institutions and is a member of the Board of the European Evaluation Society. He holds a Ph.D. from Maastricht University.

Christian van Stolk is Vice President at RAND Europe and director of their Home Affairs and Social Policy research group. He has worked extensively on the design and implementation of public policy and has wide experience in evaluation methods, the study of the impact of regulation, the analysis of public administration, the design of performance management systems, and comparative studies. He has numerous publications on these topics. He has also undertaken reports for the OECD, the World Bank, the European Commission, and the UK Government. Van Stolk holds a Ph.D. from the London School of Economics.

Peter Wilkins has extensive public-sector leadership and management experience and undertakes research and consultancy regarding evaluation, performance improvement, collaboration, accountability, and governance. He has served as Western Australia’s Deputy Ombudsman and prior to this had been WA Assistant Auditor General, Performance Review. He is also a National Fellow and a WA Fellow of the Institute of Public Administration, Australia. Peter is an Adjunct Professor at The John Curtin Institute of Public Policy (JCIPP) at Curtin University and an Honorary Research Fellow in the Sir Walter Murdoch School of Public Policy and International Affairs at Murdoch University.

Index

Page numbers in bold denote tables, those in italics denote figures. accountability 1–2, 5, 199, 211, 213; expectations of 128; institutions 88; vs. learning 47, 199; process 138 accountant-auditors 55 accredited professional accountants 17 action-oriented process 176 ad-hoc collaboration 178 advisory role 164, 166 AEAC see Audit and Evaluation Advisory Committee (AEAC) agency management 156, 159, 210 Alkin, M. 47 American Evaluation Association 43–4, 79, 152–3, 158 analytical framework 89 annual work plan 39, 165, 178, 214 aspiration-based codes 79 assurance engagements 18–20, 23, 26, 28, 35, 39 assurance services 35, 113 attribute standards 33–4, 36 audit 58, 130, 172, 187; agencies 108; approaches 190; areas of work 60–1; committees 125; definition 3; and evaluation 1–3, 167, 191, 196–7; failure 27; forms of 55; functions 191–2; methodologies 168; offices 118; organizational setting and products 58–9; origin and practice traditions 55; plans 8; potential for collaboration 63–4; processes 136; professionalization 55–7, 56, 57; recommendations see recommendations, audit; Reporting Relationships 59; reports, commonality in 61; results 175; risk 16; shared challenges 62–3; shared practice principles 59–60; standards 136; of

technology projects 136; traditional criticism of 124; types of 3 auditability 24 Audit and Evaluation Advisory Committee (AEAC) 183, 188–9 auditing: environment in United Kingdom 120–1; and evaluation 128–30, 188; and internal audit 128–30; National Audit Office (NAO) 119–23; of relationships 127–8; responses to changing environment 123–8; state audit institutions and change 118–19 Auditor-General of Victoria 91 auditors 60, 73–4; ethical behavior 79; experimentation 215; personal virtues of 71; reviews 167; tasks of 71; in UNESCO 165; unethical behavior by 71 AUG see Office of the Executive Auditor (AUG) austerity 120, 122 Australia 20 Australian National Audit Office 110; anticipation 112; chance events 112–13; clear and concise 111; communication 112; external pressures 112; flow logically from findings 111; follow-up 112; measurable 111; organizational context 112; relevant and usable 111 “bad barrel” problems 76 bank’s financial statements 200 Bank’s Office of Strategic Planning and Development Effectiveness 201 behavioral independence 45, 49–50 BEPs see borrower ex-post evaluations (BEPs)

Birch, Jacob, and Miller-Pelletier 212 borrower ex-post evaluations (BEPs) 202–3; requirements 203; submissions 203 Boyle, R. 212 Brazil's National Development Bank (BNDES) 205 Bretton Woods institutions 194 Budget and Accounting Act 153 budgets 45 Busuioc, E.M. 87–8 C&AG see Comptroller & Auditor General (C&AG) Canada 20; Auditor General Act 37; Canadian Armed Forces in 37; Chartered Professional Accountant (CPA) 37; Federal Accountability Act 37; Internal Audit at National Defence 37; for performance audits 20 Canadian Audit Office 102 Canadian Auditor General 103–4 cash transfer operations 195 CCGs see Clinical Commissioning Groups (CCGs) Certified Internal Auditor (CIA) 36, 165 chance events 109 change management function 166 change model, management of 7 Chartered Professional Accountant (CPA) 21, 35, 37–40, 76 Chelimsky, E. 5–6 Christie, C. 47 CIA see Certified Internal Auditor (CIA) classical experimental design 152 Clinical Commissioning Groups (CCGs) 144–5 codes of ethics 39–40, 78–9 coherence 179–80 collaboration 54–5; crossover and 2; degree of 209; potential for 63–4; types of 168, 169 co-location 166–7; functions 162–3; of internal audit 212; opportunities and challenges of 167–8 Committee of Public Accounts 127 communication: between auditors and auditees 107–8; channels with legislatures 96 Comptroller & Auditor General (C&AG) 92–3 conflicts of interest 73 conflicts of loyalty 73 consequential ethics 75 consistency 50, 80

consulting services 35, 38 context-related knowledge 49 context sophistication 155–6 continuous improvement, principles of 19 coping, strategies for 73 Corporate Evaluations 200 corporate process evaluations 50 Corporate Results Framework 201–2 corruption 71, 182 cost-benefit analysis 197 cost-effectiveness of forms 130 Country Program Evaluations 200 CPA see Chartered Professional Accountant (CPA) credibility 186–90; of audit 60; enhanced 190–1; of evaluation 159; importance of 130; of research 157; of study 189 credit subsidies 201 culture sector 172, 173, 174 DAC see Development Assistance Committee (DAC) data collection 10, 50, 129, 133, 141, 158–9; methods 186–8 data mining 179 data quality 60 decision-making process 72, 184 degree of collaboration 209 degree of connectivity 132–3 deontological ethics 75 developmental evaluation 132–3, 135, 139–40 Development Assistance Committee (DAC) 134 dialog-oriented wave 42 disbursement 195–6, 198–9, 202, 205 diversity 42, 45–7, 213 due professional care 36 economic analysis 197, 204 embedded evaluation 132–5, 140, 143, 213; adaptable and promote learning 138–9; case studies 140–6; challenges involved with 140; decision-making cycles of implementation 137–8; defined 133; design of 140; forms of 134–5; independence in 138; links to audit 136; opportunity in using 140; participatory 138; principles of 143; requirements of 137; timeliness of 137–8, 140 emergency loans 196 employees 38–9, 43, 46, 58, 72–3, 78–81, 157

environment 78, 81; external 10, 28; importance of 119; institutional enabling 48–9; legal and cultural 33; SAI 21 Estate and Technology Transformation Fund (ETTF) 144 ethical behavior 76, 79, 81, 83–4; predispositions for 80–2 ethical climate 80–1 ethical decision-making 80–1 ethical dilemmas 71–2, 82; confronted by 74; continuum of 75, 75; resolving 74–6; sources of 71–4 ethical leadership 81 ethical reasoning 75, 81 ethics in audits and evaluation 71; climate and organizational culture 80–1; codes 79; decision-making process 76–80; definition of 71; evaluator 77–8; leadership 81; organizational values 77–8; perspectives 74; philosophy of 74; policy guidelines, rewards, and sanctions 81–2 ETTF see Estate and Technology Transformation Fund (ETTF) Eurasian Alliance of National Evaluation Associations 44 European Court of Auditors 54–5 European Union 120–2, 217 evaluated agency, relationship with 159–60 evaluation 5, 18, 41–2, 54–5, 57–8, 60–1, 63, 130, 135, 151, 159, 166, 172, 184, 187, 208; approaches 138, 190; areas of work 60–1; audit/auditing and 2, 167, 188, 191; collaborative approach of 179; commonality in 61; community 43; complexity of 168; context for 209; culture 48–9; demand side of 46; dissemination strategy 177; economic aspects of 200; embeddedness of 45; of existing programs 152; of federal programs 152; forms of 134; functions 45–7, 162, 191–2, 200–1; GAO 153–60; history of 41–4; independence 45, 153; of individual projects 204; institutionalization of 42–3, 45–7; and internal audit 59; knowledge of 49; large-scale 46; models 188; objectivity 153; offices, lesson learning by 199; organizational setting and products 58–9; origin and practice traditions 55; performance audits and program evaluations 151–3; potential for collaboration 63–4; practices 45;

professionalization 55–7, 56–7; projects selected for 202; purpose, scope, and approach 47–8; pushback to 190; quality in 48–51; Reporting Relationships 59; selecting projects for 199; shared challenges 62–3; shared practice principles 59–60; skills 138; standards 72; studies 152; types of 134, 200, 202 evaluation criteria 155–6; selection of 156 evaluative analysis 46–7, 49 evaluative information, producers of 130 evaluative thinking 50, 138 evaluators 57, 60, 72–4, 167, 174–5, 179, 186–8, 199, 209, 219; ethical challenges of 73; expansion and presence of 43; "in-house" 46; relationship 159; self-perception of 163; skilled 63–4; tasks of 71; unethical behavior by 71 evidence-based wave 42 evidence, types of 187 ex-ante assessments of programs 47 ex-ante evaluations 46, 139, 197, 205 ex-post evaluation 46, 134, 136, 139, 198–9, 202–3, 205 external advisory panels 159 external audit 21, 209; financial statement 39; firms 200; offices 62 external auditor 35, 38–9, 161, 209 external environment, openness to 10, 62, 88, 123–4, 129, 211 external evaluation 211, 219 external evaluators 130, 153, 208 external expert reviews 159 "fast disbursing" loans 196 federal funds 159, 211 financial audits 57, 90, 196; responsibilities for 56 financial statement audits 16–17, 31 financing 197; approval for 199 flexibility 75, 136, 139–40, 162, 167, 210, 213–14 focus in evaluation 50 foreign exchange 196–7, 206 formative evaluation 47, 134–5, 139 frauds 71; risk of 36 Freeman, H.E. 41 functional independence 45 funding: inefficient allocation of 201; levels of 62 GAO see Government Accountability Office (GAO) goal-free evaluation 48

goal-oriented (objectives-based) evaluation 48 Government Accountability Office (GAO) 6, 8, 95–6, 102, 153, 159–60, 210–11, 215, 218; adoption of evaluation 153–4; applied research methods team 154, 158; complexity of 158; conduct of evaluations and audits 155; conducts surveys 157; coordination with 153–4; effectiveness of government programs 154, 165; government audit agency 158; integrated assessments of program 154; management 155; performance audits 154; policy and procedures 154–60; program monitoring and evaluation 156–7; study of a federal program 157 Government Auditing Standards 102, 114, 154–5 Hudson, Joe 6 hybrid strategy 90 IAASB see International Auditing and Assurance Standards Board (IAASB) IBRD see International Bank for Reconstruction and Development (IBRD) IDB see Inter-American Development Bank (IDB) idealism 76 IEAP see International Evaluation Advisory Panel (IEAP) IEO see Independent Evaluation Office (IEO) IFAC see International Federation of Accountants (IFAC) IIA see Institute of Internal Auditors (IIA) IMF see International Monetary Fund (IMF) implementation evaluations 213 implementation (or process) evaluations 152 independence 188–90, 218–19; auditing standards for 158 independent evaluation functions 45 Independent Evaluation Office (IEO) 182–3, 186, 189; direct reporting of 188–9; International Evaluation Panel 189–90; of UNDP 188 in-depth focus groups 187 individual audits 22–4 individual performance audit planning 24–5 individual projects, evaluations of 204 information, primary sources of 198–9 infrastructure projects 195 in-house evaluators 46, 130

inspection visits 196 Institute of Internal Auditors (IIA) 9, 31–2 institutionalization 42–3, 45–7, 51 institutional knowledge 49 insurmountable threats 79 integrated study methodology 176 Inter-American Development Bank (IDB) 194, 196–8, 203; Annual Report 200; audit at 199–201; evaluation functions 200–1; ex-post evaluations of 203; Financial Management and Procurement Services Division 200; lesson learning at 201; organizational aspects of 200; policy and contractual requirements 199–200; project monitoring system 198; Realignment 2007 of 201; reorganization of 200; results reporting at 201–2 interim evaluations 139 internal audit 1–3, 5, 31, 55, 58, 62, 113, 128, 136, 162, 165–6, 175, 208–9; activities 33; assignment 36; dissemination of 180; evaluation and 59, 114–15; external vs. 38–40; flexibilities in 136; internal controls 120, 138; International Standards of Practice for 33–5; organizations 31–2; practices 17–18; practitioners of 56; profession 32–3; and program evaluation 9; reports 39; skills and experience 36–7; standards 32, 34–5, 56; teams 88 internal auditors 36, 72, 136, 167, 219; education requirements for 37; majority of 31; policies 60–1 Internal Oversight Service (IOS) 54, 161–2, 164, 174; audit 172; presentation by 174; work and representation 165 International Auditing and Assurance Standards Board (IAASB) 18–19 International Bank for Reconstruction and Development (IBRD) 194 International Evaluation Advisory Panel (IEAP) 183 international evaluation community 43 International Federation of Accountants (IFAC) 18 international institutions 194 International Monetary Fund (IMF) 194 International Organization for Co-operation in Evaluation (IOCE) 43–4 International Organization of Supreme Audit Institutions (INTOSAI) 6, 18, 35, 54, 60, 79, 102, 106; code of ethics 21–2; conduct of individual audits 22–3;

Framework of Professional Standards 19–20; independence of SAIs 22; quality control and ethical requirements 20–2; standards 20, 25; transparency and accountability for SAIs 22 International Professional Practices Framework (IPPF) 32 International Standards for the Professional Practice of Internal Auditing 170 International Standards of Practice for Internal Audit 33–5 International Standards of Supreme Audit Institutions (ISSAI) 19, 126–7 INTOSAI see International Organization of Supreme Audit Institutions (INTOSAI) investigation 166 investment 213–14; projects 196 IOCE see International Organization for Co-operation in Evaluation (IOCE) IOS see Internal Oversight Service (IOS) IPPF see International Professional Practices Framework (IPPF) irregular behavior 71 ISSAI see International Standards of Supreme Audit Institutions (ISSAI) joint audit 190–1 joint exercises 167, 170, 184–6, 189–91 Joint Inspection Unit of the United Nations 54, 168 joint study methodology 170–1; audit 170; challenges 171; evaluation 170; opportunities 171 labor, division of 168 large-scale evaluations 46 leadership 80 learning 43, 45, 47–8, 50–1, 55, 114, 132–3, 139–40, 146–7, 158, 185–8, 191, 199; accountability vs. 46–7 least-cost analysis 197 legitimacy, concepts of 188 lesson learning 199, 205; at IDB 201; perspectives 202 licensed professional accountants 17 Ling, T. 88, 213, 215 Lipsey, M.W. 41 Lodge, M. 87–8 Lonsdale, J. 88, 119, 215 loyalty conflict 73 management systems 187 Maor, M. 87

market orientation 42 Mayne, J. 6, 139 McPhee, I. 87 MDBs see multilateral development banks (MDBs) Middle East and North Africa Evaluation Network 44 monitoring and evaluation (M&E) 43, 45 moral intensity, theory of 77 moral reasoning skills 81 multilateral development banks (MDBs) 195, 205; accountability vs. lesson learning 199; audit vs. evaluation 196–7; evaluation culture in 205; ex-ante evaluation 197; ex-post evaluation 198–9; IDB see Inter-American Development Bank (IDB); infrastructure investments 197; internal evaluation offices 198–9; investment projects 198; loan disbursements by 205; ongoing evaluation 198; Operations Evaluation Office (OEO) see Operations Evaluation Office (OEO); performance 198; regional 195; role of 194–5; shareholders 196; thoughts and suggestions for 194 multiple stakeholders 187 Municipal Association of Victoria (MAV) 91–2 Naidoo, I. 210–11, 214, 219 NAO see National Audit Office (NAO) National Asset Management Agency (NAMA) 92–3 National Audit Act 1983 119 National Audit Office (NAO) 6–8, 17, 39, 60–4, 102, 119–20, 122, 136, 212; budget and performance 121; close relationship 123; comparative and investigative work 125–6; cross-government comparison 125; cross-government perspectives 124; emphasis on assisting government 124; environmental changes for 123; examination of preparations 127; external environment 123; implications for 122; as leaders in practice 7–9; OECD study on 62; significance for 122; specific mandates of 7; strategies 123, 128; transformation programme 125; Universal Credit report 97; value for money publications 125 national audit standards 35

National Defence and the Canadian Armed Forces 37 negative incentives 81 neo-liberal wave 42 net impact evaluations 152 "not-for-profit" organizations 31 OAG see Office of the Auditor General of Canada (OAG) OAI see Office of Audit and Investigations (OAI) objectivity, auditing standards for 158 OECD-DAC criteria 48 OEO see Operations Evaluation Office (OEO) Office of Audit and Investigations (OAI) 182–3, 186 Office of Evaluation and Oversight (OVE) 200–2 Office of External Review (ORE) 200 Office of Internal Oversight (OIOS) 162 Office of the Auditor General of Canada (OAG) 208 Office of the Executive Auditor (AUG) 200 OIOS see Office of Internal Oversight (OIOS) operational audits 17 operational concepts 186 Operations Evaluation Office (OEO) 200; Annual Report on Operations Evaluation 202; borrower ex-post evaluations (BEPs) 202–3; feedback to improve future operations 203–4; Loan Committee for approval 203–5; reports 202; resources at 204–5; responsibilities of 200 Operations Evaluation Reports (OERs) 202; responsibilities of 200 optimal crossover/collaboration 213 ORE see Office of External Review (ORE) Organisation for Economic Co-operation and Development (OECD) 134 organizational configurations 210 organizational culture 80–1, 109, 211–12 organizational hypocrisy 81–2 organizational values 212 OVE see Office of Evaluation and Oversight (OVE) oversight, usefulness of 218 "ownership" of recommendations 175 parallel study methodology 172 pari-passu 196 parliamentary audience 124

Patton, Quinn 132 peer-reviewing 56–7, 185 penalties 81 performance: evaluations 180; information 9–10, 175; management 1; measurement 60; standards 33–5 performance audit 1, 3, 5, 15, 18, 21, 55, 57–8, 61, 87, 89, 91, 103, 137, 161, 180, 184, 208, 212–13; assurance performance report 25–6; characteristic of 7, 17–18; conducting examination 25; conduct of 88; critical of 93; definitions 15–18; development for 6; engagement 27; and evaluation 54; examining questions of 1; Government Auditing Standards for 154; independent inspections of 26–7; institutional forms for 210; INTOSAI see International Organization of Supreme Audit Institutions (INTOSAI); lines of inquiry 24; management of audit risk 26–7; objective of 102; organizations 4; planning 24–5; practice of 4, 15; practitioners of 56; process 23–4; and program evaluation 5–9; public awareness of 2; reports 94–5; standard for 18–20, 56, 102; strategies for 91; topic selection 23; types of 22–3 performance auditing 88 performance auditors 5, 7–8, 62, 72, 79, 101, 136, 212; importance of 93; in national audit offices 76 perspectives for solutions 75, 75 policy-based loans 195–6 political affiliation 153 portfolio strategy 90 PPARs see Project Performance Audit Reports (PPARs) practice traditions 55, 56, 187 primary stakeholders 155 Primer on Program Evaluation 54 principles-based approaches 78–80 private interests 201 procurement 196 professional accounting 31; international standards for 18–19 professional associations 43 professional auditors 174–5 professional certifications 165 professional code of conduct 76 professional education 158 professional ethical codes 79 professionalization 1, 55–7, 56–7 professional judgment 215–16

professional misconduct scandals 71 professional standards 56–7; frameworks for 55 professional training opportunities 44 proficiency 36 program evaluations 4, 61, 151–2, 213; definition 3–4 program evaluators 4, 72 program management 41–2, 153, 159 program stakeholders 153 program's value 156 Project Completion Reports (PCRs) 198–9, 202 Project Eagle case 93 Project Evaluations 200 project monitoring system 201–2 Project Performance Audit Reports (PPARs) 202 public accountability 160, 210 Public Accounts Commission 121 Public Accounts Committee 7, 88, 93, 119–20, 127 public administrations 79 public funds 128–9 public investment projects 194–5 public-private relationships 121 public-sector bodies 127 public-sector evaluators 79 public-sector resource constraints 121–2 Public Service Commission of Canada 9 Public Service Employment Act 9 public service reform 122 QAIP see Quality Assurance and Improvement Program (QAIP) Q Lab approach 141–3 quality 218–19; assurance 185, 189, 218–19; control 21, 26–7; of research 157 Quality Assurance and Improvement Program (QAIP) 32–3 RAND Europe 140–1, 143–4 randomized control trial (RCT), design of 140 rapid-cycle evaluation 135 rapid evaluation 135 RBM see Result Based Management (RBM) readability 179–80 real-time evaluation (RTE) 133–5, 137 real-time learning 135 recommendations, audit 101–3; anticipation 108; Australian National Audit Office see Australian National Audit Office; by Australian state and territory 103; chance events 109; clear and concise 106; communication 107–8; context 109; domains of content, process and context 105; effective 106–7; effectiveness of 103; European study of 103; external pressures 109; factors affecting effectiveness 103–4; flow logically from findings 106; follow-up 108–9; formulation of 102–3; framework for 104–5; implementation of 104; measures 106–7; organizational culture 109; "ownership" of 175; practicality of 26; relevant and usable 107; usefulness of 107

reformulation 198 regional focus 195 relationships, importance of 129 reliability 50 reporting strategy 177 report writing 179–80, 185 reputation 87; challenge to 216–17 reputational risk 87–8; analysis of 88–9; analytic framework 88–9, 89; audit strategy 89–91; communication 95–6; effectiveness 94–5; integrity 97–8; management of 91, 95, 98; quality 92–4; topic selection 91–2 Result Based Management (RBM) 185–6; quality assurance and 188 risk-based audit plan 24 risk management 60–1, 87; effectiveness of 35 Rossi, P.H. 41 rule-based decisions 75 SAIs see Supreme Audit Institutions (SAIs) Schot, J. 141 Scriven, M. 41, 155–6 Sector Evaluations 200 Shahan, A. 95 shared challenges 62–3 shared decision-making 185 shared practice principles 59–60 Shipman, S. 209–11, 218 significant product developments 126 silos 2, 208 single-loop evaluations 135 Soares, A.R. 210–11, 214, 219 social development, dimensions of 204 social science research 152 socio-demographic variables 76 socio-economic development projects 195 Sri Lanka Evaluation Association 44 stabilization interventions 134

staff experience 37 stakeholder 47; engagement 168, 179; management 138; participation 133–4; public exposure for 33; scrutiny 133 standard management response 175 state audit bodies 211 state audit institutions 209 state program evaluation 157 Steinmueller, E.W. 141 Strategic Commissioning Framework (SCF) 144 strategic performance information 165 strategic risk-based audit plan 24–5 structural independence 45 summative evaluation 47, 134 Supreme Audit Institutions (SAIs) 17, 19, 23, 88, 98, 118–19, 121, 123, 168; audiences of 88; audit offices 26–7; to demonstrate compliance 21; environment 21; independence of 22; performance auditors within 126; reputation of 87, 95–6; risk-based planning 24; risk management of 88; role in monitoring action 26; statutory independence of 87; strategic options of 89–90; strategic plan for 23; strategies 90; transparency and accountability for 22 Sustainable Development Goals 191–2 systematic assessment of operation 155–6 targeted communications strategies 180 technical audit 58 theory of change 138–9 timeliness 215 topicality 127 traditional approach 140 traditional evaluation 132, 137; approaches 139; skills 137–8 traditional financial audit 1 traditional performance audit approach 137 traditional theory of change approaches 139 transformative innovation 141–2 Transforming Primary Care (TPC) program 143–5 transparency, expectations of 128 triangulation 186–8 UN see United Nations (UN) uncertainty 119; in British politics 120 UNDP see United Nations Development Programme (UNDP) UNESCO see United Nations Educational, Scientific and Cultural Organization (UNESCO)

unethical behavior 71, 73 UN Evaluation Group (UNEG) 163; norms and standards 168, 170 UNICEF 43–4 United Kingdom (UK) 118–19, 128; changing environment 120–1; Government Internal Audit Agency 129; National Audit Office (NAO) 208; politics 120 United Nations Development Programme (UNDP) 182–3, 188; data collection methods 186–8; decision rights, communication channels, and protocols 184–5; evaluability and key questions 185–6; fear factor vs. enhanced credibility ripples 190–1; governing body 182; IEO of 188; independence, credibility, and utility 188–90; institutional effectiveness in 182–3, 185; International Conferences on National Evaluation Capacities 189; joint undertaking 183; learning opportunity for 191; management 182, 189; oversight architecture 182; reputation of 191; Strategic Plan 182, 191; triangulation 186–8 United Nations (UN) 54, 164; co-located functions in 165; Development Programme 162; evaluation community 163; positioning of evaluation offices in 163 United Nations Educational, Scientific and Cultural Organization (UNESCO) 161–2, 164, 164; ad-hoc collaboration 178; for benchmarking purposes 172; co-location mean within 165–6; conscious co-location 166–78; context, functions in 164; corporate policy on 177–8; culture conventions 171–4; efficiency and effectiveness of 175; efficiency questions 179; evaluation and audit-combined oversight functions in 167; executive board 165; Financial Regulations 161; functions in 164; information meeting for 174; internal audit and evaluation functions in 214; Internal Oversight Office in 210; joint audit and evaluation of 169–71; joint planning discussions 178; location of oversight functions 162–4; methodological toolbox 178–9; readability and coherence 179–80; response in emergency situations 176; role in education 175; rules and procedures 170; senior management team (SMT) 165; skill sets of director 164–5; stakeholder engagement 179; targeted communications strategies 180; use of reports by recipients 180

United Nations Evaluation Group (UNEG) 189 Universal Credit report 97 UN Representatives of Internal Audit Services (UNRIAS) 163 UNRIAS see UN Representatives of Internal Audit Services (UNRIAS) UN's Joint Inspection Unit (JIU) 162–3 US: General Accounting Office 42; Government Accountability Office (GAO) 151; Government Auditing Standards 154–5 utility 188–90

validity 50 value for money audits 17 van Stolk, C. 213, 215 virtuous practices 80 Voluntary Organizations for Professional Evaluation (VOPEs) 43–4 VOPEs see Voluntary Organizations for Professional Evaluation (VOPEs) Weiss 155–6 whole-of-process-approach 50–1 Wilkins, P. 88, 212 Wisler, C. 5–6 Wittmer, D.W. 80–1 Working Group on Program Evaluation 6 work, types of 2 World Bank 194–5, 197; resources of 194–5 Yellow Book 102, 155–60