Principles and Practice of Clinical Research [4 ed.] 9780128499054

English, 823 pages, 2018

Table of contents:
Cover
Principles and Practice of Clinical Research
Copyright
Contents
List of Contributors
Acknowledgments
Preface
1. A Historical Perspective on Clinical Research
The Earliest Clinical Research
Greek and Roman Influence
Middle Ages and Renaissance
Seventeenth Century
Eighteenth Century
Nineteenth Century
Twentieth Century and Beyond
Summary Questions
References
I ETHICAL, REGULATORY AND LEGAL ISSUES
2. Ethical Principles in Clinical Research
Distinguishing Clinical Research From Clinical Practice
Ethics and Clinical Research
History of Ethical Attention to Clinical Research
Benefit to the Individual
Benefit to Society
Protection of Research Subjects
Research as a Benefit
Community Involvement in Research
Codes of Research Ethics and Regulations
Research on Bioethical Questions
Ethical Framework for Clinical Research
Value and Validity
Fair Subject Selection
Favorable Risk/Benefit Ratio
Independent Review
Informed Consent
Respect for Enrolled Subjects
Ethical Considerations in Randomized Controlled Trials
Conclusion
Summary Questions
References
3. Integrity in Research: Principles for the Conduct of Research
Guidelines and Principles for the Conduct of Research
Scientific Integrity and Research Misconduct
Responsibilities of Research Supervisors and Trainees
Data Management, Archiving, and Sharing
Data Management
Archiving
Data Sharing
Research Involving Human and Animal Subjects
Collaborative and Team Science
Conflict of Interest and Commitment
Peer Review
Publication Practices, Responsible Authorship, and Results Reproducibility
Publication Practices
Authorship
Reproducibility
Study Questions
Acknowledgments
References
Further Reading
4. Institutional Review Boards
Historical, Ethical, and Regulatory Foundations of Current Requirements for Research Involving Human Subjects
Historical Foundations
Ethical Foundations
Regulatory Foundations
Institutional Review Boards
Key Concepts and Definitions From the Common Rule
Research
Exempt Research Activities
Minimal Risk and Expedited Review Procedures
Institutional Review Board's Review of Research
Institutional Review Board Membership
Criteria for Institutional Review Board Approval of Research
Continuing Review of Research
Clinical Researchers and Institutional Review Boards
Evaluation and Evolution of the Current System of Research Oversight and Institutional Review Boards
Proposed Changes to Current Oversight of Research With Human Subjects
Critique and Proposed Changes to Institutional Review Board Operations
Conclusion
Summary Questions
References
5. Accreditation of Human Research Protection Programs
A Brief History
Principles of Accreditation
What AAHRPP Expects From Organizations
What Organizations Can Expect From AAHRPP
Human Research Protection Programs: The Shift to Shared Responsibility
The Accreditation Standards
Domain I: Organization
Domain II: Institutional Review Board or Ethics Committee
Domain III: Researcher and Research Staff
Steps to Accreditation
Value of Accreditation
Summary Questions
References
6. The Regulation of Drugs and Biological Products by the Food and Drug Administration
Background
Mission and Terminology
Drug and Biological Product Life Cycle
Discovery/Nonclinical Investigation
Clinical Trials
Responsibilities and Documentation
Sponsors
Investigators
Clinical Protocol
Institutional Review Board
Food and Drug Administration
Investigator Brochure
Investigational New Drug Safety Reports
Marketing Approval/Licensure
Pre-New Drug Application/Biologics License Application Submission
Application
Food and Drug Administration Review
Postapproval
Compliance
Summary
Summary Questions
7. International Regulation of Drugs and Biological Products
Introduction
Background
Early Operations and Achievements of International Conference on Harmonisation
Recent Evolution and Reforms
Membership in the New International Council on Harmonisation
Organization of the New International Council on Harmonisation
Financing the New International Council on Harmonisation
Overview of the International Council on Harmonisation Technical Harmonization Process
Nomination and Selection of Topics for Harmonization
International Council on Harmonisation Five-Step Harmonization Procedure
International Council on Harmonisation Guidelines Most Relevant to Clinical Research
Future Work in Regulatory Harmonization
References
8. Clinical Research in International Settings: Opportunities, Challenges, and Recommendations
Introduction
Challenges
Inadequate Human Resources
Deficient Research Infrastructures
Subpar Health-Care Systems
Information Gaps
Political Instability, Civil Disorders, and Natural Disasters
Economic and Seasonal Migration
Physical Barriers
Study Participant Characteristics
Ethical Issues
Recommendations
Understand the Local Setting
Train, Mentor, and Closely Supervise
Develop and Enhance Local Institutional Review Board Capacity
Develop Office for Sponsored Research/Office of Clinical Research
Prepare Data Safety and Monitoring Plan for Adverse Events
Provide Ancillary Care
Use Technology for Effective Communication
Have Long-Term Plans
Integrate With Existing Infrastructure
Conclusion
Summary Questions
References
9. The Role and Importance of Clinical Trial Registries and Results Databases
Introduction
Background
Definitions
Rationale for Clinical Trial Registration and Results Reporting
History of ClinicalTrials.gov
Current Policies
Policies Affecting Clinical Trials in the United States
International Landscape
Registering Clinical Trials at ClinicalTrials.gov
Data Standards and the Minimal Data Set
Points to Consider
Interventional Versus Observational Studies
What Is a Single Clinical Trial?
Importance of the Protocol
Keeping Information Up-to-Date
Reporting Results to ClinicalTrials.gov
Data Standards and the Minimal Data Set
Points to Consider
Data Preparation
Review Criteria
Relation of Results Reporting to Publication
Key Scientific Principles and Best Practices for Reporting
Issues in Reporting Outcome Measures
Issues Related to Analysis Population
Using ClinicalTrials.gov Data
Intended Audience
Search Tips for ClinicalTrials.gov
Points to Consider When Using ClinicalTrials.gov to Study the Overall Clinical Research Enterprise
Looking Forward
Conclusion
Summary/Discussion Questions
References
10. Data and Safety Monitoring
Why Monitor?
Who Monitors?
Data and Safety Monitoring Board
History of Data and Safety Monitoring Boards
When Is a Data and Safety Monitoring Board Needed?
What to Monitor?
Monitoring Participant Safety
Monitoring Trial Conduct
Participant Flow
Participants' Baseline Characteristics
Randomization Outcome
Regulatory Compliance
Trial Performance
Protocol Compliance by Research Staff
Recruitment
Participants' Treatment Adherence (Treatment Exposure)
Data Completeness (Availability of Primary and Other Key Endpoints)
Attendance at Follow-Up Visits (Retention)
Data Quality
Flags and Triggers
Interim Analyses
Sample Size Recalculation
Sample Size Recalculation Based Only on Nuisance Parameters
Sample Size Recalculation Based on Nuisance Parameters and Observed Treatment Effect
Interim Analyses for Efficacy, Futility, and/or Harm
Sequential Designs (Also Known as Group Sequential Tests or Repeated Significance Tests)
Stochastic Curtailment Tests
When and How Often to Monitor?
Special Topics
General Structure of Data and Safety Monitoring Board Meetings
Masking of the Data and Safety Monitoring Board
Summary
Summary Questions
Acknowledgments
References
11. Unanticipated Risk in Clinical Research
The Reasons
The Drug
The Target
The Trials
Cassandra Revealed
Extended Studies
Fialuridine Toxicity
Reassessing the Preclinical Studies
Research Oversight
The Investigations Begin
Scientific Misconduct
The Food and Drug Administration
The National Institutes of Health
The Institute of Medicine
The Media
The Congress
The Law
Epilogue
Drug Development
Is Preclinical Testing of New Drugs a Reliable Predictor of Toxicity?
Are Patients in Drug Trials Monitored Carefully and Objectively Enough?
Clinical Research Training
Personal Perspectives
Acknowledgments
References
Further Reading
12. Legal Issues in Clinical Research
Introduction
Protecting Individual Participant Interests
Independent Review and Monitoring
Informed Consent, Surrogate Consent, Advance Directives, and Children's Assent
The Content of Informed Consent Processes
Who Can Provide Informed Consent—Adults
Who Can Provide Informed Consent—Children
Special Protections for Fetal Tissue, Human Embryos, and Human Embryonic Stem Cells
Conflict of Interest and Financial Disclosure
Public Transparency: Registration and Results Reporting
Recordkeeping and Privacy Protection
Record Keeping Generally
Storing and Using Research Data—Health Insurance Portability and Accountability Act, the Privacy Act, and Certificates of Confidentiality
Data Sharing and Individual Consent
Conclusion
Summary/Discussion Questions
References
13. National Institutes of Health Policy on the Inclusion of Women and Minorities as Subjects in Clinical Research
National Institutes of Health Policy
Scientific Considerations and Peer Review
Role of the Institutional Review Board
Challenges to Enrolling Volunteers
Women of Childbearing Potential, Pregnant Women, and Children
Demographic Trends in Clinical Trial Participation
What Have We Learned?
Conclusion
Summary Questions
References
Further Reading
14. Clinical Research: A Patient Perspective
The Patient–Scientist Partnership
A Good Start
Walking Away: Why Patients Refuse to Participate in Clinical Trials
Why African Americans Are Underrepresented in Clinical Trials
Why the Elderly Are Underrepresented in Clinical Trials
The Trial Begins: Understanding the Patient Experience
The Worst News
A New World
The Lay Expert
Understanding the Caregiver
The Role of the Patient Representative
The Role of Palliative Care
Managing Difficult News
Effective Patient Communications: Recommendations and Considerations
The Assertive Patient: Ally in Scientific Research
Conclusion
Further Reading
II STUDY DESIGN AND BIOSTATISTICS
15. Development and Conduct of Studies
Development and Conduct of Studies
How to Choose a Study Design
Development and Importance of a Study Protocol
Statement of Design
Study Sample
Inclusionary and Exclusionary Criteria
Identifying and Defining the Outcomes of Interest
Dosing/Intervention Intensity
Definition of Treatment/Intervention Development
Masking/Blinding
Data Collection
Recruitment and Retention
Data Analysis
Overall Analyses
Subgroup Analysis
Protocol Modifications
Authorship
Equipoise
Manual of Operating Procedures
Recruitment and Retention
Adherence
Masking
Dose Ranging
Laboratory Methods and Measurement Error
Treatment Fidelity
Reporting the Results
Conclusions
Summary Questions
Acknowledgments
Disclosures
References
16. Writing a Protocol
Introduction
Regulatory Oversight
Writing a Protocol
Clinical Trials
Elements of a Protocol
Key Protocol Components
Précis
Introduction or Background
Hypotheses and Objectives
Study Design and Methods
Recruitment
Screening
Procedures
Risks, Discomforts, and Inconveniences
Protocol Risk Category Determination
Protocol Benefit Category Determination
Overall Benefit-to-Risk Ratio Determination
Data and Safety Monitoring
Quality Assurance Monitoring
Unanticipated Problem, Adverse Event, and Deviation/Violation Reporting
Study Population: Eligibility Criteria
Vulnerable Populations
Alternatives to Participation
Privacy
Confidentiality
Statistical Analysis
Management of Data and Samples
Qualifications of Investigators
Legal Agreements
Conflict of Interest
Compensation
Consent Process and Documents
Persons Providing Consent
Individuals Obtaining Consent
Consent Process
Consent Form
References
Appendices
Summary
Acknowledgments
References
17. Design of Observational Studies
Introduction
Ecological (Correlational) Studies
Case Reports and Case Series
Objectives and Design
Observations and Analysis
Advantages and Disadvantages
Single Time Point Studies: Cross-sectional Studies, Prevalence Surveys, and Incidence Studies
Objectives and Design
Observations and Data Analysis
Advantages and Disadvantages
Case-Control Studies
Objectives and Design
Observations and Data Analysis
Advantages and Disadvantages
Cohort Studies: Retrospective, Prospective, and Studies Nested Within a Cohort
Objectives and Design
Nonconcurrent, Historical, or Retrospective Cohort Studies
Concurrent or Prospective Cohort Studies
Nested Case-Control Studies
Nested Case-Cohort Studies
Observations and Data Analysis
Advantages and Disadvantages
Odds Ratios, Risk Ratios, Relative Risks, and Attributable Risk
Mistakes, Misconceptions, and Misinterpretations
Always Trusting Bivariate Associations Based on Observational Study Data
Assuming Odds Ratios and Relative Risks Will Have a Similar Magnitude
Misinterpreting Relative Measures
Implying Causation (Even When We Do Not Mean to Do It)
Confusing Causation, Prediction, Association, and Confounding
Assuming Observational and Randomized Studies Never Agree
Trying to Design a Randomized Study When We Need an Observational Study
Assuming an Observational Study Is “Safe” and Does Not Need External Monitoring
Conclusions
Questions
Acknowledgments
Disclosures
References
18. Design of Clinical Trials and Studies
Design of Clinical Trials
The Purpose of Clinical Trials and Clinical Studies
Understanding the Spectrum of the Research Continuum
Phase I Studies
Phase II Studies
Phase III Studies
Phase IV Studies
Dissemination and Implementation Studies
Comparative Effectiveness Research
Explanatory Versus Pragmatic Trials
Quasiexperimental Studies
Clinical Trial Designs
Crossover Designs
Enriched Enrollment Designs
Factorial Designs
Parallel Groups Designs
Sequential Trial Designs and Interim Analyses
Group-Randomized Trial Designs
Adaptive Treatment Designs
Critical Issues in Clinical Study Design
Blinding or Masking
Intervention Development
Choosing the Comparison Group
Control Groups
Wait-List Control
Time and Attention Control
Placebo Control
Sham Control
Usual and Standard Care Controls
Multiple Control Groups
Placebo Responses
Background
Identifying Placebo Responders
Mistakes and Misconceptions
Not Looking at the CONSORT Statement Before, During, and After a Study
Waiting Until the Large Definitive Study to Worry About the Details
Failing to Increase the Treatment Effect
Failing to Decrease the Variance
Not Taking Care When Choosing a Control Group
Always Assuming Placebo Groups Are Unethical
Assuming Placebo Treatment Is (Im)Possible in Long-Term Studies
Confusing Placebo Response and Regression to the Mean
Using a Factorial or Partial Factorial Design Instead of a Parallel Group Design
Assuming Small, Open-Label, Nonrandomized, Uncontrolled Studies Offer No Evidence
Conclusions
Summary Questions
Acknowledgments
Disclosures
References
Further Reading
19. The Role of Comparative Effectiveness Research
Introduction
A History of Comparative Clinical Effectiveness Research
The Patient-Centered Outcomes Research Institute
The Role of Comparative Clinical Effectiveness Research in the Nation's Medical Research Enterprise
The Methods of Comparative Clinical Effectiveness Research
Getting the Research Question Right
Choosing the Study Population
Selecting Appropriate Interventions and Comparator(s)
Choosing Clinical Outcomes to Be Measured
The Role of Engagement in Specifying Research Questions
Study Designs for CER Studies
Experimental Study Designs for CER
Observational Study Designs for CER
Cohort Designs
Adjusting for and Avoiding Confounding in Observational CER Studies
Assessing Treatment Heterogeneity
Evidence Synthesis in CER
Building a National Infrastructure for the Conduct of Comparative Effectiveness Research
Conclusions
References
20. Using Large Data Sets for Population-Based Health Research
Introduction
What Are the Original Sources for These Data?
Uses of Secondary Data in Health Research
Monitoring Secular Trends
Health Disparities Research
Geographic Variation
Evaluating Specific Diseases and Treatments
Strengths
Limitations (and Solutions)
Data Quality
Missing Data
Lack of Clinical Detail
Data Mining and Statistical Significance
Generalizability and the Ecological Fallacy
Surveys
Linking Data Sets
Ethical Considerations
Future Directions and Conclusions
Summary Questions
References
21. Measures of Function and Health-Related Quality of Life
Introduction to Patient-Reported Outcomes, Measures of Function, and Health-Related Quality of Life
Systematic Reviews
Standard Systematic Reviews
Scoping Reviews
Alternative Reviews
Outcomes: Functional Measures and Patient-Reported Outcomes
Role of Patient-Reported Outcomes in Functional Outcome Measures
Measurement and Methodology
Psychometric Properties
Methodology in Measurement Development
Factor Analysis
Item Response Theory
Use of Patient-Reported Outcomes in Large Data Sets
National Health and Nutrition Examination Survey
Function
Function—Measurement and Use
Utility of Functional Measures
Features to Look for in a Functional Measure
Reliability, Validity, and “Value-Added” Features
Examples of Functional Measures
Selecting a Functional Measure
Selection Considerations: Diagnostic Criteria Versus Functional Measures
Example: Liver Disease Versus Symptoms Related to Function
Disease-Specific Measures
Examples of Disease-Specific Measures
Case Example: Chronic Liver Disease Questionnaire
Summary Questions
References
Further Reading
22. Meta-analysis of Clinical Trials
Techniques of Meta-analysis
Formulating the Question
Defining Eligibility Criteria
Identifying Studies and Data Extraction
Statistical Analysis
Determining a Measure of Treatment Effect for Individual Studies
Combining Studies: Fixed Versus Random Effect
Heterogeneity
Publication Bias
Subgroup Analysis and Metaregression
Software
Reporting and Interpreting Results
Meta-analysis of Clinical Trials of Antiinflammatory Agents in Sepsis
Background: The Role of Inflammation in Mediating Sepsis
Formulating the Question
Defining Eligibility Criteria, Identifying Studies, and Data Extraction
Analyzing the Data
Conclusions
Summary Questions
References
23. Issues in Randomization
What Is Randomization?
Importance of Randomization
History of the Randomized Trial
Randomization Methods
Simple Randomization
Block Randomization
Stratified Randomization
Pseudorandomization Methods
Issues in Implementation
Sound Allocation
Mechanisms of Randomization
Monitoring
Special Considerations
Adaptive Randomization Methods
Documentation
Threats to the Integrity of Randomization
Conclusion
Summary Questions
Acknowledgment
Disclosures
References
24. Hypothesis Testing
Introduction
Three Motivating Examples
Statistical Inference
Basic Concepts in Hypothesis Testing
Formulation of Statistical Hypotheses in the Motivating Examples
Hypotheses for the Beta-Interferon/Magnetic Resonance Imaging Study
Hypotheses for the Felbamate Monotherapy Trial
Hypotheses for the ISIS-4 Trial: Comparing the Magnesium and No Magnesium Arms
One-Sample Hypothesis Tests With Applications to Clinical Research
Tests for Normal Continuous Data
Determining Statistical Significance
Critical Values
Confidence Intervals
z Tests or t Tests
Binary Data
Developing a Test
Exact Tests
Confidence Intervals
Example
Two-Sample Hypothesis Tests With Applications to Clinical Research
Tests for Comparing the Means of Two Normal Populations
Paired Data
Unpaired Data
Tests for Comparing Two Population Proportions
Hypothesis Tests for the Motivating Examples
Hypothesis Tests for the Beta-Interferon/Magnetic Resonance Imaging Study
Hypothesis Tests for the Felbamate Monotherapy Trial
Hypothesis Tests for the ISIS-4 Trial: Comparing the Magnesium and No Magnesium Arms
Common Mistakes in Hypothesis Testing
Misstatements and Misconceptions
Special Considerations
Comparing More Than Two Groups: One-Way Analysis of Variance
Simple and Multiple Linear Regression
Multiple Comparisons
Nonparametric Versus Parametric Tests
Conclusion
Summary Questions
Acknowledgments
Disclaimers
References
25. Power and Sample Size Calculations
Introduction
Basic Concepts
Notational Conventions
Review of the Normal and t-Distributions
Sample Size Calculations for Precision in Confidence Interval Construction
Confidence Intervals for Means of Continuous Data
Confidence Intervals for Binomial Proportions
Sample Size Calculations for Hypothesis Tests: One Sample of Data
Calculations for Continuous Data Regarding a Single Population Mean
Calculations for Binary Data Regarding a Single Population Proportion
Two-Stage Designs for a Single Population Proportion
Sample Size Calculations for Hypothesis Tests: Paired Data
Calculations for Paired Continuous Data
Calculations for Paired Binary Data
Sample Size Calculations for Hypothesis Tests: Two Independent Samples
Calculations for Continuous Data With Equal Variances and Equal Sample Sizes
Calculations for Continuous Data With Unequal Variances or Unequal Sample Sizes
Calculations for Two Independent Samples of Binary Data
Advanced Methods and Other Topics
Alternative Statistics and Sample Size Calculation Methods
Several Advanced Study Designs
Retention of Subjects
Statistical Computing
Conclusion
Exercises
Acknowledgments
Disclaimers
References
26. An Introduction to Survival Analysis
Introduction
Features of Survival Data
Survival Function
Kaplan–Meier and Product-Limit Estimators
Calculation and Formula for an Estimate
Calculation of Variance
Comparing Two Survival Functions
Comparing Two Survival Functions at a Given Time Point
Comparing Two Survival Functions Using the Whole Curve: Log-Rank Test
Example 1: Chronic Active Hepatitis Study
Stratified Log-Rank Test
Proportional Hazards Model
Calculation and Formulas
Common Mistakes
Conclusion
Questions
Acknowledgments
Disclaimer
References
27. Intermediate Topics in Biostatistics
Special Topics in Trial Design
Interim Monitoring and Alpha Spending
Introduction
Efficacy Boundaries
Futility
Summary
Adaptive Designs
Superiority, Noninferiority, and Equivalence
Special Considerations for Sample Size
Considerations for Early Phase Studies
Unequal Sample Sizes
Special Considerations in Data Analysis
A Trick for Confidence Interval Estimation When No Events Occur
Data Dependencies
Correlation
Relationships in Organization, Space, and Time
Essential Issues in Microarrays, Functional MRI, and Other Applications With Massive Data Sets
Regression to the Mean
Introduction
What Is Regression to the Mean?
Examples
Example 1 Change After Exceeding a Threshold
Example 2 Placebo Effect
Example 3 Screening Period Versus Trial Event Rates
Ways to Address Regression to the Mean
Summary
Diagnostic Testing
Measures of Accuracy
Considerations for Study Design
Common Mistakes and Biases
Summary
Special Considerations in Survival Analysis
Changes Over Time in Coefficients and Covariates
Time-Varying Coefficients or Time-Dependent Hazard Ratios
Time-Dependent Covariates
Dependent or Informative Censoring
Changes in Inclusion/Exclusion Criteria and Nonindependent Censoring
Competing Risks
Left and Interval Censoring
Recurrent Events Analysis
Sample Size
Missing Data
Introduction
Types of Missing Data
Methods for Handling Missing Data
Common Mistakes
Summary
Causal Inference in Observational Studies
Concluding Remarks
Summary Questions
Acknowledgments
Disclaimers
References
28. Large Clinical Trials and Registries—Clinical Research Institutes
Introduction
History
Phases of Evaluation of Therapies
Critical General Concepts
Validity
Generalizability
Expressing Clinical Trial Results
Concepts Underlying Trial Design
Treatment Effects Are Modest
Qualitative Interactions Are Uncommon
Quantitative Interactions Are Common
Unintended Biological Targets Are Common
Interactions Among Therapies Are Not Easily Predictable
Long-Term Effects May Be Unpredictable
General Design Considerations
Pragmatic Versus Explanatory
Entry Criteria
Data Collection Form
Ancillary Therapy
Multiple Randomization
Pick the Winner
Legal and Ethical Issues
Medical Justification
Groups of Patients Versus Individuals
Blinding
Endpoint Adjudication
Intensity of Intervention
Biomarkers and Surrogate Endpoints
Conflict of Interest
Special Issues With Device Trials
Hypothesis Formulation
Primary Hypothesis
Secondary and Tertiary Hypotheses
Intention to Treat
Publication Bias
Statistical Considerations
Type I Error and Multiple Comparisons
Type II Error and Sample Size
Noninferiority
Sample Size Calculations
Meta-analysis and Systematic Reviews
Understanding Covariates and Subgroups
Therapeutic Truisms
Operational Organization for Large-Scale Clinical Research
Executive Functions
The Steering Committee
The Data and Safety Monitoring Committee
The Institutional Review Board
Regulatory Authorities
Industry or Government Sponsors
Coordinating Functions
Intellectual Leadership
Data Coordinating Center
Site Management Organization
Supporting Functions
Information Technology
Finance
Human Resources
Contracts Management
Pharmacy and Supplies
Randomization Services
Project Management and Regulatory Affairs
Integration Into Practice
Controversies and Personal Perspective
Governmental Regulation Versus Professional Responsibility to Drive the Creation of Evidence
Composite and Surrogate Endpoints
Randomized Trials Versus Observational Studies
Sharing of Information
The Future
Summary Questions
References
III TECHNOLOGY TRANSFER, DATA MANAGEMENT, AND SOURCES OF FUNDING SUPPORT FOR RESEARCH
29. Intellectual Property and Technology Transfer
Introduction
Part One: Intellectual Property Generally
Background: Intellectual Property Defined
Patents—Historical Overview
First Steps: Before the American Revolution
United States Constitution
United States, 1789–1951: Systemic Adjustments
United States: The Modern Framework
The 1952 Patent Act
The “Federal Circuit”
US Patent Reform of 2011
Patent Treaties
Modern Philosophy of Patent Law
Fairness and the “Quid Pro Quo”
Incentives for Product Development
Economic Engine
Core Concepts of US Patent Law
What Is a Patent?
Patents Internationally
Utility, Plant, Design
Specific Rights Conveyed by Patents
Substantive Criteria for Patentability
Patentable Subject Matter
General Principles
“Mere Associations”: LabCorp v. Metabolite
Living Organisms and DNA: From Chakrabarty to Mayo to Myriad
Algorithms and Software: Benson-Flook-Diehr, State Street, and Bilski
“Utility” (“Industrial Applicability”)
“Novelty”
General Principles
Competing Claims of First-To-Invent: The “Interference”
“Nonobviousness”
General Principles
“Secondary Considerations”
“Obvious to Try”
Written Description, Enablement, and Best Mode
Written Description
Enablement
Best Mode
Other Key Terms Defined
“Prior Art”
“Conception” Versus “Reduction to Practice”
“Prophetic Conception” Versus “Simultaneous Conception and Reduction to Practice”
“Inventorship” and “Joint Inventorship”
Transfers of Ownership: “Assignment” Versus “License”
Patent Infringement (United States)
Civil Liability: In General
Civil Liability: Contributory and Induced Infringement
Major Defenses
Specific Exemptions and Immunities
Research-Use Exemption: Madey v. Duke University
Generic Drugs: The “Bolar Amendment” and Merck v. Integra
The Medical Practitioner Exemption (“Frist-Ganske Amendment”)
US Government as Infringer
Remedies: Types and Measures
“Declaratory Judgment” Actions
Importation and the International Trade Commission
Practical Issues of Litigation
Basic Elements of the Patent Application Process
Content of a Patent Application
Specification
Claims
Technical Items
One Invention per Application (“Unity”)
The Duty of Disclosure and “Inequitable Conduct”
US Applications: Types and Filing Procedures
Basic Types of Applications
Timing Considerations
Export Control
Publication
Patent Life
Prosecution of a Patent Application
Options “After Issuance”
International Applications and Filing Procedures
Patent Cooperation Treaty Applications
Regional Patent Offices
Combining US and Patent Cooperation Treaty Filings
General Strategy Notes
Current Major Efforts to Alter US Patent Laws
International Harmonization
Patents on Genes and “Mere Associations”
Abusive Tactics: “Patent Trolls” and “Inequitable Conduct”
Compulsory Licensing and Breaking Patents
Copyrights, Trademarks, and Trade Secrets
Copyrights
Trademarks
Trade Secrets
General Principles
Key Statutes Relating to Trade Secrets and Federal Employees
Part Two: Patents and Technology Transfer
Critical Laws Concerning Patents and Federally Supported Research
Federal Funding of Private “Extramural” Research: The Bayh–Dole Act
History and Philosophy
Organization of Clauses
Key Concepts—§§ 200 and 201
Core Terms Required in Bayh–Dole Funding Agreements—§ 202
§ 202—Reporting Obligations (iEdison and RePORT)
§ 202—Determination of Exceptional Circumstances
“March-In”—§ 203
Case Study: CellPro
Case Study: Abbott and Pfizer
Case Study—Genzyme
Duty of US Manufacture—§ 204
Funding Agreements Outside the Bayh–Dole Act Involving Patent Rights
Federal “Intramural” Research: The Stevenson-Wydler Act and the Federal Technology Transfer Act
History and Philosophy of Stevenson-Wydler and Federal Technology Transfer Act
Key Concepts and Major Clauses
Subsequent Supporting Acts
Patenting and Licensing by Federal Agencies
Patenting and Licensing by Agency
Various Agency Missions
Scope of Licensing Authority
Exclusive and Coexclusive Licensing—Additional Considerations
Results
Inventions by the National Institutes of Health
Patent and Patent-Related Policies
General
Research Tools
Sharing of Data and Model Organisms
National Institutes of Health Portfolio Size and Scope
The National Institutes of Health Licensing Program
National Institutes of Health General Licensing Policies
Best Practices for Licensing Genomic Inventions
Scope of Licensing Authority
Types and Structure of National Institutes of Health Licenses
National Institutes of Health Licensing Process—Overview
After Signature—Royalty Management, Monitoring, and Enforcement
Success
Part Three: Technology Transfer Agreements
Background: Hypothetical Scenario
The First and Biggest Mistake: Signing the Agreements
Contract Execution in General
Scope of Actual Authority of Government Laboratories
Agreements to Protect Confidentiality
Background: Trade Secrets
Secrets and the Government
Anatomy of a Confidential Disclosure Agreement
Agreements to Transfer Materials
The Basic Material Transfer Agreement
Background
Anatomy of the Material Transfer Agreement
Parties
Materials
Uses
Confidentiality
Rights in the Materials
Termination
Warranties and Indemnification
Inventions: “Reach-Through” Rights
The Uniform Biological Material Transfer Agreement
The Clinical Trial Agreement
Other Key Specialized Material Transfer Agreements
Materials in Repositories
Software Transfer Agreements
Collaboration and Inventions: The Cooperative Research and Development Agreement
Background
Cooperative Research and Development Agreement Basics
Selecting the Collaborator
Negotiating the Agreement
Modifications to the Cooperative Research and Development Agreement Language
Appendix A: The Research Plan
Financial and Material Contributions
National Institutes of Health Review of the Agreement
Execution by the Parties and the Effective Date
Possibilities
Conclusion
Brief Glossary of Critical Terms in Patenting
Review Questions
References
30. Data Management in Clinical Trials
The Research Team
Principal Investigator and Subinvestigators
Research Director/Manager
Clinical Trials Nurse
Clinical Research Associate
Database Administrator
Statistician
Data Management
Data Elements
Case Report Forms
Choosing a Database System
Data Collection
Sources of Data
Quality Control of Data
Auditing
Unanticipated Problems and Adverse Event Monitoring and Reporting
Legal and Regulatory Issues Related to Data Reporting
Follow-Up and Analysis
Record Retention
Conclusion
Summary Questions
References
31. Clinical Research Data: Characteristics, Representation, Storage, and Retrieval
Introduction
Data as Surrogates
The Indirect Nature of Clinical Research Data
Objectivity and Subjectivity of Clinical Data
Transparency, Rigor, and Reproducibility
Metadata
Types of Data
Data Standards
Data Capture, Storage, and Retrieval
Clinical Trials Data Management Systems
Clinical Data Repositories
Responsible Stewardship of Data
Cooperative Sharing Efforts
Summary
Summary Questions
References
32. Management of Patient Samples
Introduction
Successful Research Rests on a Foundation of Careful Planning
The Role of Pre-analytic Variables in Research Using Patient Specimens
Training and Accreditation
The Importance of Good Record Keeping
Specimen Tracking
Specimen Collection
Specimen Handling
Specimen Transit
Specimen Storage
Access to Patient Samples
Specimen Culling, Transfer of Collections, and Repository Closings
Summary Questions
References
33. Evaluating a Protocol Budget
Overview
Institutional Review Board Fees
Overhead or Indirect Cost
Determining the Hourly Rate
The “Per Patient” Budget
Start-Up Cost and Invoiced Items
Submitting Your Budget to the Sponsor for Approval
Areas of Concern
Walking Away
Wrapping Up
34. Getting the Funding You Need to Support Your Research: Navigating the National Institutes of Health Peer Review Process
Overview of National Institutes of Health
Mission and Organization of National Institutes of Health
Responsibilities of National Institutes of Health Staff
National Institutes of Health Extramural Funding Mechanisms
National Institutes of Health Funding Announcements
Funding Opportunity Announcements
Requests for Applications and Program Announcements in the National Institutes of Health Guide
Electronic Submission of Applications Through Grants.gov
Multiple Principal Investigators
The National Institutes of Health Peer Review Process for Grants
The National Institutes of Health Dual-Review System
National Institutes of Health Review “Cycles”
Assignment of Applications to a Review Group and Funding Institute
How Are Reviewers Selected?
How Does the Review Proceed?
Review Criteria for Research Project Grant Applications
Core Review Criteria
Additional Review Criteria
Additional Review Considerations
Research Project Grant Applications From New/Early-Stage Investigators
Possible Scientific Review Group Actions
Overall Impact/Priority Score and Percentiles
The Summary Statement Tells You What the Reviewers Thought About Your Application
Review by National Advisory Councils and Boards
What Determines Which Applications Are Awarded?
Confidentiality and Conflict of Interest
Hints for Preparing Better Grant Applications
Planning Your Application
Allow Sufficient Time to Prepare the Application
Get Help
Follow the Instructions Closely—Submit a Complete and Carefully Prepared Application
Hints and Suggestions for Preparing Each Part of Your Application
SF424 (R&R) Project Summary/Abstract
PHS 398 Specific Research Plan Component
Specific Aims
Research Strategy
PHS 398 Specific Human Subjects Sections
Protection of Human Subjects
Data Safety Monitoring Plan
Inclusion of Women and Minorities
Inclusion of Children
Vertebrate Animals
Budget and Justification
Senior/Key Personnel Profiles Component and Biosketches
Facilities and Other Resources
Appendix
Recent Changes to Application Procedures for National Institutes of Health–Funded Clinical Trials—More to Come
Revising Unsuccessful Applications
How to Decide Whether to Revise Your Application
How to Revise and Resubmit Your Application
What if It Appears That the Study Section Was Inappropriate or Biased?
What if It Appears That There Was a Procedural Error During Peer Review?
National Institutes of Health Grant Programs for Clinical Researchers at Various Stages in Their Careers
Individual Career Development (“K”) Awards
Mentored Career Development Awards
Mentored Clinical Scientist Development Award (K08)
Mentored Patient-Oriented Research Career Development Award (K23)
Career Transition Awards
K99/R00 Pathway to Independence Award
K22 Career Transition Awards
Independent Scientist Awards
Midcareer Investigator Award in Patient-Oriented Research
Exploratory/Development Grant (R21) Applications
Small Research Grant (R03) Applications
Loan Repayment Program
How to Stay Informed About National Institutes of Health Peer Review
“About Grants” Page (https://grants.nih.gov/grants/about_grants.htm)
National Institutes of Health Institute/Center Home Pages
The Center for Scientific Review Home Page (www.csr.nih.gov)
35. Philanthropy's Role in Advancing Biomedical Research
Introduction
Organization of the Philanthropic Sector and Terminology
Foundations
Public Charities
Alliances and Umbrella Organizations Serving the Philanthropic Sector
History of the Philanthropic Sector
Private Foundations
Public Charities and Patient-Oriented Organizations
Areas of Contribution
Philanthropic Sector: Areas of Contribution
Developing Human Capital
Building Knowledge and Expanding Scientific Disciplines
Biomedical Imaging and Bioengineering
Neuroinflammation
Biomarkers
Stem Cell Research
Supporting Institutions
Stimulating Innovation
Translating Discoveries into Cures, Therapeutics, and Preventions of Disease
Establishing Product Development Partnerships
Fostering Dissemination of Information, Data Sharing, and Patient Engagement
Advocating for Resources and Policy Changes
Conclusions and Future Directions
Summary Questions
References
IV CLINICAL RESEARCH INFRASTRUCTURE
36. Identifying, Understanding, and Managing Patient Safety and Clinical Risks in the Clinical Research Environment
Identifying and Managing Clinical Risk in the Clinical Research Environment
Building a Road map to Safe and High-Quality Care and Research Support: Applying the Principles of High Reliability in the Clinical Research Environment
Leveraging Patient Safety and Quality Improvement Techniques in the Conduct of Clinical Research
Proactively Assessing Clinical and Operational Risk
Continually Monitoring the Clinical Research Environment for Risk
Patient Safety and Clinical Event Reporting Systems
Electronic Surveillance for Errors and System Failures
Patient Safety and Clinical Quality Measures
Assessing Clinical Research Participants' Perceptions of the Clinical Research Experience
Conclusion
Summary Questions
References
37. Clinical Pharmacology and Its Role in Pharmaceutical Development
Clinical Pharmacology as a Translational Discipline
Definition and Scope
Overview of Drug Development
Current State of Affairs in Drug Development
Contribution of Clinical Pharmacology
First in Human Study
Starting Dose in First in Human Study
Dose Escalation in First in Human Study
Identification, Development, and Qualification of Biomarkers and Utilization of Functional Imaging Tools
Qualifying New Biomarkers
Safety Biomarkers
Efficacy Biomarkers and Surrogate End Points
Functional Imaging Tools Related to Phase 0
Personalized Medicine
Design and Conduct of Improved and Rigorous Phase I–II Studies With Adequate Exploration of the Exposure–Response Relationship
Modeling and Simulation and Model-Based Drug Development
Advent of Pharmacogenetics and Pharmacogenomics
The Role of the Regulatory Agency
FDA and Clinical Pharmacology
FDA and Drug Safety
FDA and the Special Populations
Summary Questions
References
38. Career Paths in Clinical Research
Background
Student and Resident Training in Clinical Research
Physician–Scientist Workforce
Clinical Research Curriculum and Training
NIH Clinical Center Core Curriculum
Additional Educational Approaches and Support for Training
Conclusions
Summary/Discussion Questions
References
39. Clinical Research Nursing: A New Domain of Practice
Introduction
Clinical Research Nursing: An Evolving Practice Specialty
Defining and Documenting the Specialty of Clinical Research Nursing
Conceptual Framework: The Domain of Practice
Practice Standards for Clinical Research Nursing
Standards of Care
Standards of Practice
Job Descriptions
Competency Assessment
Defining a Core Curriculum
What About Certification?
Legal Scope of Practice Issues
What Regulations Govern Practice and Liability in Clinical Research Settings?
Tools to Assist a Principal Investigator in Staffing a Study
Planning a Study in the Clinical Setting
Assessing the Need for Nursing Support
Creating the Staffing Plan
The Concept of “Research Intensity”
Future Considerations
Career Potential for Nurses in Clinical Research
Meeting the Need for Nurses to Fill Clinical Research Roles
Nursing Role in Community-Based Research
Supporting the Transition of Nurses Into Clinical Research From Clinical Practice
Summary/Discussion Questions
Acknowledgment
References
40. The Importance and Use of Electronic Health Records in Clinical Research
Electronic Medical Record
Electronic Health Record
Electronic Health Record Architecture
Example of an Electronic Health Record Architectural Diagram
Electronic Health Record System Connectivity at the National Institutes of Health Clinical Center
Clinical Research Information Systems
Using an Electronic Health Record in Clinical Research
Data Characteristics
Clinical Decision Support Within Electronic Health Record
Protocol Order Sets Within the Electronic Health Record
Sample Protocol Map/Research Grid
Secondary Use of the Electronic Health Record for Clinical Research
Legislation and the Electronic Health Record
Health Information Technology for Economic and Clinical Health Act
Medicare Access and Children's Health Insurance Program Reauthorization Act of 2015
U.S. Food and Drug Administration Guidance for Electronic Health Record in Clinical Research
Summary
Summary Questions
Terms
References
Further Reading
41. The Clinical Researcher and the Media
What Makes News in Science and Medicine?
Published Science—The Media's Bread and Butter
Novelty
The Unexpected
Celebrity
Controversy
Impact
Why Talk to Reporters?
Why Reporters Want to Talk to You
Why You Should Talk to Reporters
Social Media: What to Keep in Mind
Engaging the Media—The Process
A Word About Email, the Web, and Social Media
The Interview
What if You Are Misquoted?
What the Public Does Not Know About Science
Unexpected Questions
When the News Is Not Good
A Word About Investigative Reporters
The Freedom of Information Act
Embargoes
The Ingelfinger Rule
When to Contact Your Communications Office
Conclusion
Summary Questions
42. Information Resources for the Clinical Researcher
Introduction
Organization and Features of Information Resources
Origin
Content and Structure
Search Capabilities
Citation Searching
Access and Business Models
Familiarity and Currency
Biomedical Databases
Bioinformatics Resources
Major Bioinformatics Organizations
Bioinformatics Directories
Browsers
Commercial Software
Data Management
Data Integration and Precision Medicine
Bibliometrics
Bibliographic Managers
Resource Selection and Search Strategy
Educational Resources
Final Notes
Acknowledgments
References
1 Answer Key to Summary Questions
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 8
Chapter 9
Chapter 10
Chapter 12
Chapter 13
Chapter 15
Chapter 17
Chapter 18
Chapter 20
Chapter 21
Chapter 22
Chapter 23
Chapter 24
Chapter 26
Chapter 27
Chapter 28
Chapter 29
Chapter 30
Chapter 31
Chapter 32
Chapter 35
Chapter 36
Chapter 37
Chapter 38
Chapter 39
Chapter 41
2 Acronyms
Part I—Ethical, Regulatory, and Legal Issues
Part II—Study Design and Biostatistics
Part III—Technology Transfer, Data Management, and Sources of Funding Support for Research
Part IV—Clinical Research Infrastructure
Index
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z


PRINCIPLES AND PRACTICE OF CLINICAL RESEARCH
FOURTH EDITION

Edited by
JOHN I. GALLIN
FREDERICK P. OGNIBENE
LAURA LEE JOHNSON

Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1800, San Diego, CA 92101-4495, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

Copyright © 2018 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-12-849905-4

For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Mica Haley
Senior Content Strategist: Kristine Jones
Editorial Project Manager: Fenton Coulthurst
Production Project Manager: Kiruthika Govindaraju
Designer: Matthew Limbert
Typeset by TNQ Books and Journals

Contents List of Contributors Acknowledgments Preface

3. Integrity in Research: Principles for the Conduct of Research

xiii xv xvii

MELISSA C. COLBERT, ROBERT B. NUSSENBLATT, MICHAEL M. GOTTESMAN

1. A Historical Perspective on Clinical Research

Guidelines and Principles for the Conduct of Research Scientific Integrity and Research Misconduct Responsibilities of Research Supervisors and Trainees Data Management, Archiving, and Sharing Research Involving Human and Animal Subjects Collaborative and Team Science Conflict of Interest and Commitment Peer Review Publication Practices, Responsible Authorship, and Results Reproducibility Study Questions Acknowledgments References Further Reading

JOHN I. GALLIN

The Earliest Clinical Research Greek and Roman Influence Middle Ages and Renaissance Seventeenth Century Eighteenth Century Nineteenth Century Twentieth Century and Beyond Summary Questions References

1 2 2 3 4 7 11 14 14

I

36 36 38 39 40 41 42 45 45 45 46

4. Institutional Review Boards

ETHICAL, REGULATORY AND LEGAL ISSUES

JULIA SLUTSMAN, LYNNETTE NIEMAN

Historical, Ethical, and Regulatory Foundations of Current Requirements for Research Involving Human Subjects Institutional Review Boards Clinical Researchers and Institutional Review Boards Evaluation and Evolution of the Current System of Research Oversight and Institutional Review Boards Conclusion Summary Questions References

2. Ethical Principles in Clinical Research CHRISTINE GRADY

Distinguishing Clinical Research From Clinical Practice Ethics and Clinical Research History of Ethical Attention to Clinical Research Codes of Research Ethics and Regulations Research on Bioethical Questions Ethical Framework for Clinical Research Ethical Considerations in Randomized Controlled Trials Conclusion Summary Questions References

33 34

19 20 20 22 23

47 50 57 57 59 59 59

5. Accreditation of Human Research Protection Programs

23 27 29 29 30

ELYSE I. SUMMERS, MICHELLE FEIGE

A Brief History Principles of Accreditation

v

63 64

vi Human Research Protection Programs: The Shift to Shared Responsibility The Accreditation Standards Steps to Accreditation Value of Accreditation Summary Questions References

CONTENTS

65 66 70 70 72 72

6. The Regulation of Drugs and Biological Products by the Food and Drug Administration MOLLY M. FLANNERY, AMY E. McKEE, DIANE M. MALONEY, JONATHAN P. JAROW

Background Mission and Terminology Drug and Biological Product Life Cycle Compliance Summary Summary Questions

73 74 76 84 84 84

7. International Regulation of Drugs and Biological Products

93 95 98 98

CHRISTOPHER O. OLOPADE, MICHELLE TAGLE, OLUFUNMILAYO I. OLOPADE

99 100 103 106 106 107

9. The Role and Importance of Clinical Trial Registries and Results Databases

PAUL G. WAKIM, PAMELA A. SHAW

Why Monitor? Who Monitors? What to Monitor? When and How Often to Monitor? Special Topics Summary Summary Questions Acknowledgments References

127 128 130 136 137 138 139 139 139

The Reasons The Drug The Target The Trials Cassandra Revealed Extended Studies Fialuridine Toxicity Reassessing the Preclinical Studies Research Oversight The Investigations Begin Scientific Misconduct The Food and Drug Administration The National Institutes of Health The Institute of Medicine The Media The Congress The Law Epilogue Acknowledgments References Further Reading

143 144 145 145 147 147 147 149 149 150 150 151 152 152 153 154 154 155 157 157 158

12. Legal Issues in Clinical Research

161

VALERIE H. BONHAM

DEBORAH A. ZARIN, REBECCA J. WILLIAMS, TONY TSE, NICHOLAS C. IDE

Introduction Background Current Policies

10. Data and Safety Monitoring

STEPHEN E. STRAUS

87 88

8. Clinical Research in International Settings: Opportunities, Challenges, and Recommendations

Introduction Challenges Recommendations Conclusion Summary Questions References

116 118 120 121 123 123 123

11. Unanticipated Risk in Clinical Research

THERESA MULLIN

Introduction Background Overview of the International Council on Harmonisation Technical Harmonization Process International Council on Harmonisation Guidelines Most Relevant to Clinical Research Future Work in Regulatory Harmonization References

Registering Clinical Trials at ClinicalTrials.gov Reporting Results to ClinicalTrials.gov Using ClinicalTrials.gov Data Looking Forward Conclusion Summary/Discussion Questions References

111 112 115

Introduction Protecting Individual Participant Interests Special Protections for Fetal Tissue, Human Embryos, and Human Embryonic Stem Cells Conflict of Interest and Financial Disclosure

161 162 166 167

vii

CONTENTS

Public Transparency: Registration and Results Reporting Recordkeeping and Privacy Protection Data Sharing and Individual Consent Conclusion Summary/Discussion Questions References

168 168 171 173 173 173

13. National Institutes of Health Policy on the Inclusion of Women and Minorities as Subjects in Clinical Research JANINE CLAYTON, JULIANA BLOME

National Institutes of Health Policy Scientific Considerations and Peer Review Role of the Institutional Review Board Challenges to Enrolling Volunteers Women of Childbearing Potential, Pregnant Women, and Children Demographic Trends in Clinical Trial Participation What Have We Learned? Conclusion Summary Questions References Further Reading

177 179 180 181 182 183 184 186 186 186 187

14. Clinical Research: A Patient Perspective JERRY SACHS

The PatienteScientist Partnership Walking Away: Why Patients Refuse to Participate in Clinical Trials The Trial Begins: Understanding the Patient Experience Understanding the Caregiver The Role of the Patient Representative The Role of Palliative Care Managing Difficult News Effective Patient Communications: Recommendations and Considerations The Assertive Patient: Ally in Scientific Research Conclusion Further Reading

190 191 193 195 195 196 196 197 198 199 199

II

204 213 213 216 217 217 217 217 217

16. Writing a Protocol ELIZABETH A. BARTRUM, BARBARA I. KARP

Introduction Regulatory Oversight Writing a Protocol Elements of a Protocol Summary Acknowledgments References

219 219 220 220 229 229 229

17. Design of Observational Studies LAURA LEE JOHNSON

Introduction Ecological (Correlational) Studies Case Reports and Case Series Single Time Point Studies: Cross-sectional Studies, Prevalence Surveys, and Incidence Studies Case-Control Studies Cohort Studies: Retrospective, Prospective, and Studies Nested Within a Cohort Odds Ratios, Risk Ratios, Relative Risks, and Attributable Risk Mistakes, Misconceptions, and Misinterpretations Conclusions Questions Acknowledgments Disclosures References

231 233 233 234 236 239 242 243 247 247 247 247 247

18. Design of Clinical Trials and Studies CATHERINE M. STONEY, LAURA LEE JOHNSON

STUDY DESIGN AND BIOSTATISTICS 15. Development and Conduct of Studies CATHERINE M. STONEY, LAURA LEE JOHNSON

Development and Conduct of Studies How to Choose a Study Design

Development and Importance of a Study Protocol Equipoise Manual of Operating Procedures Reporting the Results Conclusions Summary Questions Acknowledgments Disclosures References

203 204

Design of Clinical Trials The Purpose of Clinical Trials and Clinical Studies Understanding the Spectrum of the Research Continuum Clinical Trial Designs Critical Issues in Clinical Study Design Control Groups

250 250 251 255 258 258

viii Placebo Responses Mistakes and Misconceptions Conclusions Summary Questions Acknowledgments Disclosures References Further Reading

CONTENTS

261 262 266 266 266 266 266 268

19. The Role of Comparative Effectiveness Research JOE V. SELBY, EVELYN P. WHITLOCK, KELLY S. SHERMAN, JEAN R. SLUTSKY

Introduction A History of Comparative Clinical Effectiveness Research The Patient-Centered Outcomes Research Institute The Role of Comparative Clinical Effectiveness Research in the Nation’s Medical Research Enterprise The Methods of Comparative Clinical Effectiveness Research Study Designs for CER Studies Evidence Synthesis in CER Building a National Infrastructure for the Conduct of Comparative Effectiveness Research Conclusions References

NAOMI L. GERBER, JILLIAN K. PRICE

Introduction to Patient-Reported Outcomes, Measures of Function, and Health-Related Quality of Life Systematic Reviews Outcomes: Functional Measures and PatientReported Outcomes Summary Questions References Further Reading

269

22. Meta-analysis of Clinical Trials

270

JUNFENG SUN, BRADLEY D. FREEMAN, CHARLES NATANSON

271 273 275 278 285 287 290 290

20. Using Large Data Sets for Population-Based Health Research LEIGHTON CHAN, PATRICK McGAREY, JOSEPH A. SCLAFANI

Introduction What Are the Original Sources for These Data? Uses of Secondary Data in Health Research Strengths Limitations (and Solutions) Surveys Linking Data Sets Ethical Considerations Future Directions and Conclusions Summary Questions References

21. Measures of Function and Health-Related Quality of Life

293 294 294 296 297 298 298 299 300 300 300

Techniques of Meta-analysis Meta-analysis of Clinical Trials of Antiinflammatory Agents in Sepsis Conclusions Summary Questions References

303 305 306 313 313 315

318 321 323 323 324

23. Issues in Randomization PAMELA A. SHAW, LAURA LEE JOHNSON, CRAIG B. BORKOWF

What Is Randomization? Importance of Randomization History of the Randomized Trial Randomization Methods Issues in Implementation Special Considerations Conclusion Summary Questions Acknowledgments Disclosures References

329 330 330 331 333 335 337 338 338 338 338

24. Hypothesis Testing LAURA LEE JOHNSON, CRAIG B. BORKOWF, PAMELA A. SHAW

Introduction Basic Concepts in Hypothesis Testing Formulation of Statistical Hypotheses in the Motivating Examples One-Sample Hypothesis Tests With Applications to Clinical Research

342 343 345 346

ix

CONTENTS

Two-Sample Hypothesis Tests With Applications to Clinical Research Hypothesis Tests for the Motivating Examples Common Mistakes in Hypothesis Testing Misstatements and Misconceptions Special Considerations Conclusion Summary Questions Acknowledgments Disclaimers References

349 351 353 353 354 356 356 357 357 357

25. Power and Sample Size Calculations CRAIG B. BORKOWF, LAURA LEE JOHNSON, PAUL S. ALBERT

Introduction Sample Size Calculations for Precision in Confidence Interval Construction Sample Size Calculations for Hypothesis Tests: One Sample of Data Sample Size Calculations for Hypothesis Tests: Paired Data Sample Size Calculations for Hypothesis Tests: Two Independent Samples Advanced Methods and Other Topics Conclusion Exercises Acknowledgments Disclaimers References

359 361 362 364 366 368 369 370 371 371 371

26. An Introduction to Survival Analysis LAURA LEE JOHNSON

Introduction Features of Survival Data Survival Function Common Mistakes Conclusion Questions Acknowledgments Disclaimers References

373 374 375 380 380 381 381 381 381

403 405 406 406 407 407 407

28. Large Clinical Trials and RegistriesdClinical Research Institutes ROBERT M. CALIFF

Introduction History Phases of Evaluation of Therapies Critical General Concepts Expressing Clinical Trial Results Concepts Underlying Trial Design General Design Considerations Legal and Ethical Issues Hypothesis Formulation Publication Bias Statistical Considerations Meta-analysis and Systematic Reviews Understanding Covariates and Subgroups Therapeutic Truisms Operational Organization for Large-Scale Clinical Research Integration Into Practice Controversies and Personal Perspective The Future Summary Questions References

412 412 413 414 415 417 420 422 427 427 428 430 431 432 433 437 437 439 440 440

III TECHNOLOGY TRANSFER, DATA MANAGEMENT, AND SOURCES OF FUNDING SUPPORT FOR RESEARCH 29. Intellectual Property and Technology Transfer BRUCE GOLDSTEIN

27. Intermediate Topics in Biostatistics PAMELA A. SHAW, LAURA LEE JOHNSON, MICHAEL A. PROSCHAN

Special Topics in Trial Design Special Considerations in Data Analysis Regression to the Mean Diagnostic Testing Special Considerations in Survival Analysis

Missing Data Causal Inference in Observational Studies Concluding Remarks Summary Questions Acknowledgments Disclaimers References

384 392 394 396 400

Introduction Part One: Intellectual Property Generally Part Two: Patents and Technology Transfer Part Three: Technology Transfer Agreements Conclusion Brief Glossary of Critical Terms in Patenting Review Questions References

448 448 487 503 518 519 519 520

x

CONTENTS

30. Data Management in Clinical Trials

33. Evaluating a Protocol Budget

DIANE C. ST GERMAIN, MARJORIE J. GOOD

PHYLLIS KLEIN

The Research Team Data Management Auditing Unanticipated Problems and Adverse Event Monitoring and Reporting Legal and Regulatory Issues Related to Data Reporting Follow-Up and Analysis Record Retention Conclusion Summary Questions References

531 533 538 540 542 543 543 544 544 544

31. Clinical Research Data: Characteristics, Representation, Storage, and Retrieval JAMES J. CIMINO

Introduction Data as Surrogates Types of Data Data Standards Data Capture, Storage, and Retrieval Responsible Stewardship of Data Cooperative Sharing Efforts Summary Summary Questions References

547 547 550 550 551 553 555 556 556 557

32. Management of Patient Samples KAREN E. BERLINER, AMY P.N. SKUBITZ

Introduction Successful Research Rests on a Foundation of Careful Planning The Role of Pre-analytic Variables in Research Using Patient Specimens Training and Accreditation The Importance of Good Record Keeping Specimen Tracking Specimen Collection Specimen Handling Specimen Transit Specimen Storage Access to Patient Samples Specimen Culling, Transfer of Collections, and Repository Closings Summary Questions References

559 560 560 561 562 562 563 565 565 566 567 567 567 568

Overview Institutional Review Board Fees Overhead or Indirect Cost Determining the Hourly Rate The “Per Patient” Budget Start-Up Cost and Invoiced Items Submitting Your Budget to the Sponsor for Approval Areas of Concern Walking Away Wrapping Up

571 572 572 572 573 577 582 585 586 586

34. Getting the Funding You Need to Support Your Research: Navigating the National Institutes of Health Peer Review Process VALERIE L. PRENGER

Overview of National Institutes of Health The National Institutes of Health Peer Review Process for Grants Hints for Preparing Better Grant Applications Recent Changes to Application Procedures for National Institutes of HealtheFunded Clinical Trialsd More to Come Revising Unsuccessful Applications National Institutes of Health Grant Programs for Clinical Researchers at Various Stages in Their Careers How to Stay Informed About National Institutes of Health Peer Review

590 594 600

605 605

607 609

35. Philanthropy’s Role in Advancing Biomedical Research ELAINE K. GALLIN, MARYROSE FRANKO, ENRIQUETA BOND

Introduction Organization of the Philanthropic Sector and Terminology History of the Philanthropic Sector Areas of Contribution Conclusions and Future Directions Summary Questions References

611 613 615 617 628 629 629

xi

CONTENTS

Additional Educational Approaches and Support for Training Conclusions Summary/Discussion Questions References

IV CLINICAL RESEARCH INFRASTRUCTURE 36. Identifying, Understanding, and Managing Patient Safety and Clinical Risks in the Clinical Research Environment LAURA M. LEE, DAVID K. HENDERSON

Identifying and Managing Clinical Risk in the Clinical Research Environment Building a Road map to Safe and High-Quality Care and Research Support: Applying the Principles of High Reliability in the Clinical Research Environment Leveraging Patient Safety and Quality Improvement Techniques in the Conduct of Clinical Research Proactively Assessing Clinical and Operational Risk Electronic Surveillance for Errors and System Failures Patient Safety and Clinical Quality Measures Assessing Clinical Research Participants’ Perceptions of the Clinical Research Experience Conclusion Summary Questions References

633

635 635 638 641 641 642 642 643 643

37. Clinical Pharmacology and Its Role in Pharmaceutical Development SUE CHENG, KONSTANTINA M. VANEVSKI, JUAN J.L. LERTORA

Clinical Pharmacology as a Translational Discipline Overview of Drug Development Current State of Affairs in Drug Development Contribution of Clinical Pharmacology The Role of the Regulatory Agency Summary Questions References

645 646 647 649 654 656 656

39. Clinical Research Nursing: A New Domain of Practice GWENYTH R. WALLEN, CHERYL A. FISHER

Introduction Clinical Research Nursing: An Evolving Practice Specialty Defining and Documenting the Specialty of Clinical Research Nursing Legal Scope of Practice Issues Tools to Assist a Principal Investigator in Staffing a Study Future Considerations Summary/Discussion Questions Acknowledgments References

671 672 674 679 680 682 684 684 684

40. The Importance and Use of Electronic Health Records in Clinical Research JON W. McKEEBY, PATRICIA S. COFFEY

Electronic Medical Record Electronic Health Record Electronic Health Record Architecture Clinical Research Information Systems Using an Electronic Health Record in Clinical Research Secondary Use of the Electronic Health Record for Clinical Research Legislation and the Electronic Health Record Summary Summary Questions Terms References Further Reading

687 688 688 688 692 698 698 699 699 701 701 702

38. Career Paths in Clinical Research FREDERICK P. OGNIBENE

Background
Student and Resident Training in Clinical Research
Physician-Scientist Workforce
Clinical Research Curriculum and Training
NIH Clinical Center Core Curriculum

41. The Clinical Researcher and the Media JOHN T. BURKLOW

What Makes News in Science and Medicine?
Published Science: The Media's Bread and Butter
Novelty
The Unexpected
Celebrity
Controversy
Impact
Why Talk to Reporters?
Why Reporters Want to Talk to You
Why You Should Talk to Reporters
Social Media: What to Keep in Mind
Engaging the Media: The Process
A Word About Email, the Web, and Social Media
The Interview
What if You Are Misquoted?
What the Public Does Not Know About Science
Unexpected Questions
When the News Is Not Good
A Word About Investigative Reporters
The Freedom of Information Act
Embargoes
When to Contact Your Communications Office
Conclusion
Summary Questions


42. Information Resources for the Clinical Researcher JOSH A. DUBERMAN, PAMELA C. SIEVING

Introduction
Organization and Features of Information Resources
Origin
Content and Structure
Search Capabilities
Citation Searching
Access and Business Models
Familiarity and Currency
Biomedical Databases
Bioinformatics Resources
Data Management
Data Integration and Precision Medicine
Bibliometrics
Bibliographic Managers
Resource Selection and Search Strategy
Educational Resources
Final Notes
Acknowledgments
Disclosure
References

Appendix 1: Answer Key to Summary Questions
Appendix 2: Acronyms
Index


List of Contributors

Paul S. Albert National Institutes of Health, Rockville, MD, United States

Maryrose Franko Health Research Alliance, Research Triangle Park, NC, United States

Elizabeth A. Bartrum National Institutes of Health, Bethesda, MD, United States

Bradley D. Freeman Washington University School of Medicine, St. Louis, MO, United States

Karen E. Berliner National Institutes of Health, Bethesda, MD, United States

Elaine K. Gallin QE Philanthropic Advisors, Potomac, MD, United States

Juliana Blome National Institutes of Health, Bethesda, MD, United States

John I. Gallin National Institutes of Health, Bethesda, MD, United States

Enriqueta Bond QE Philanthropic Advisors, Warrenton, VA, United States

Naomi L. Gerber George Mason University, Fairfax, VA, United States; Inova Health System, Falls Church, VA, United States

Valerie H. Bonham National Institutes of Health, Bethesda, MD, United States

Bruce Goldstein(a) National Institutes of Health, Rockville, MD, United States

Craig B. Borkowf Centers for Disease Control and Prevention, Atlanta, GA, United States

Marjorie J. Good National Cancer Institute, National Institutes of Health, Rockville, MD, United States

John T. Burklow National Institutes of Health, Bethesda, MD, United States
Robert M. Califf Duke University School of Medicine, Durham, NC, United States; Verily Life Sciences (Alphabet), South San Francisco, CA, United States; Stanford University Department of Medicine, Stanford, CA, United States
Leighton Chan National Institutes of Health, Bethesda, MD, United States
Sue Cheng Bayer HealthCare Pharmaceuticals, Inc., Whippany, NJ, United States
James J. Cimino University of Alabama School of Medicine, Birmingham, AL, United States
Janine Clayton National Institutes of Health, Bethesda, MD, United States
Patricia S. Coffey National Institutes of Health, Bethesda, MD, United States

David K. Henderson National Institutes of Health, Bethesda, MD, United States
Nicholas C. Ide National Institutes of Health, Bethesda, MD, United States
Jonathan P. Jarow U.S. Food and Drug Administration, Silver Spring, MD, United States
Laura Lee Johnson U.S. Food and Drug Administration, Silver Spring, MD, United States
Barbara I. Karp National Institutes of Health, Bethesda, MD, United States

Laura M. Lee National Institutes of Health, Bethesda, MD, United States

Josh A. Duberman National Institutes of Health, Bethesda, MD, United States
Michelle Feige Association for the Accreditation of Human Research Protection Programs, Inc., Washington, DC, United States

Molly M. Flannery U.S. Food and Drug Administration, Silver Spring, MD, United States

Christine Grady National Institutes of Health, Bethesda, MD, United States

Phyllis Klein Washington University, St. Louis, MO, United States

Melissa C. Colbert National Institutes of Health, Bethesda, MD, United States

Cheryl A. Fisher National Institutes of Health, Bethesda, MD, United States

Michael M. Gottesman National Institutes of Health, Bethesda, MD, United States

Juan J.L. Lertora Duke University School of Medicine, Durham, NC, United States
Diane M. Maloney U.S. Food and Drug Administration, Silver Spring, MD, United States
Patrick McGarey National Institutes of Health, Bethesda, MD, United States
Amy E. McKee U.S. Food and Drug Administration, Silver Spring, MD, United States

(a) Mr. Goldstein is a patent attorney serving as the Assistant Director for Monitoring & Enforcement Unit in the NIH Office of Technology Transfer. This chapter reflects the personal views of Mr. Goldstein, not of his employer. No official support or endorsement by the National Institutes of Health or the United States Government is intended or should be inferred.


Jon W. McKeeby National Institutes of Health, Bethesda, MD, United States

Amy P.N. Skubitz University of Minnesota, Minneapolis, MN, United States

Theresa Mullin U.S. Food and Drug Administration, Silver Spring, MD, United States

Jean R. Slutsky Patient-Centered Outcomes Research Institute (PCORI), Washington, DC, United States

Charles Natanson National Institutes of Health, Bethesda, MD, United States

Julia Slutsman National Institutes of Health, Washington, DC, United States

Lynnette Nieman National Institutes of Health, Bethesda, MD, United States

Diane C. St Germain National Cancer Institute, National Institutes of Health, Rockville, MD, United States

Robert B. Nussenblatt† National Institutes of Health, Bethesda, MD, United States

Catherine M. Stoney National Institutes of Health, Bethesda, MD, United States

Frederick P. Ognibene National Institutes of Health, Bethesda, MD, United States

Stephen E. Straus† National Institutes of Health, Bethesda, MD, United States

Christopher O. Olopade The University of Chicago, Chicago, IL, United States

Elyse I. Summers Association for the Accreditation of Human Research Protection Programs, Inc., Washington, DC, United States

Olufunmilayo I. Olopade The University of Chicago, Chicago, IL, United States
Valerie L. Prenger National Institutes of Health, Bethesda, MD, United States
Jillian K. Price Inova Health System, Falls Church, VA, United States
Michael A. Proschan National Institutes of Health, Bethesda, MD, United States
Jerry Sachs National Institutes of Health, Bethesda, MD, United States
Joseph A. Sclafani National Institutes of Health, Bethesda, MD, United States; Medstar Georgetown University/National Rehabilitation Network, Washington, DC, United States
Joe V. Selby Patient-Centered Outcomes Research Institute (PCORI), Washington, DC, United States
Pamela A. Shaw University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States
Kelly S. Sherman Patient-Centered Outcomes Research Institute (PCORI), Washington, DC, United States
Pamela C. Sieving Sieving Information Solutions, Bethesda, MD, United States

† Deceased.

Junfeng Sun National Institutes of Health, Bethesda, MD, United States
Michelle Tagle The University of Chicago, Chicago, IL, United States
Tony Tse National Institutes of Health, Bethesda, MD, United States
Konstantina M. Vanevski Bayer HealthCare Pharmaceuticals, Inc., Basel, Switzerland
Paul G. Wakim National Institutes of Health, Bethesda, MD, United States
Gwenyth R. Wallen National Institutes of Health, Bethesda, MD, United States
Evelyn P. Whitlock Patient-Centered Outcomes Research Institute (PCORI), Washington, DC, United States
Rebecca J. Williams National Institutes of Health, Bethesda, MD, United States
Deborah A. Zarin National Institutes of Health, Bethesda, MD, United States

Acknowledgments

The editors extend special thanks to Ms. Jennifer Simmons for her energetic administrative support in coordinating the many activities associated with the development of the fourth edition of this textbook, Ms. Rona Buchbinder for her dedicated and excellent editorial assistance, and Ms. Kristine Jones, Ms. Molly McLaughlin, and Mr. Fenton Coulthurst at Elsevier for their patience and perseverance in bringing this huge undertaking to fruition. Very special thanks to all of the authors who contributed outstanding, up-to-date chapters to this fourth edition and the numerous patients, study participants, and course participants over the years who inspired them.


Preface

The positive reactions and feedback to the first three editions of Principles and Practice of Clinical Research have been appreciated and have reinforced the importance of this textbook to the discipline of clinical research. In each edition the content of the textbook has been updated and new information added. The textbook nearly doubled in size from the second to the third edition as an expanded and comprehensive section on biostatistics was included.

The critical importance of study design and biostatistics, coupled with enhanced research regulatory requirements, prompted the addition of a new editor, Laura Lee Johnson, PhD. Dr. Johnson has been a colleague for years, having been a faculty member and currently codirector of the National Institutes of Health (NIH) Clinical Center's "Introduction to the Principles and Practice of Clinical Research" (IPPCR) course. After many years at the NIH, she is now Acting Director of the Division of Biometrics III in the Center for Drug Evaluation and Research at the U.S. Food and Drug Administration (FDA). She is an extremely welcome addition as the third editor of this textbook.

The IPPCR started at the NIH Clinical Center in 1995 and was the impetus for the first edition of this textbook. Currently the IPPCR is a web-based course using recorded lectures by many of the textbook's authors, online bulletin boards for each lecture, and local study groups hosted by volunteer institutions around the world. In the 2016–17 academic year we had over 8,800 registrants at 270 sites around the world. Since its inception the course has had nearly 38,000 participants formally enroll and an even wider audience informally taking the course or watching lectures via YouTube. In addition, the textbook has been translated into Chinese, Japanese, and Russian and has been used for live intensive IPPCR courses taught in China, Nigeria, Russia, India, Brazil, and South Africa.

Based on broad international needs and interest in enhancing clinical research infrastructure around the world, this fourth edition includes an expanded chapter on clinical research in international settings as well as a new chapter focusing on international regulation of drugs and biologics. It also includes updated content on large clinical trials and registries as well as a new chapter focusing on the emergence of the important role of comparative effectiveness research. Since clinical research has become more complex, and thus potentially riskier, there is a new chapter devoted to identifying clinical risks and managing patient safety in a clinical research setting. There is also new content on the use of electronic health records in clinical research and a very detailed presentation of the broad utility and application of information resources in clinical research. With the growth of the clinical research enterprise and the need to ensure that the highest standards are maintained, chapters about accreditation of human research protection programs, regulatory sciences, and research integrity have been enhanced.

We hope that this book provides its audience with a deeper understanding of the broadening scope of the global clinical research enterprise. The textbook not only provides details about clinical research mechanics and practical information but also introduces the reader to the complexities and intricacies of ensuring safe, ethically sound, and scientifically rigorous clinical research. All clinical investigators must consider the safety of research subjects enrolled in their investigational protocols while navigating the research pathways from the bedside to the bench and back. We are proud of this book on so many levels and hope that the passion, expertise, and dynamic quality of our contributors and their content are appreciated by you, the readers.

John I. Gallin, MD
Frederick P. Ognibene, MD
Laura Lee Johnson, PhD


CHAPTER

1 A Historical Perspective on Clinical Research John I. Gallin National Institutes of Health, Bethesda, MD, United States

OUTLINE

The Earliest Clinical Research
Greek and Roman Influence
Middle Ages and Renaissance
Seventeenth Century
Eighteenth Century
Nineteenth Century
Twentieth Century and Beyond
Summary Questions
References

If I have seen a little further it is by standing on the shoulders of giants.
Sir Isaac Newton (1676).

The successful translation of a basic or clinical observation into a new treatment of disease is rare in an investigator's professional life, but when it occurs, the personal thrill is exhilarating, and the impact on society may be substantial. The following historical highlights provide a perspective of the continuum of the clinical research endeavor. These events also emphasize the contributions that clinical research has made to advances in medicine and public health. In this chapter, and throughout the book, a broad definition of clinical research from the Association of American Medical Colleges Task Force on Clinical Research is used.1 This task force defined clinical research as

a component of medical and health research intended to produce knowledge essential for understanding human disease, preventing and treating illness, and promoting health. Clinical research embraces a continuum of studies involving interaction with patients, diagnostic clinical materials or data, or populations, in any of these categories: disease mechanisms; translational research; clinical knowledge; detection; diagnosis and natural history of disease; therapeutic interventions including clinical trials; prevention and health promotion; behavioral research; health services research; epidemiology; and community-based and managed care-based research.

THE EARLIEST CLINICAL RESEARCH

Medical practice and clinical research are grounded in the beginnings of civilization. Egyptian medicine was dominant from approximately 2850 BC to 525 BC. The Egyptian Imhotep, whose name means "he who gives contentment," lived slightly after 3000 BC and was the first physician figure to rise out of antiquity.2 Imhotep was a known scribe, priest, architect, astronomer, and magician (medicine and magic were used together); he performed surgery, practiced some dentistry, extracted medicine from plants, and knew the position and function of the vital organs. Imhotep likely provided the first description of cancer in one of his 48 clinical case reports. In case 45, he reported, "If you examine (a case) having bulging masses on (the breast) and you find that they have spread over his breast; if you place your hand upon (the) breast (and) find them to be cool, there being no fever at all therein when your hand feels him; they have no granulations,

Principles and Practice of Clinical Research http://dx.doi.org/10.1016/B978-0-12-849905-4.00001-0


Copyright © 2018. Published by Elsevier Inc.


contain no fluid, give rise to no liquid discharge, yet they feel protuberant to your touch, you say concerning him: 'This is a case of bulging masses I have to contend with.' Bulging tumors of the breast mean the existence of swellings on the breast, large, spreading, and hard; touching them is like touching a ball of wrappings, or they may be compared with unripe hemat fruit, which is hard and cool to the touch."3 Evidence also shows that ancient Chinese medicine included clinical studies. For example, in 2737 BC, Shen Nung, the putative father of Chinese medicine, experimented with poisons and classified medical plants,4 and I. Yin (1176–1123 BC), a famous prime minister of the Shang dynasty, described the extraction of medicines from boiling plants.5 Documents from early Judeo-Christian and Eastern civilizations provide examples of a scientific approach to medicine and the origin of clinical research. In the Old Testament, written from the 15th century BC to approximately the 4th century BC,6 a passage in the first chapter of the Book of Daniel describes a comparative "protocol" of diet and health. In the setting of Babylon, where eating the king's rich food would have defiled the Israelites, Daniel described how a preferred diet of legumes and water made for healthier youths than the king's rich food and wine:

Then Daniel said to the steward, "Test your servants for ten days; let us be given vegetables to eat and water to drink. Then let your appearance and the appearance of the youths who eat the king's rich food be observed by you, and according to what you see deal with your servants." So he harkened to them in this matter, and tested them for ten days. At the end of ten days it was seen that they were better in appearance and fatter in flesh than all the youths who ate the king's rich food. So the steward took away their rich food and the wine they were to drink, and gave them vegetables.

Daniel 1:11–16

The ancient Hindus excelled in early medicine, especially in surgery. Sushruta, the father of Indian surgery, resided in the court of the Gupta kings in about 600 BC and wrote medical texts about surgery, the most famous being Sushruta Samhita, an encyclopedia of medical learning. In addition, there is evidence of Indian hospitals in Ceylon in 437 BC and 137 BC.7

GREEK AND ROMAN INFLUENCE

Although early examples of clinical research predate the Greeks, Hippocrates (460–370 BC) is considered the father of modern medicine, and he exhibited the strict discipline required of a clinical investigator. His emphasis on the art of clinical inspection, observation, and documentation established the science of medicine. In addition, as graduating physicians are reminded when they take the Hippocratic oath, he provided physicians with high moral standards. Hippocrates' meticulous clinical records were maintained in 42 case records.8 These case studies describe, among other maladies, malarial fevers, diarrhea, dysentery, melancholia, mania, and pulmonary edema with remarkable clinical acumen. On pulmonary edema, he wrote the following:

Water accumulates; the patient has fever and cough; the respiration is fast; the feet become edematous; the nails appear curved and the patient suffers as if he has pus inside, only less severe and more protracted. One can recognize that it is not pus but water... if you put your ear against the chest you can hear it seethe inside like sour wine.9

Hippocrates also described the importance of cleanliness in the management of wounds. He wrote, "If water was used for irrigation, it had to be very pure or boiled, and the hands and nails of the operator were to be cleansed."10 Hippocrates used the Greek word for "crab," karkinos, to describe cancer. The tumor, with its clutch of swollen blood vessels around it, reminded Hippocrates of a crab dug in the sand with its legs spread in a circle.11 Hippocrates' teachings remained dominant and unchallenged until the time of Claudius Galen of Pergamum (c. 130–200 AD), the physician to the Roman Emperor Marcus Aurelius.12 Galen was one of the first individuals to utilize animal studies to understand human disease. By experimenting on animals, he was able to describe the effects of transection of the spinal cord at different levels. According to Galen, health and disease reflected the balance of four humors (blood, phlegm, black bile, and yellow bile), and veins contained blood and the humors, together with some spirit.12 Inflammation, described by Galen as a red, hot, and painful distention, was attributed to excessive blood. Tubercles, pustules, catarrh, and nodules of lymph, all cool, boggy, and white, were attributed to excesses of lymph. Jaundice was an overflow of yellow bile. Cancer was attributed to black bile, as was melancholia, the medieval term for depression. Thus cancer and depression were closely intertwined.13

MIDDLE AGES AND RENAISSANCE

In the Middle Ages, improvements in medicine became evident, and the infrastructure for clinical research began to develop. Hospitals and nursing, with origins in the teachings of Christ,14 became defined institutions (although the beginnings of hospitals can be traced to the ancient Babylonian custom of bringing the sick into the marketplace for consultation, and the Greeks and Romans had military hospitals).


The Persian al-Razi (865–925) discovered the use of alcohol as an antiseptic and wrote the first treatise on pediatrics, as well as more than 180 books and articles.15 Persian scientists emphasized the importance of methodology, and Ibn al-Haytham (Alhazen) wrote his Book of Optics, for which he is regarded as the father of optics.16 The surgical needle was invented and described by Abu al-Qasim al-Zahrawi in his Al-Tasrif in the year 1000.17 The Iraqi surgeon Ammar ibn Ali al-Mawsili invented the first injection syringe in the 9th century using a hollow glass tube and suction to extract and remove cataracts from patients' eyes.18 By the 1100s and 1200s, hospitals were being built in England, Scotland, France, and Germany. Early progress in pharmacology can be linked to the Crusades and the development of commerce. Drug trade became enormously profitable during the Middle Ages. Drugs were recognized as the lightest, most compact, and most lucrative of all cargoes. Records of the customhouse at the port of Acre (1191–1291) show a lively traffic in aloes, benzoin, camphor, nutmegs, and opium.19 Influences of Arabic pharmacy and contact of the Crusaders with their Muslim foes spread the knowledge of Arabic pharmaceuticals and greatly enhanced the value of drugs from the Far East. The Persian Ibn Sina (Avicenna) (980–1037), a leader in pharmacy, philosophy, medicine, and pharmacology, wrote The Canon of Medicine, which describes seven conditions for "the recognition of the strengths of the characteristics of medicines through experimentation": ensuring the use of pure drugs, testing the drug for only one disease, use of control groups, use of dose escalation, requirement of long-term observation, requirement of reproducible results, and requirement of human over animal testing.20 Documentation through case records is an essential feature of clinical research.
Pre-Renaissance medicine of the 14th and 15th centuries saw the birth of "Consilia" or medical case books, consisting of clinical records from the practice of well-known physicians.21 Hippocrates' approach of case studies developed 1700 years earlier was reborn, particularly in the Bolognese and Paduan regions of Italy. Universities became important places of medicine in Paris, Bologna, and Padua. Clinical research remained mostly descriptive, resembling today's natural history and disease pathogenesis protocols. In 1348, Gentile da Foligno, a Paduan professor, described gallstones.21 Bartolomeo Montagna (1470), an anatomist, described strangulated hernia, operated on lachrymal fistula, and extracted decayed teeth.21 The Renaissance (1453–1600) represented the revival of learning and the transition from medieval to modern conditions; many great clinicians and scientists prospered. At this time, many of the ancient Greek dictums of medicine, such as Galen's four humors, were discarded. Perhaps the most important anatomist of

FIGURE 1.1 Leonardo da Vinci self-portrait (red chalk); Turin, Royal Library. From Da Vinci L. Copyright in Italy by the Istituto Geografico De Agostini S.p.A., Novara. New York: Reynal & Company; 1956, Fig. 1 [Wikipedia].

this period was Leonardo da Vinci (1452–1519) (Fig. 1.1).22 Da Vinci created more than 750 detailed anatomic drawings (Fig. 1.2). In 1533, Andreas Vesalius, at age 19, was beginning his incredible career as an anatomist. His dissections and recordings of the human anatomy recorded in detailed plates and drawings of patients with cancer failed to note black bile in any cancer, regardless of the organ involved, and provided the basis for dismissing Galen's theory of the role of black bile in cancer.23

SEVENTEENTH CENTURY

Studies of blood began in the 17th century. William Harvey (1578–1657) convincingly described the circulation of blood from the heart through the lungs and back to the heart and then into the arteries and back through


FIGURE 1.3 Christopher Wren's drawing of the brain shows blood vessels discovered by Thomas Willis. From Knoeff R. Book review of Soul Made Flesh: The Discovery of the Brain and How It Changed the World by C. Zimmer. Nature 2004;427:585.

FIGURE 1.2 Example of anatomic drawing by Leonardo da Vinci. Trunk of female human body, with internal organs seen as though ventral side were transparent. From Da Vinci L. Copyright in Italy by the Istituto Geografico De Agostini S.p.A., Novara. New York: Reynal & Company; 1956. p. 369 [Wikipedia].

the veins.24 Harvey emphasized that the arteries and veins carried only one substance, the blood, ending Galen's proposal that veins carried a blend of multiple humors. (Of course, today we know that blood contains multiple cellular and humoral elements, so to some extent Galen was correct.) The famous architect Sir Christopher Wren (1632–1723), originally known as an astronomer and anatomist (Fig. 1.3), in 1656 assembled quills and silver tubes as cannulas and used animal bladders to inject opium into the veins of dogs.25 The first well-documented transfusions of blood were done in animals (dogs) in 1667 by Richard Lower and Edmund King in London26 and were mentioned in Pepys' diary.27 The first transfusions into humans are attributed to the French physician Jean-Baptiste Denys, who in June 1667 transfused sheep blood into a 15-year-old boy who survived. James Blundell performed the first modern transfusions in humans beginning in 1818; some of his patients survived.28 Transfusions did not become an accepted approach until Landsteiner discovered the major A, B, AB, and O blood groups in 1900 and 1901.29

The 17th century also brought the first vital statistics, which were presented in Graunt’s book Natural and Political Observations Mentioned in a Following Index, and Made Upon the Bills of Mortality.30 In this book of comparative statistics, population and mortality statistics were compared for different countries, ages, and sexes in rural and urban areas. Use of data on mortality among groups would have major importance in future clinical studies.

EIGHTEENTH CENTURY

The 18th century brought extraordinary advances in the biological sciences and medicine. At the end of the 17th century, Antony van Leeuwenhoek of Delft (1632–1723) invented the microscope. Although he is best known for using his microscope to provide the first descriptions of protozoa and bacteria, Leeuwenhoek also provided the first description of striated voluntary muscle, the crystalline structure of the lens, red blood cells, and spermatozoa (Figs. 1.4 and 1.5).31 Modern clinical trials can be recognized in the 1700s. Scurvy was a major health problem for the British Navy. William Harvey earlier had recommended lemons to treat scurvy but argued that the therapeutic effect was a result of the acid in the fruit. James Lind (Fig. 1.6), a native of Scotland and a Royal Navy surgeon, conducted a clinical trial in 1747 to assess this hypothesis by


0*0

PHILOSOPHICAL TRANSACTIONS.

For the Months of Jng*Ji and Stftemttr. Stftemt. 21. 1674, The CONTENTS. llxnfatictl Qtftrtfttutt firna «#r. Leeuwcnboeck, «fotf Blood, Milk, Bones, tkt Brain, Spitle, Coticula, Sweat, Fact, Teares 5 ummtmutei t» t** Letttrs to the PaUfitr. An AtuHit tf* ntttkU Ctfe if* Dropfy, mfoktn fir Grtvibtk* at+yt**g W*mt* ^ imftrttAh* Lttnttd Ptyfoi* MI* Holland. jfa4tt*at,f three E»hr LDE SECRET IQKE JN-IM.ALI Ctgtutt, 4»tb. Gall. Co'e, At D. IF. Ertfmi BtrthiM SELECT A GEO eX£ fR \CA. III. LOG ICJ, fat An CritoUi; f * Gtllit* i» iMnnu* Strmutem vtrfc SUM A*t*u&o*rfait *p* tke Latin Verfon, mult by C. S. of the Pbil.TranUflions*/' J. 1665.1666.1667. l OtftnnttoKfr** e^C Leeowrahoeck, u#tr»*7 Blood, Milk, Bones, /fc Bnio,SpitIe,36^ Cuticula,^. MmmmtiUAk tkt JUt Obfervtr i» tin fMjbtr in * Letttr, lUteljUK r. 1674, Sir,

Y

Ours of 14* of Mrit\a& was my welcome to roe j Wheocelnnd«ftoodwitfj great cooteotfneot, that my Microfcopical Cooxnookatiom badnot been nnaocepoble co yoaaod yoor Philofophkal Frkndr; wUcb hath encouraged R

**

van leeuwenhoek and his little animals. New York, Dover: A Collection of Writings by the Father of Protozoology and Bacteriology; 1960 [Original work published in 1932].

FIGURE 1.5 Title page from Leeuwenhoek’s paper, “Microscopical Observations.” From Dobell C. Antony van leeuwenhoek and his little animals. New York, Dover: A Collection of Writings by the Father of Protozoology and Bacteriology; 1960 [Original work published in 1932].

comparing three therapies for scurvy (Table 1.1).32 Twelve sailors with classic scurvy were divided into six groups of two each, all given identical diets; the various groups were supplemented with vinegar, dilute sulfuric acid, cider, seawater, and a nutmeg, garlic, and horseradish mixture, along with two oranges and one lemon daily. Sulfuric acid, vinegar, seawater, cider, and the physician’s remedy had no benefit. Two sailors receiving citrus fruit avoided scurvy. Although not significant because of sample size, this early clinical study formed the basis for successful avoidance of scurvy with citrus fruit. Studies with sulfuric acid, vinegar, and cider excluded acid as a likely explanation for the beneficial effect of citrus fruit. The 18th century saw great progress in the area of surgery. A remarkable succession of teachers and their students led these studies. Percival Pott of St. Bartholomew’s Hospital described tuberculosis of the spine, or Pott’s disease.33 John Hunter, Pott’s pupil, was the founder of experimental and surgical pathology and was a pioneer in comparative physiology and experimental morphology. Hunter described shock, phlebitis, pyremia, and intussusception and reported major findings of inflammation, gunshot wounds, and surgical

diseases of the vascular system.33 Hunter’s student Edward Jenner (1749e1823)33 introduced vaccination as a tool to prevent infectious diseases (Fig. 1.7).34 Jenner was aware that dairymaids who had contracted cowpox through milking did not get smallpox. In 1798, Jenner conceived of applying this observation on a grand scale to prevent smallpox.35 Jenner was not the first to conceive of the idea of inoculation for smallpox. The Chinese had thought of this earlier, and Sir Hans Sloane had done small studies in 1717 using variolation (inoculating healthy people with pus from blisters obtained from patients with smallpox).36 In 1718, after providing variolation vaccination of her 3-year-old son in Turkey and, 3 years later, her 5-year-old daughter in England, Lady Mary Worley Montagu introduced the Ottoman practice of variolation to the West.36 In addition, James Jurin published several articles between 1723 and 1727 comparing death from natural smallpox in people who had not been inoculated versus those who had been inoculated. Jurin showed that death occurred in 5 of 6 subjects in the first group compared with 1 in 60 in the latter,37 providing one of the first studies using mortality as a critical clinical end point. In 1734, Voltaire wrote, “The Cirassians [a Middle Eastern people] perceived that of a thousand

FIGURE 1.4 Antony van Leeuwenhoek. From Dobell C. Antony van Leeuwenhoek and his "little animals." New York: Dover; 1960.

I. A HISTORICAL PERSPECTIVE ON CLINICAL RESEARCH


FIGURE 1.6 James Lind.

FIGURE 1.7 Edward Jenner (painting by Sir Thomas Lawrence). From Garrison FH. History of medicine. Philadelphia: Saunders; 1917. Reprinted 1963.

TABLE 1.1 Treatment of Scurvy by James Lind

Treatment Arm        Cured   p-value^a
Sulfuric acid        0/2     >0.05
Vinegar              0/2     >0.05
Seawater             0/2     >0.05
Cider                0/2     >0.05
Physician's remedy   0/2     >0.05
Citrus fruit         2/2     >0.05

^a Compared to patients in the other arms of the trial; no placebo group.
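The table's p-values are consistent with pairwise comparisons between two-sailor arms; a minimal sketch, assuming a one-sided Fisher's exact test was the comparison intended (the chapter does not say how the p-values were derived), shows why even the perfect 2/2 versus 0/2 split cannot reach p < 0.05 with only two subjects per arm. Only the Python standard library is used:

```python
from math import comb

def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]:
    the probability, under the null, of observing a or more successes
    in row 1 given the fixed margins (hypergeometric tail sum)."""
    row1 = a + b          # subjects in arm 1
    col1 = a + c          # total cured across both arms
    n = row1 + (c + d)    # total subjects
    p = 0.0
    # Sum probabilities of all tables at least as extreme as observed.
    for x in range(a, min(row1, col1) + 1):
        p += comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
    return p

# Citrus arm (2/2 cured) vs. cider arm (0/2 cured):
p = fisher_exact_one_sided(2, 0, 0, 2)
print(round(p, 3))  # 0.167 -- with n = 2 per arm, p cannot fall below 0.05
```

The smallest achievable p-value here is 1/C(4,2) = 1/6, so no pairwise comparison in Lind's trial could be "significant" at the conventional 0.05 level, which is the point the table's footnote makes.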

persons hardly one was attacked twice by full blown smallpox; that in truth one sees three or four mild cases but never two that are serious and dangerous; that in a word one never truly has that illness twice in life.”38 Thus, Voltaire recognized natural immunity to smallpox, which was an important concept for future vaccinology. In 1721, Cotton Mather demonstrated that variolation protected citizens of the American colonies in Massachusetts,39 and, in 1777, George Washington used variolation against smallpox to inoculate the

Continental Army, the first mass immunization of a military group.40 In 1774 Benjamin Jesty, a cattle breeder in Dorset, England, inoculated his wife and two sons with cowpox to protect them during an outbreak of smallpox. Jenner, based on his clinical observation that persons who had had cowpox were protected from smallpox and his subsequent work showing that people inoculated with cowpox were protected when challenged with smallpox, was the first to try vaccination on a large scale, using material from cowpox lesions to protect against human smallpox. Jenner was the first to use experimental approaches to establish the scientific basis of vaccination, and he transformed a local country tradition into a viable prophylactic principle. Jenner's vaccine was adopted quickly in Germany and then in Holland, Denmark, the rest of Europe, and the United States.

The 1700s were also the time when the first known blinded clinical studies were performed. In 1784 a commission of inquiry was appointed by King Louis XVI of France to investigate medical claims of "animal magnetism," or "mesmerism." The commission, headed by Benjamin Franklin and consisting of such


distinguished members as Antoine Lavoisier, Jean-Sylvain Bailly, and Joseph-Ignace Guillotin, had as its goal to assess whether the reported effects of this new healing method were due to a "real" force or to "illness of the mind." Among the many tests performed, blindfolded subjects were told that they were either receiving or not receiving magnetism when, in fact, at times the reverse was happening. Subjects felt the effects of magnetism only when they were told they were receiving it, and felt no effects when they were not told so, whether or not they were actually being treated.41 This was the beginning of the use of blinded studies in clinical research. In these first blinded, or masked, studies, Franklin also pointed out, for the first time, the importance of the placebo effect.

The 18th century also provided the first legal precedent that physicians must obtain informed consent from patients before performing a procedure. In an English lawsuit, Slater vs. Baker & Stapleton, two surgeons were found liable for disuniting a partially healed fracture without the patient's consent.42 The case set an important precedent, described by the court: "Indeed it is reasonable that a patient should be told what is about to be done to him that he may take courage and put himself in such a situation as to enable him to undergo the operation."


NINETEENTH CENTURY

In the first days of the 19th century, Benjamin Waterhouse, a Harvard professor of medicine, brought Jenner's vaccine to the United States, and by 1802 the first vaccine institute was established by James Smith in Baltimore, Maryland. In 1813 this led to the establishment of a national vaccine agency by the Congress of the United States under the direction of James Smith.43 Jenner's vaccination for smallpox was followed by other historic studies of the pathogenesis of infectious diseases. In the mid-1800s, John Snow (1813–58), an anesthesiologist by training, performed the classic studies that determined how cholera was spread. Snow's studies, which included the first use of statistical mapping, identified contaminated water as the source of cholera, and for this work he is widely considered the father of modern epidemiology.44

The French physician Pierre Charles Alexandre Louis (1787–1872) realized that clinical observations on large numbers of patients were essential for meaningful clinical research. He published classic studies on typhoid fever and tuberculosis, and his research in 1835 on the effects of bloodletting demonstrated that the benefits claimed for this popular mode of treatment were unsubstantiated.45 On February 13, 1843, one of Louis' students, Oliver Wendell Holmes (1809–94), the father of the great Justice Holmes, read his article, "On the Contagiousness of Puerperal Fever,"46 to the Boston Society for Medical Improvement (Fig. 1.8). Holmes stated that women in childbed should never be attended by physicians who had been conducting postmortem sections on cases of puerperal fever; that the disease could be conveyed in this manner from patient to patient, even from a case of erysipelas; and that washing the hands in calcium chloride and changing the clothes after leaving a puerperal fever case was likely to be a preventive measure. Holmes' essay stirred up violent opposition among obstetricians. However, he continued to reiterate his views, and in 1855, in a monograph, Puerperal Fever as a Private Pestilence, Holmes noted that Semmelweis, working in Vienna and Budapest, had lessened the mortality of puerperal fever by disinfecting the hands with chloride of lime and the nail brush.47 Ignaz Philipp Semmelweis (1818–65) performed the most sophisticated preventive clinical trial of the 19th century, which established the importance of hand washing in preventing the spread of infection (Fig. 1.9).48

FIGURE 1.8 Oliver Wendell Holmes. From Garrison FH. History of medicine. Philadelphia: Saunders; 1917. p. 435. Reprinted 1963.


FIGURE 1.9 Ignaz Philipp Semmelweis. From Garrison FH. History of medicine. Philadelphia: Saunders; 1917. p. 436. Reprinted 1963.

Semmelweis, a Hungarian physician, became an assistant in the first obstetric ward of the Allgemeines Krankenhaus in Vienna in 1846, where he was troubled by the death rate associated with puerperal, or "childbed," fever. From 1841 to 1846, the maternal death rate from puerperal sepsis averaged approximately 10%, and in some periods was as high as 50%, in the First Maternity Division of the Vienna General Hospital. In contrast, the rate was only 2% or 3% in the Second Division, which was attended by midwives rather than physicians. The public knew of the disparity, and women feared being assigned to the First Division. Frustrated by this mystery, Semmelweis began to study the cadavers of fever victims. In 1847, his friend and fellow physician Jakob Kolletschka died after receiving a small cut on the finger during an autopsy. The risk of minor cuts during autopsies was well known, but Semmelweis made the further observation that Kolletschka's death was characteristic of death from puerperal fever. He reasoned that puerperal fever was "caused by

conveyance to the pregnant women of putrid particles derived from living organisms, through the agency of the examining fingers." In particular, he identified cadaveric matter from the autopsy room, with which midwives had no contact, as the source of the infection. In 1847 Semmelweis insisted that all students and physicians scrub their hands with chlorinated lime before entering the maternity ward, and during 1848 the mortality rate in his division dropped from 9.92% to 1.27%. Despite his convincing data, colleagues rejected Semmelweis' findings and accused him of insubordination. The dominant medical thinking at the time held that the high mortality in the charity hospital was related to the poor health of impoverished women, despite the differences between the control (no chlorinated lime hand washing) and experimental (washing with chlorinated lime) divisions. Without any opportunity for advancement in Vienna, Semmelweis returned to his home in Budapest and repeated his studies with the same results. In 1861, he finally published The Etiology, Concept, and Prophylaxis of Childbed Fever.48 Although Holmes' work antedated Semmelweis' by 5 years, the superiority of Semmelweis' observation lies not only in his experimental data but also in his recognition that puerperal fever was a blood poisoning. The observations of Holmes and Semmelweis represent a critical step for medicine and surgery.

In addition to the discovery of the importance of hand washing, the 19th century saw the first well-documented use of ether for surgery (1846) by William Thomas Green Morton, a Boston dentist, with Dr. John Collins Warren as the surgeon, at the Massachusetts General Hospital.49 The discovery of anesthesia dissociated pain from surgery, allowing surgeons to perform prolonged operations.
Oliver Wendell Holmes is credited with proposing the words anesthetic and anesthesia.49 Recognition of the importance of hand washing and the discovery of anesthetics were essential findings of the 19th century that were critical to the development of modern surgery. In 1865, a Scottish surgeon named Joseph Lister recognized the importance of keeping surgical wounds clean and wrote "... that the decomposition in the injured part might be avoided ... by applying as a dressing some material capable of destroying the life of the floating particles." Based on the observation that carbolic acid cleansed raw sewage, Lister began to apply carbolic acid to wounds with great success, establishing the importance of antisepsis in the operating room.50

The work of Holmes and Semmelweis on the importance of hand washing opened the door for Pasteur's work on the germ basis of infectious diseases. Louis Pasteur (1822–95) was perhaps the most outstanding clinical investigator of the 19th century (Fig. 1.10). Trained as a chemist, his fundamental work in


chemistry led to the discovery of levo and dextro isomers. He then studied the ferments of microorganisms, which eventually led him to investigate the causes of spoilage and disease afflicting three major French industries: wine, silk, and wool. Pasteur discovered the germ basis of fermentation, which formed the basis of the germ theory of disease.51 He discovered Staphylococcus pyogenes as a cause of boils and the role of Streptococcus pyogenes in puerperal septicemia. In other studies, he carried forward Jenner's work on vaccination and developed approaches to vaccine development using attenuation of a virus for hydrophobia (rabies) and inactivation of a bacterium for anthrax. The work of Pasteur was complemented by the studies of Robert Koch (1843–1910), who made critical technical advances in bacteriology. Koch was the first to use agar as a culture medium, and he introduced the Petri dish, pour plates, and blood agar to make bacterial culture and identification easy and widely available. Koch cultured the tubercle bacillus and identified the causative agent of anthrax, which was later used by Pasteur to develop a vaccine, and he established Koch's postulates for proving that an infectious agent causes disease (Fig. 1.11).51

FIGURE 1.10 Louis Pasteur. One of the remarkable facts about Pasteur was his triumph over a great physical handicap. In 1868 at age 46, just after completing his studies on wine, he had a cerebral hemorrhage. Although his mind was not affected, he was left with partial paralysis of his left side, which persisted for the remainder of his life. This photograph, taken after he was awarded the Grand Cross of the Legion of Honor in 1881, gives no hint of his infirmity. From Haagensen CD, Lloyd EB. A hundred years of medicine. New York: Sheridan House; 1943. p. 116.

The studies of Pasteur and Koch were performed during the same period as the work of the Norwegian Gerhard Armauer Hansen (1841–1912). In 1874, based on epidemiologic studies in Norway, Hansen concluded that Mycobacterium leprae was the microorganism responsible for leprosy. Hansen's claim was not well received, and in 1880, in an attempt to prove his point, he inoculated live leprosy bacilli into humans, including nurses and patients, without first obtaining permission. One of the patients brought legal action against Hansen.

FIGURE 1.11 Robert Koch. His career in research began in 1872, when his wife gave him a microscope as a birthday present. He was then 28 years old, performing general practice in a small town in Silesia. This was an agricultural region where anthrax was common among sheep and cattle, and it was in the microscopic study of this disease in rabbits that Koch made his first great discovery of the role of anthrax bacilli in disease. From Haagensen CD, Lloyd EB. A hundred years of medicine. New York: Sheridan House; 1943. p. 132.


The court, in one of the early cases demonstrating the importance of informed consent in clinical research, removed Hansen from his position as director of Leprosarium No. 1, where the experiments had taken place. However, Hansen retained his position as chief medical officer for leprosy52 and later in life received worldwide recognition for his work on leprosy.

In the same era, Emil von Behring (1854–1917) demonstrated in 1890 that inoculation of one animal with attenuated diphtheria toxin resulted in production of a therapeutic serum factor (antitoxin) that could be delivered to another, thus discovering antibodies and establishing a role for passive immunization. On Christmas Eve of 1891, the first successful clinical use of diphtheria antitoxin occurred.51 By 1894, diphtheria antiserum had become commercially available as a result of Paul Ehrlich's work establishing methods of producing high-titer antisera. Behring's discovery of antitoxin was the beginning of humoral immunity, and in 1901 Behring received the first Nobel Prize in Physiology or Medicine. Koch received the Prize in 1905 (Fig. 1.12). The Russian scientist Elie Metchnikoff (1845–1916) discovered the importance of phagocytosis in host defense against infection and emphasized the

FIGURE 1.12 Emil von Behring. From Hirsch JG. Host resistance to infectious diseases: a centennial. In: Gallin JI, Fauci AS, editors. Advances in host defense mechanisms: vol. 1. Phagocytic cells. New York: Raven Press; 1982. p. 7.

importance of the cellular components of host defense.53 Paul Ehrlich (1854–1915) discovered the complement system and asserted the importance of the humoral components of host defense. In 1908, Metchnikoff and Ehrlich shared the Nobel Prize (Figs. 1.13 and 1.14).

At the end of the 19th century, studies of yellow fever increased awareness of the importance of the informed consent process in clinical research. In 1897, the Italian bacteriologist Giuseppe Sanarelli announced that he had discovered the bacillus of yellow fever by injecting the organism into five people. William Osler was present at an 1898 meeting at which Sanarelli's work was discussed, and Osler said, "To deliberately inject a poison of known high degree of virulency into a human being, unless you obtain that man's sanction ... is criminal."54 This commentary by Osler had substantial influence on Walter Reed, who demonstrated in human volunteers that the mosquito is the vector for yellow fever. Reed adopted written agreements (contracts) with all his yellow fever subjects. In addition to obtaining signed permission from all his volunteers, Reed

FIGURE 1.13 Elie Metchnikoff in his 40s. From Tauber AI, Chernyak L. Metchnikoff and the origins of immunology. New York: Oxford University Press; 1991, Fig. 5 [Wikipedia].


made certain that all published reports of yellow fever cases included the phrase "with his full consent."54

On November 8, 1895, Wilhelm Röntgen (1845–1923), a German physicist, produced and detected electromagnetic radiation, and on December 22, 1895, he took the first X-ray, of his wife's hand. For this achievement, Röntgen won the first Nobel Prize in Physics in 1901 (Fig. 1.15A and B).

Toward the end of the 19th century, women began to play important roles in clinical research. Marie Curie (1867–1934) and her husband Pierre won the Nobel Prize in Physics in 1903 for their work on spontaneous radiation; in 1911 Marie Curie won a second Nobel Prize, in Chemistry, for her studies on the separation of radium and description of its therapeutic properties. Marie Curie and her daughter Irène Joliot-Curie (who won a Nobel Prize in Chemistry in 1935 for synthesizing new radioactive elements, work that led toward the discovery of uranium fission) promoted the therapeutic use of radium during World War I (Fig. 1.16).55 Florence Nightingale (1820–1910), in addition to her famous work in nursing, was an accomplished mathematician who applied her mathematical expertise to dramatize the needless deaths caused by unsanitary conditions in hospitals and the need for reform (Fig. 1.17).56

FIGURE 1.14 Paul Ehrlich. From Hirsch JG. Host resistance to infectious diseases: a centennial. In: Gallin JI, Fauci AS, editors. Advances in host defense mechanisms: vol. 1. Phagocytic cells. New York: Raven Press; 1982. p. 9.

TWENTIETH CENTURY AND BEYOND

The spectacular advances in medicine that occurred during the 20th century would never have happened without centuries of earlier progress. In the 20th century, medical colleges became well established in Europe and the United States. The great contributions of the United States to medicine in the 20th century are linked to an early commitment to strong medical education. The importance of clinical research as a component of the teaching of medicine was recognized in 1925 by the American medical educator Abraham Flexner, who wrote, "Research can no more be divorced from medical education than can medical education be divorced from research."57 Two other dominant drivers of progress in medicine through clinical research were government investment in biomedical research and private investment in the pharmaceutical industry. These investments, closely linked with academia, enhanced the translation of basic observations to the bedside.

Paul Ehrlich coined the term "chemotherapy" and popularized the concept of a "magic bullet": chemicals injected into the blood to fight various diseases, particularly those caused by parasites. In 1910, working with his assistant Sahachiro Hata, Ehrlich developed Salvarsan (arsphenamine), a trivalent arsenic-based chemotherapeutic agent to cure syphilis. Salvarsan, and later Neosalvarsan, was commercialized by Hoechst AG as one of the first modern pharmaceuticals. Sir Alexander Fleming's discovery of penicillin in London in 1928 spawned expansion of the pharmaceutical industry through the development of antibiotics, antiviral agents, and new vaccines. The discovery of insulin in 1921 by the Canadian physician Frederick Banting and the medical student Charles Best, working in the laboratory of the Scottish physiologist J.J.R. Macleod and aided by the Canadian chemist James B. Collip, who purified insulin from cows for use in humans, provided lifesaving, long-term management of diabetes; Banting and Macleod won the Nobel Prize in Physiology or Medicine in 1923. The discovery of insulin was followed by the discovery of multiple other lifesaving hormones.

In the 1920s and 1930s, Sir Ronald Aylmer Fisher (1890–1962), from the United Kingdom, introduced the application of statistics and experimental design.58 Working on agricultural and plant-fertility studies, Fisher introduced the concepts of randomization and analysis of variance, procedures used today throughout the world. In 1930, Torald Sollmann emphasized the importance to a study of controlled experiments with placebo and blind limbs, a rebirth of the "blinded" or "masked"


FIGURE 1.15 (A) Wilhelm Conrad Röntgen. (B) Print of Wilhelm Röntgen's first X-ray of his wife's hand.

FIGURE 1.16 Marie Curie (1867–1934).

FIGURE 1.17 Florence Nightingale (1820–1910).


studies originated by Benjamin Franklin in 1784. Sollmann wrote, "Apparent results must be checked by the 'blind test,' i.e., another remedy or a placebo, without the knowledge of the observer, if possible" (Fig. 1.18). He also said, "Observations without adequate controls and checks are practically useless."59 Through these approaches, many new drugs were developed for the treatment of hypertension, cardiovascular disease, manic depression, and epilepsy, to name a few.

The spectacular advances of the 20th century were accompanied by troubling events in clinical research that heightened public attention and formalized the field of clinical bioethics. Nazi human experimentation led to the Nuremberg Code in 1947, which was designed to protect human subjects by ensuring voluntary consent and by asserting that the anticipated results of research must justify its performance. The Tuskegee syphilis experiments, initiated in the 1930s and continued until 1972 in African American men, and the Willowbrook hepatitis studies of the mid-1950s in institutionalized children highlighted the need to establish strict rules to protect research subjects. In 1953 the US National Institutes of Health (NIH) issued "Guiding Principles in Medical Research Involving Humans," which required prior review by a medical committee of all human research to be conducted at the newly opened NIH Clinical Center. In 1962, the Kefauver-Harris Amendments to the 1938 US Federal Food, Drug, and Cosmetic Act stipulated that subjects must be told whether a drug is being used for investigational purposes and that subject consent must be obtained. In 1964, the World Medical Assembly adopted

FIGURE 1.18 Testing puddings and gelatins at Consumers Union. Copyright 1945 by Consumers Union of U.S., Inc., Yonkers, NY. Reprinted with permission from the April 1945 issue of Consumer Reports.


the Declaration of Helsinki, stressing the importance of assessing risks and determining that they are outweighed by the potential benefits of research. In 1966, Henry Beecher pointed out major ethical issues in clinical research.60 During the same year, the US Surgeon General issued a memo to the heads of institutions conducting research with Public Health Service grants requiring prior review of all clinical research. The purposes of this review were to ensure protection of research subjects, assess the appropriateness of the methods employed, obtain informed consent, and review the risks and benefits of the research; thus institutional review boards were established. In 1967, the US Food and Drug Administration added the requirement that all new drug sponsors obtain informed consent for the use of investigational drugs in humans.

Over the past 50 years, clinical research has become big business, and the pharmaceutical and biotechnology industries have engaged university-based clinical investigators in that business. In the United States, for example, interaction between federal investigators and industry, encouraged by the US Congress when it passed the Federal Technology Transfer Act in 1986, has successfully increased the translation of basic research to the bedside by US government scientists. At the same time, however, the relationship between industry and academia has grown closer, and new ethical, legal, and social issues have evolved worldwide. Clinical investigators have become increasingly associated with real and perceived conflicts of interest. Examples of these issues include promoting an investigator's financial or career goals while protecting the patient, protecting "unborn children" while pursuing the potential use of embryonic stem cells to rebuild damaged organs, and protecting patient confidentiality in the era of gene sequencing.
As a consequence of these issues, the public has engaged in debate about the well-being of current and future generations of patients who volunteer to partner with clinical investigators on research protocols. The 20th century also saw incredible advances in genomics, including the Nobel Prize to Watson and Crick for the description of the double-helix model of DNA61 and to Barbara McClintock for her studies of corn in the 1940s indicating that an organism's genome is not a stationary entity but is subject to alteration and rearrangement through transposable elements, or "jumping genes."62 In the 1970s Janet Rowley discovered that a translocation between chromosomes 8 and 21 caused acute myelogenous leukemia and one between chromosomes 15 and 17 caused acute promyelocytic leukemia.63 These and other genomic and molecular discoveries have provided opportunities for conducting clinical research in the 21st century that are greater than ever. A new urgency to move clinical research findings from the laboratory to the patient and into the community has prioritized


translational research globally. Today, understanding and meeting public concerns is as important for the clinical investigator as performing the clinical study. Principles for conducting clinical research have evolved from centuries of experience. As the science moves forward, ethical, legal, and social issues pose special challenges for the clinical investigator. These challenges are the focus of the following chapters of this book.

SUMMARY QUESTIONS

1. The definition of clinical research embraces a continuum of studies in which of the following categories? (More than one item can be selected.)
a. Behavioral research
b. Health services research
c. Epidemiology
d. Disease mechanisms
e. Translational research
f. Diagnosis and natural history of disease
g. Therapeutic interventions including clinical trials
h. Prevention and health promotion
i. Community-based and managed care-based research
j. All of the above

2. True or False: Although early examples of clinical research predate the Greeks, Galen (AD 129–216) is considered the father of modern medicine, and he exhibited the strict discipline required of a clinical investigator.

3. Ignaz Semmelweis performed the most sophisticated preventive clinical trial of the 19th century, which established the importance of hand washing. Circle all of the following statements related to Semmelweis' work that are true: (More than one item can be selected.)
a. Semmelweis started his career as a student on an obstetric ward
b. The death rate from puerperal sepsis reached 90% in select maternity divisions
c. The second division used midwives and the death rate was only 2%–3%
d. Semmelweis started his work by studying cadavers
e. Semmelweis introduced hand washing with chlorinated lime to decrease mortality rates
f. Despite convincing data, Semmelweis' work was condemned by colleagues

4. The first blinded clinical study was done by which of the following?
a. Hippocrates
b. Galen
c. James Lind
d. Benjamin Franklin
e. Louis Pasteur

References

1. Association of American Medical Colleges Task Force on Clinical Research 2000, vol. 1. Washington, DC: Association of American Medical Colleges; 1999. p. 3.
2. Thorwald J. Science and secrets of early medicine. New York: Harcourt, Brace and World; 1962.
3. Mukherjee S. The emperor of all maladies. New York: Scribner; 2010. p. 40.
4. Garrison FH. History of medicine. Philadelphia: Saunders; 1917. p. 73–4. Reprinted 1963.
5. Garrison FH. History of medicine. Philadelphia: Saunders; 1917. p. 70. Reprinted 1963.
6. Lane B. How we got the bible. In: Burgland L, editor. Reading the bible with understanding. St. Louis, MO: Concordia; 1999.
7. Saraf S, Parihar RS. Sushruta: the first plastic surgeon in 600 B.C. Intern J Plast Surg 2007;4(2).
8. Adams F. The genuine works of Hippocrates. New York: William Wood; 1886.
9. Lyons AS, Petrucelli RJ. Medicine, an illustrated history. New York: Abradale Press; 1987. p. 216.
10. Garrison FH. History of medicine. Philadelphia: Saunders; 1917. p. 98. Reprinted 1963.
11. Mukherjee S. The emperor of all maladies. New York: Scribner; 2010. p. 47.
12. Nutton V. Logic, learning, and experimental medicine. Science 2002;295:800–1.
13. Mukherjee S. The emperor of all maladies. New York: Scribner; 2010. p. 48.
14. Garrison FH. History of medicine. Philadelphia: Saunders; 1917. p. 176. Reprinted 1963.
15. Ligon BL. Rhazes: his career and his writings. Semin Pediatr Infect Dis 2001;12(3):266–72.
16. Hehmeyer I, Khan A. Islam's forgotten contributions to medical science. Can Med Assoc J 2007;176(10):1467–8.
17. Ahmad Z. Al-Zahrawi, the father of surgery. ANZ J Surg 2007;77(Suppl. 1):A83.
18. Finger S. Origins of neuroscience: a history of explorations into brain function. New York: Oxford University Press; 1994. p. 70.
19. Garrison FH. History of medicine. Philadelphia: Saunders; 1917. p. 180. Reprinted 1963.
20. Sajadi M, et al. Ibn Sina and the clinical trial. Ann Intern Med 2009;150:640–3.
21. Garrison FH. History of medicine. Philadelphia: Saunders; 1917. p. 166–7. Reprinted 1963.
22. Da Vinci L. Copyright in Italy by the Istituto Geografico De Agostini S.p.A., Novara. New York: Reynal & Company; 1956.
23. Mukherjee S. The emperor of all maladies. New York: Scribner; 2010. p. 53.
24. Wintrobe MM. Blood, pure and eloquent. New York: McGraw-Hill; 1980.
25. Wintrobe MM. Blood, pure and eloquent. New York: McGraw-Hill; 1980. p. 661–2.
26. Wintrobe MM. Blood, pure and eloquent. New York: McGraw-Hill; 1980. p. 663.
27. Nicolson MH. Pepys' diary and the new science. Charlottesville: University Press of Virginia; 1965. p. 663. Quoted in reference 13.
28. Blundell J. Observations on the transfusion of blood. Lancet 1828;2(2):321.
29. Landsteiner K. On the individual differences in human blood. In: Nobel lectures, physiology or medicine 1922–1941. Amsterdam: Elsevier Publishing Company; 1965.
30. Graunt J. Natural and political observations mentioned in a following index, and made upon the bills of mortality. London; 1662. Reprinted by Johns Hopkins Press, Baltimore; 1939. Quoted in Lilienfeld AM. Ceteris paribus: the evolution of the clinical trial. Bull Hist Med 1982;56:1–18.


31. Dobell C. Antony van Leeuwenhoek and his "little animals": a collection of writings by the father of protozoology and bacteriology. New York: Dover; 1960 [Original work published 1932].
32. Lind J. A treatise of the scurvy. Edinburgh, UK: Sands, Murray and Cochran; 1753. p. 191–3. Quoted in Lilienfeld AM. Ceteris paribus: the evolution of the clinical trial. Bull Hist Med 1982;56:1–18.
33. Haagensen CD, Lloyd EB. A hundred years of medicine. New York: Sheridan House; 1943.
34. Wood GB. Practice of medicine. Philadelphia: Collins; 1849.
35. Jenner E. An inquiry into the causes and effects of the variolae vaccinae. London: Sampson Low; 1798.
36. Garrison FH. History of medicine. Philadelphia: Saunders; 1917. p. 373. Reprinted 1963.
37. Miller G. The adoption of inoculation for smallpox in England and France. Philadelphia: University of Pennsylvania Press; 1957. p. 114–8. Quoted in Lilienfeld AM. Ceteris paribus: the evolution of the clinical trial. Bull Hist Med 1982;56:1–18.
38. Plotkin SA. Vaccines: past, present and future. Nat Med 2005;11:S5–11.
39. Harper DP. Angelical conjunction: religion, reason, and inoculation in Boston, 1721–1722. The Pharos Winter 2000:1–5.
40. Fenn EA. Pox Americana: the great smallpox epidemic of 1775–82. New York: Hill and Wang; 2001.
41. Franklin B, and other commissioners charged by the king of France. Animal magnetism (1784): an historical outline of the "science," made by the committee of the Royal Academy of Medicine; translated from the French. Philadelphia: H. Perkins; 1837.
42. Slater vs. Baker & Stapleton (1767) 95 Eng. Rep. 860. Quoted in Appelbaum PS, Lidz CW, Meisel A. Informed consent: legal theory and clinical practice. New York: Oxford University Press; 1987.
43. Garrison FH. History of medicine. Philadelphia: Saunders; 1917. p. 375. Reprinted 1963.
44. Hempel S. The strange case of the Broad Street pump: John Snow and the mystery of cholera. Berkeley: University of California Press; 2007.
45. Morabia A. P.C.A. Louis and the birth of clinical epidemiology. J Clin Epidemiol 1996;49:1327–33.

15

46. Holmes OW. On the contagiousness of puerperal fever. N Engl J Med 1842e1843;1:503e30. Quoted in reference 3, 435. 47. Garrison FH. History of medicine. Philadelphia: Saunders; 1917. p. 435. Reprinted 1963. 48. Semmelweiss IP. Die Aetiologie, der Begriff und die Prophylaxis des Kindbettfiebers. C.A. Hartleben: Budapest and Vienna; 1861. p. 436. Quoted in reference 3. 49. Garrison FH. History of medicine. Philadelphia: Saunders; 1917. p. 506. Reprinted 1963. 50. Lister J. Classics in infections diseases. On the antiseptic principle of the practice of surgery. Rev Infect Dis 1987;9(2):421e6. 51. Hirsch JG. Host resistance to infectious diseases: a centennial. In: Gallin JI, Fauci AS, editors. Advances in host defense mechanisms: vol. 1. Phagocytic cells. New York: Raven Press; 1982. 52. Bendiner E. Gerhard Hansen: hunter of the leprosy bacillus. Hosp Pract December 15, 1989:145e70. 53. Tauber AI, Chernyak L. Metchnikoff and the origins of immunology. New York: Oxford University Press; 1991. 54. Lederer SE. Human experimentation in America before the Second World War. Baltimore: Johns Hopkins University Press; 1995. 55. Macklis RM. Scientist, technologist, proto-feminist, superstar. Science 2002;295:1647e8. 56. Cohen IB. Florence Nightingale. Sci Am 1984;250:128e37. 57. Flexner A. Medical education. A comparative study. New York: Macmillan; 1925. 58. Efron B. Fisher in the 21st century. Stat Sci 1998;13:95e122. 59. Sollmann T. The evaluation of therapeutic remedies in the hospital. J Am Med Assoc 1936;94:1280e300. 60. Beecher HK. Ethics and clinical research. N Engl J Med 1966;274: 1354e60. 61. Watson J, Crick F. Molecular structure of nucleic acids. A structure of deoxyribonucleic acid. Nature 1953;171:737e8. 62. Pray L, Zhaurovak K. Barbara McClintock and the discovery of jumping genes (transposons). Nat Education 2008;1:169. 63. Drucker BJ. Janet Rowley (1925e2013). Geneticist who discovered that broken chromosomes cause cancer. Nature 2014;505:784.

I. A HISTORICAL PERSPECTIVE ON CLINICAL RESEARCH

P A R T I

ETHICAL, REGULATORY AND LEGAL ISSUES

C H A P T E R

2

Ethical Principles in Clinical Research

Christine Grady
National Institutes of Health, Bethesda, MD, United States

O U T L I N E

Distinguishing Clinical Research From Clinical Practice
Ethics and Clinical Research
History of Ethical Attention to Clinical Research
    Benefit to the Individual
    Benefit to Society
    Protection of Research Subjects
    Research as a Benefit
    Community Involvement in Research
Codes of Research Ethics and Regulations
Research on Bioethical Questions
Ethical Framework for Clinical Research
    Value and Validity
    Fair Subject Selection
    Favorable Risk/Benefit Ratio
    Independent Review
    Informed Consent
    Respect for Enrolled Subjects
Ethical Considerations in Randomized Controlled Trials
Conclusion
Summary Questions
References

Clinical research has resulted in significant benefits for society, yet continues to pose profound ethical questions. This chapter briefly describes five overlapping but distinct eras in the history of clinical research ethics; codes of research ethics; seven ethical principles that guide clinical research; and particular ethical challenges in randomized controlled trials (RCTs).

DISTINGUISHING CLINICAL RESEARCH FROM CLINICAL PRACTICE

Clinical research involves the study of human beings in a systematic investigation of health and illness, designed to develop or contribute to generalizable knowledge. The goal of clinical research is to gather knowledge through a set of activities that tests hypotheses and permits generalizable conclusions useful in understanding human health and illness, improving medical care or the public health, and developing safe and effective interventions to prevent, diagnose, and treat disease. As such, research serves the common or collective good; the individual subject participating in clinical research may or may not benefit from participation.

Clinical research is distinct from clinical practice in that each has different, yet not mutually exclusive, purposes, goals, and methods.1 Clinical practice involves diagnosis, prevention, treatment, and care for a particular individual or group of individuals, with the goal of meeting the health needs of and benefiting that individual or group. Clinical practice is based on evidence or experience, is designed to enhance the patient's well-being, and has a reasonable expectation of success. Usual methods in clinical practice are evidence-based and guided by standard practice and experience. The risks of interventions or procedures employed in clinical practice are justified by the prospect of therapeutic benefit to the individual.

In contrast, clinical research aims to generate useful knowledge and is not designed to meet the health needs of, nor necessarily to benefit, individual patient participants. Although an individual may receive quality patient care and treatment when participating in research, this is not the goal of research, and much research does not directly benefit individual participants. Further, frequently used research methodologies, such as randomization, blinding, dose escalation, and placebo controls, are rarely found in clinical practice and might be considered unacceptable there. In clinical research, some risk is justified by the importance of the knowledge to be gained rather than by benefit to the individual participant.

Principles and Practice of Clinical Research. http://dx.doi.org/10.1016/B978-0-12-849905-4.00002-2
Copyright © 2018. Published by Elsevier Inc.

ETHICS AND CLINICAL RESEARCH

Two fundamental ethical questions regarding clinical research are important to consider: (1) why should we do research with human beings, and (2) how should it ethically be done? Two competing considerations frame these questions: clinical research is valuable in generating practical knowledge useful for advancing or improving medical care and health, yet respect for the rights, welfare, dignity, and freedom of choice of individual humans is indispensable.

Research with human beings is essential to advancing or improving medical care and/or the public health, and to providing health professionals with the knowledge and evidence necessary to appropriately and safely care for patients. The pursuit of knowledge through research must be rigorous to inform effective and safe clinical practice; progress would not be possible without rigorous clinical research. Conducting clinical research designed to enhance the understanding of human health and illness may be more than a social good; arguably it is a social imperative.2 Although progress in medical care and health is a societal good, some contend it is an optional good,3 and that other considerations, such as the primacy of the individual, should take precedence. Whether improvement in medical care or health through clinical research is an option or an imperative, limits are necessary. Human research participants are the means to securing practical knowledge, but because people should not be treated "merely [as] means to an end, but always as ends in themselves,"4 the need to respect and protect human research participants is paramount.

The primary ethical tension in clinical research, therefore, is that a few individuals are asked to accept some research burden, risk, or inconvenience to benefit others, including future persons and society. Ethical requirements aim to minimize the possibility of exploiting research participants by ensuring that they are treated with respect while contributing to the generation of knowledge, and that their rights and welfare are protected throughout the process of research.

HISTORY OF ETHICAL ATTENTION TO CLINICAL RESEARCH

Throughout history, perception and acceptance of the methods, goals, and scope of clinical research have evolved significantly, as have attention to and appreciation of what respecting and protecting research participants entails. A brief detour through the history of clinical research illustrates these changing perspectives.5

Benefit to the Individual

For hundreds of years, there was little basis for a distinction between experimentation and therapy because most therapy was experimental, and systematic evidence of the effectiveness of medical interventions was rare. Experimental therapies were used in the hope of benefiting ill patients, but such "therapy" frequently contributed to or caused morbidity or mortality. Systematic research was sporadic. Most researchers were medical practitioners, motivated to do what they thought best for their patients, and trusted to do the right thing. Because no specific codes of ethics, laws, or regulations governed the conduct of research, fraud and abuse were minimized to some extent only through peer censorship. Early regulations, such as the Pure Food and Drug Act of 1906 in the United States, prohibited unsubstantiated claims on medicine labels. Yet research began to grow as an enterprise only after the development of early antibiotics like penicillin and the passage of the Food, Drug, and Cosmetic Act in 1938, which required evidence of safety before a product was marketed.6

Benefit to Society

Around the time of World War II, there was a dramatic shift in clinical research, with tremendous growth in the research enterprise. Pharmaceutical companies were established; large amounts of both public and private money were devoted to research; and research became increasingly centralized, coordinated, standardized in method, and valued. Human subjects research entered what has since been described as an "unashamedly utilitarian phase."7 Individuals often were included in research because they were available, captive, and marginalized, and they were seen as making a contribution to society. The federal government and the pharmaceutical industry supported intensive research efforts to develop vaccines and antibiotics for infectious diseases to help soldiers, as infectious diseases were a significant problem for the armed services. During this era, research was commonly conducted in prisons, orphanages, and homes for the emotionally or developmentally disturbed, as well as with other institutionalized groups. The distinction between research and therapy was fairly clear; subjects not necessarily in need of therapy were accepting a personal burden to make a contribution to society. A utilitarian justification served as the basis of claims that some individuals could be used for the greater common good. Revelations of Nazi medical experiments and war crimes, and the Nuremberg trial of Nazi doctors, raised public and professional concerns about the justification and scope of research with human subjects.8

Protection of Research Subjects

In the late 1960s and early 1970s in the United States, shock and horror at stories of abuse of human subjects led to intense scientific and public scrutiny, reflection, and debate about the scope and limitations of research involving human subjects. A renowned Harvard anesthesiologist, Henry Beecher, published a landmark article in the New England Journal of Medicine in 19669 highlighting ethical problems in 22 research studies conducted in reputable US institutions. Exposition of studies such as the hepatitis B studies at Willowbrook, the U.S. Public Health Service Tuskegee syphilis studies, and others generated intense public attention and concern. Congressional hearings and action led to passage of the 1974 National Research Act (PL 93–348) and establishment of the US National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research.10 This extremely influential body authored multiple reports and recommendations about clinical research, including reports on research with children and on institutional review boards (IRBs). Included in its legacy is the Belmont Report, in which the ethical principles underlying the conduct of human subject research and their application are explained.11 The Commission's work emphasized the need to protect individuals participating in research from potential exploitation and harm, and provided the basis for subsequent federal regulations codified in 1981 in Title 45, US Code of Federal Regulations (USCFR), Part 46 (45CFR46), titled "Protection of Human Subjects," and similar FDA regulations (21 CFR 50 and 56). In 1991, the Department of Health and Human Services (DHHS) regulations became the currently operative Common Rule,12 which governs the conduct of human subjects research funded by 17 US federal agencies. The major thrust of these federal regulations, and of many existing codes of research ethics, continues to be protection of subjects from the burdens and harms of research.

Research as a Benefit

Events in the late 1980s and 1990s altered some public perspectives on clinical research. Certain articulate and vocal activists asserted that research participation can offer an advantage that individuals want access to, rather than simply a harm to be protected from.13 According to this perspective, as espoused by human immunodeficiency virus (HIV) and breast cancer activists and others, participation in research is a benefit, protectionism is discrimination, and exclusion from research can be unjust. Empirical studies have demonstrated, for example, that oncology patients who participate in clinical trials benefit through improved survival.14,15 Activism and changes in public attitudes about research led to substantive changes in the way research is done and drugs are approved. In addition to the possible benefits of participation for individuals, it was claimed that certain traditionally underrepresented groups were being denied the benefits of the application of knowledge gained through research.16 Since 1994, the US National Institutes of Health (NIH) has required that those who receive research funding include previously underrepresented women and ethnic minorities.17 Since 1998, NIH guidelines have required the inclusion of children in research or justification for their exclusion.18

Community Involvement in Research

In subsequent years, the growth of genetics research, research with stored biospecimens and data, and international collaborative research, in particular, have highlighted the value of greater public and community involvement in research. Clinical research does not occur in a vacuum but is a collaborative social activity that requires the support and investment of involved communities, and it comes with inherent risks and potential benefits for communities and groups. As such, involvement of the community (1) in helping to set research priorities, (2) in planning and approving research, (3) in evaluating risks and benefits during and after a trial, and (4) in influencing particular aspects of recruitment, informed consent, and the realization of community benefits demonstrates respect for the community and can facilitate successful research.


CODES OF RESEARCH ETHICS AND REGULATIONS

Throughout this history, several influential documents have helped to shape our sense of the contours of ethical research (Table 2.1). Most were written in response to specific crises or historical events, yet all have accepted an underlying assumption that research as a means to progress in medical care or health is a social good.

TABLE 2.1 Selected Codes and US Regulations Guiding Clinical Research

• The Nuremberg Code (1949)
• The World Medical Association Declaration of Helsinki (1964, 1975, 1983, 1989, 1996, 2000, 2008, and 2013)
• The National Commission's Belmont Report (1979)
• CIOMS International Ethical Guidelines for Biomedical Research Involving Human Subjects (1982, 2002, 2015)
• International Conference on Harmonization Guidelines for Good Clinical Practice (1996)
• Title 45, USCFR, Part 46, "The Common Rule"
• Title 21, USCFR, Parts 50 ("Protection of Human Subjects") and 56 ("Institutional Review Boards")

The Nuremberg Code, a 10-point code on the ethics of human experimentation, was written as the concluding part of the judgment at the Nuremberg Trials (1949).19 Established in response to Nazi experimentation, the Nuremberg Code recognized the potential value of research knowledge to society but emphasized the absolute necessity of voluntary consent of the subject. The Nuremberg Code established that ethical research must prioritize the rights and welfare of the subject. Most subsequent codes and guidelines for the ethical conduct of research have maintained this emphasis, and all have incorporated requirements for informed consent.

The Declaration of Helsinki was developed by the World Medical Assembly (WMA) in 1964 as a guide to the world's physicians involved in human subject research.20 The Declaration recognizes that some, but not all, medical research is combined with clinical care, and emphasizes that patients' participation in research should not put them at a disadvantage with respect to medical care. It also recognizes legitimate research with people who cannot give their own informed consent, such as children and the cognitively impaired, but for whom informed permission can be obtained from a legal guardian. The Declaration of Helsinki has had considerable influence on the formulation of international, regional, and national legislation and regulations governing clinical research. It has been revised multiple times by the WMA (1975, 1983, 1989, 1996, 2000, 2008, and 2013) and is considered a living document. Certain provisions, such as posttrial obligations and the use of placebo controls, have been topics of continued debate among international researchers.

The Belmont Report, published by the US National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, describes three broad ethical principles that guide the conduct of research and form the "basis on which specific rules could be formulated, criticized, and interpreted."11 These three principles are respect for persons, beneficence, and justice. Respect for persons requires respect for the autonomous decision-making of capable individuals, as applied in the process of informed consent, and also calls for protection of those with diminished autonomy. Beneficence requires protecting individuals from deliberate and unnecessary harm, as well as maximizing benefits and minimizing harms, and is applied to clinical research through careful risk/benefit evaluation. Justice demands a fair distribution of the benefits and burdens of research and is applied in the Belmont Report to fairness in the processes and outcomes of selecting research subjects.

In 1982, the Council for International Organizations of Medical Sciences (CIOMS), in conjunction with the World Health Organization (WHO), issued International Ethical Guidelines for Biomedical Research Involving Human Subjects, which were revised in 1993, 2002, and 2015.21 The CIOMS guidelines acknowledge that background circumstances sometimes differ between low- and middle-income and high-income countries, and that there may be differences in the primacy of focus on the individual and individual rights. CIOMS set out to apply the Helsinki principles to the "special circumstances of many technologically developing countries." CIOMS adopted the three ethical principles spelled out in the US National Commission's Belmont Report and maintains most of the tenets of Nuremberg and Helsinki, but has provided additional and valuable guidance and commentary on externally sponsored research and research with vulnerable populations. The 2015 revision restructures and expands many previously existing guidelines and adds new guidelines on compensation for research-related injury, research with stored biospecimens and data, and implementation science, among others.21

Federal regulations found in Title 45, USCFR, Part 46 (45CFR46)12 were promulgated in 1981 for research funded by DHHS (formerly the Department of Health, Education, and Welfare), and at Title 21, USCFR, Parts 50 and 56 for the U.S. Food and Drug Administration (FDA).22 FDA regulations are similar, but not identical, to those found in the Common Rule.23 Compliance with these and other FDA regulations is required for research investigating FDA-regulated products, such as drugs, biologics, and medical devices. DHHS regulations were extended in 1991 as the Federal Common Rule, applicable to research funded by 17 US federal agencies (not including the FDA). Based on recommendations of the National Commission, the Common Rule stipulates both the membership and the function of IRBs, and the criteria that an IRB should apply when reviewing a research protocol to determine whether to approve it. The Common Rule also delineates the information that should be included in an informed consent document, how consent should be documented, and criteria for waiver or alteration of informed consent. Subparts B, C, and D of 45CFR46 describe additional protections for DHHS-funded research with fetuses and pregnant women, prisoners, and children, respectively. In 2017, a final revision to the Common Rule was published in the Federal Register, with the most extensive changes to the Common Rule since 1991.24

The International Conference on Harmonization (ICH) sought to harmonize regulatory guidelines for product registration trials for the United States, the European Union, and Japan. The ICH Good Clinical Practice (GCP) (E6) Guidelines provide widely accepted guidance promoting the ethical conduct of research and the reporting of accurate and reliable data.25 The World Health Organization produced good clinical research guidelines that incorporated ICH-GCP and also addressed types of clinical research beyond drug-registration trials.26 Good clinical practice guidelines are being adopted by countries around the world to guide the conduct of research.

RESEARCH ON BIOETHICAL QUESTIONS

The historical evolution of clinical research ethics and the development of guidelines and regulations were largely responses to particular events or scandals. The Nuremberg Code, for example, was a response to atrocities performed by Nazi research doctors during World War II; the formation of the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research was a response to revelations of the U.S. Public Health Service syphilis studies in Tuskegee. Our systems for protection of human subjects, the focus of the ethics of clinical research, and the existing regulations grew out of these efforts.

Another essential way to inform our thinking about the ethics of clinical research, and one that has gained traction in recent decades, is research on bioethical questions. Bioethics research is usually conducted using one or more of the following methodologies: historical inquiry, conceptual analysis, empirical studies, or policy analysis.27 For example, bioethics research on voluntariness, an essential part of informed consent, could improve our understanding of what voluntariness means and how to maximize it in the process of informed consent. Such research might include an analysis of the concept. Recognizing that all decisions and actions can be influenced by one's understanding, previous experiences, religion and culture, and the influence of respected others, it is important to distinguish what makes a choice sufficiently voluntary from what makes a choice controlled. Conceptual bioethics research also might examine the concepts of coercion, undue influence, and manipulation, which are different possible controlling influences.28 Empirical research might seek to elucidate how people actually choose research participation, what sources of influence and pressure they identify, whether they perceive they could say no to participation and under what circumstances, experiences of manipulation or undue influence, and other phenomena. Requirements for rigorous and ethical research on topics in bioethics are similar to those for ethical clinical research.

ETHICAL FRAMEWORK FOR CLINICAL RESEARCH

A systematic framework of principles for ethical clinical research was derived from guidance provided in various ethical codes, guidelines, literature, and bioethics research. This proposed framework of principles is meant to apply sequentially and universally to clinical research.29 According to this framework, ethical clinical research should satisfy the following requirements: social or scientific value, scientific validity, fair subject selection, favorable risk/benefit ratio, independent review, informed consent, and respect for enrolled subjects30 (Table 2.2). Each will be described briefly.

TABLE 2.2 Ethical Framework for Clinical Research

• Value: Research poses a clinically, scientifically, or socially valuable question that will contribute to generalizable knowledge about health or will be useful in improving health. Research is responsive to health needs and priorities.
• Validity: The study has an appropriate and feasible design and end points, rigorous methods, and a feasible strategy to ensure valid and interpretable data.
• Fair subject selection: The process and outcomes of subject and site selection are fair and are based on scientific appropriateness, minimization of vulnerability and risk, and maximization of benefit.
• Favorable risk/benefit ratio: Study risks are justified by potential benefits and the value of the knowledge. Risks are minimized and benefits are enhanced to the extent possible.
• Independent review: The study receives independent evaluation of adherence to ethical guidelines in the design, conduct, and analysis of research.
• Informed consent: The study has clear processes for providing adequate information to subjects and promoting their voluntary enrollment.
• Respect for enrolled participants: The study shows respect for the rights and welfare of participants both during and at the conclusion of research.

Value and Validity

The first requirement of ethical research is that the research question must be worth asking; that is, it must have potential social, scientific, or clinical value. The anticipated usefulness of the knowledge to be gained in understanding or improving health or health care is the crux of determining value, not whether study results are positive or negative. A study should have sufficient social value to justify asking individuals to assume risk or inconvenience in research and to justify the expenditure of resources.31 A valuable research question then ethically requires validity and rigor in research design and implementation to produce valid, reliable, interpretable, and generalizable results. Poorly designed research (for example, research with an inappropriate design, inadequate power, insufficient or sloppy data, or inappropriate or unfeasible methods) is harmful because human and material resources are wasted and individuals are exposed to risk for no benefit.30

Fair Subject Selection

Fair subject selection requires that subjects be chosen for participation in clinical research based first on the scientific question, balanced by considerations of risk, benefit, and vulnerability. As described in the Belmont Report, fairness in both the processes and the outcomes of subject selection prevents exploitation of vulnerable individuals and populations and promotes equitable distribution of research burdens and benefits. A fair process means that investigators should identify groups or individuals who would be appropriate for scientific reasons (that is, for reasons related to the problem being studied and justified by the design and the particular questions being asked), not because of their easy availability or manipulability, or because subjects are favored or disfavored.11 Extra care should be taken to justify the inclusion of vulnerable subjects, as well as to justify excluding those who stand to benefit from participation. Exclusion without adequate justification can be unfair; therefore, eligibility criteria should be as broad as possible, consistent with the scientific objectives and the anticipated risks of the research.

Distributive justice is concerned with a fair distribution of benefits and burdens; thus the expected benefit and burden in a particular study is an important consideration for subject selection. Scientifically appropriate individuals or groups may be fairly selected consistent with attention to equitably distributing benefits and burdens, as well as minimizing risks and maximizing benefits.

Persons are considered vulnerable when their ability to protect or promote their own interests is compromised, often because of an impaired capacity to provide informed consent. Although disagreement remains about the meaning of vulnerability in research and who is actually vulnerable,32 there is support for the idea that among scientifically appropriate subjects, the less vulnerable should be selected first. For example, an early drug safety study should be conducted with adults before children, and with consenting adults before including those who cannot consent. Certain groups, such as pregnant women, fetuses, prisoners, and children, are further protected by specific regulations requiring additional safeguards in research.
According to US regulations, the permissibility of research with children depends on the level of research risk and the anticipated benefits. Accordingly, (1) research that poses minimal risk to children is acceptable; (2) research with more than minimal risk must be counterbalanced by a prospect of direct therapeutic benefit for the children in the study; (3) research with a minor increment over minimal risk, but without the prospect of direct therapeutic benefit, can sometimes be justified by the importance of the question for children with the disorder under study; and (4) research without a prospect of benefit that poses greater than minimal risk to participating children can be conducted only if approved by a special panel convened by the US Secretary of DHHS.33 Enrolling children in research also requires permission from their parents or legal guardians, along with the child's assent whenever possible.

Fair subject selection also requires considering the outcomes of subject selection. For example, if women, minorities, or children are not included in studies of a particular intervention, then study results may be difficult to apply to these groups in practice, and interventions could actually be harmful. Therefore, study populations recruited for research should be representative of the populations likely to use the strategies tested in the research.34 Similarly, it has been argued that justice requires subjects to be among the beneficiaries of research. This means that subjects should be selected as participants in research from which they or others like them can benefit, and should not be asked to bear the burdens of research from which they can reap no benefits. This understanding of justice has raised important and challenging questions in the conduct of collaborative international research. Some have argued that if an experimental drug or vaccine is found effective in a certain tested population, there should be prior assurance that the population will have access to the drug or vaccine.35 Alternatively, subjects or communities should be assured of, and involved in negotiation about, fair benefits derived from research that are not necessarily limited to the benefits of available products of research.36

Favorable Risk/Benefit Ratio

The ratio of risks to benefits in research is favorable when risks are justified by benefits to participants or to society, and when the research is designed in a way that minimizes risk and enhances benefit for participating subjects. The ethical principle of beneficence obliges us to (1) protect people from deliberate or unnecessary harm and (2) maximize possible benefits and minimize possible harms. A widely accepted principle states that one should not deliberately harm another individual regardless of the benefits that might be made available to others as a result. However, as the Belmont Report reminds us, offering benefit to people and avoiding harm requires learning what is beneficial and what is harmful, even if in the process some people are exposed to the risk of harm. To a great extent, clinical research is an activity designed to learn about the benefits and harms of unproven methods of diagnosing, preventing, treating, and caring for human beings.

The challenge for clinical investigators and review/oversight groups is to decide in advance when it is justifiable to seek certain benefits despite the research risks, what level of risk is acceptable, whether risks have been minimized to the extent possible, and when it is better to forgo the possible benefits because of the risks. This is called a risk/benefit assessment. The calculation and weighing of risks and benefits in research can be complicated. When designing a study, investigators consider whether the inherent risks are justified by the expected value of the information and any possible benefit to the participants. Studies should be designed so that risks to participants are minimized and benefits are enhanced. When reviewing a study, an IRB identifies possible risks and benefits and determines whether the relationship of risks to benefits is favorable enough that the proposed study should go forward, or should instead be modified or rejected. When reviewing studies with little or no expected benefit for individual subjects, the IRB determines whether the anticipated risks or burdens to study subjects are justified by the potential value of the knowledge to be gained alone, a particularly challenging risk/benefit assessment. Prospective subjects make their own risk/benefit assessment of whether the risks of participating in a given study are acceptable to them and worth their participation.

A risk/benefit assessment can include consideration of many types of risks and benefits, including physical, psychological, social, economic, and legal. For example, in a genetics study, physical risks may be limited to a blood draw or a buccal swab, so assessment of potential psychological and social risks is more important. Investigators, reviewers, and potential subjects not only may have dissimilar perspectives about research but also are likely to assign different weights to risks and benefits. For example, IRBs consider only health-related benefits of the research in justifying risks, whereas subjects are likely to consider access to care and financial compensation as important benefits that may tip the balance in favor of participation. Acknowledging that risk/benefit assessment is not a straightforward or easy process does not in any way diminish its importance. An important step in evaluating the ethics of clinical research involves not only careful attention to the potential benefits of a particular study to individuals or society in relation to its risks, but also consideration of the risks of not conducting the research.

Independent Review

Independent review is a process that allows evaluation of the research, for adherence to established ethical guidelines, by individuals with varied expertise and no personal or business interest in the research. For most clinical research, this independent review is carried out by an IRB or research ethics committee (REC). Using criteria detailed in US federal regulations,12,22 IRBs evaluate the value of doing the study, the risks involved, the fairness of subject selection, whether the risks have been sufficiently minimized and are justified, and the plans for obtaining informed consent; they then decide whether to approve a study with or without modifications, to table a proposal for major revisions or more information, or to disapprove a study as unacceptable (see also Chapter 4). Independent review of the risks of proposed research by someone other than the investigator has been described as a "central protection for research participants."37 Nonetheless, there is concern that the current IRB system in the United States is outdated given the current profile of clinical research, and that it is bureaucratic, beset with conflicts, and in need of reform.38 Both the 2017 revisions to the Common Rule and recent NIH policy require single IRB review for domestic multisite studies.24,39

TABLE 2.3 The Process of Informed Consent

Disclosure of information
  Description: Information about the study, based on a "reasonable person" standard, is disclosed to prospective participants. Disclosure takes into account subjects' language, education, familiarity with research, and cultural values. Both written information and discussion are usually provided.
  Considerations and challenges: There is a need to balance the goal of being comprehensive with attention to the amount and complexity of information, to give participants the information they need and facilitate understanding.

Understanding
  Description: Knowledge of study purpose, risks, benefits, alternatives, and requirements.
  Considerations and challenges: Empirical data show that participants often do not have a good understanding of the details of the research.

Voluntary decision-making
  Description: Free from coercion and undue influence. Subject is free to choose not to enroll.
  Considerations and challenges: Many possible influences affect participants' decisions about enrolling in research. Avoid controlling influences.

Authorization
  Description: Usually given by a signature on a written consent document.
  Considerations and challenges: For some individuals or communities, requiring a signature reflects lack of appreciation for their culture or literacy level.

Informed Consent

Once a proposal is deemed valuable and valid, with acceptable risks in relation to benefits and fair subject selection, individuals are recruited and asked to give their informed consent. The process of informed consent shows respect for persons and their autonomy, giving prospective subjects the opportunity to make autonomous decisions about participating and remaining in research, and respecting their choices about participation. We show lack of respect when we do not provide the information necessary to make a considered judgment, pressure an individual to make a particular judgment, or deny him or her the freedom to act on judgments. The process of informed consent involves disclosure of study information, comprehension of the information, voluntariness with respect to the decision, and authorization40 (Table 2.3).

Information provided to subjects about a research study should be adequate according to a "reasonable volunteer" standard, balanced, and presented in an understandable manner. Information should be provided in the language of the subject, at an appropriate level of complexity given the subject's age, educational level, and culture. US federal regulations detail the types of information that should be included in informed consent12,22; this is essentially the information a reasonable person needs to know to make an informed decision about initial or ongoing research participation. Ideally, individuals receive the necessary information, understand it, process it in the context of their own situation and life experiences, and make a "voluntary" choice free from coercion or undue influence. The process of initial research informed consent usually culminates with the signing of a consent form. However, respect for persons requires that subjects continue to be informed throughout a study and remain free to modify or withdraw their consent at any time.

Although widely accepted as central to the ethical conduct of research, achieving informed consent is challenging. Determining the appropriate amount and complexity of information for disclosure is not straightforward. Written consent documents have become long and complex, and large amounts of information may actually hinder subject understanding. Scientific information is often complex; research methods are unfamiliar to many people; and subjects have varying levels of education, understanding of science, and knowledge about their diseases and treatments, and are dissimilar in their willingness to enter into dialogue. Besides the amount and detail of information, understanding may be influenced by who presents the information and in what setting. In some cases, information may be more accessible to potential subjects if presented in group sessions or through print, video, or other media presentations.

Determining whether a subject has the capacity to consent and understands the particular study information is also challenging. Capacity to provide consent is study specific. Individuals who are challenged in some areas of decision-making may still be capable of consenting to a particular research study. Similarly, individuals may not have the capacity to consent to a particular study even if they are generally able to function in other areas of their lives. Assessing capacity might take into account an individual's educational level and familiarity with science
and research, as well as evidence of cognitive or decisional impairment. In some but not all cases, mental illness, depression, sickness, desperation, or pain may interfere with a person's capacity to understand or process information. Empirical research on informed consent shows that participants do not always have a good understanding of the purpose or potential risks of the research studies for which they gave their consent.41

Informed consent to participation in research should be voluntary and free of controlling influences, such as coercion and undue influence.40 Terminal or chronic illness, exhaustion of other treatment options, and lack of health insurance may limit a participant's options but do not necessarily render decisions involuntary. Payment and other incentives, trust in health-care providers, dependence on the care of clinicians, family pressures, and other factors commonly influence decisions about research participation. Most of the time these are acceptable influences, but some worry that under certain circumstances they can become controlling. Given these multiple factors, it is important to ensure that prospective subjects have, and perceive that they have, the option to say no to research participation, and to do so with impunity. Research has demonstrated that active and ongoing dialogue between the research team and subjects, opportunities to have questions answered, waiting periods between the presentation of information and the actual decision to participate, the opportunity to consult with family members and trusted others, a clear understanding of alternatives, and other strategies can enhance the process of informed consent.42,43

Respect for Enrolled Subjects

Research participants deserve continued respect after enrollment, throughout the duration of the study, and when the study ends. Respect for subjects is demonstrated through appropriate clinical monitoring and attention to participants' well-being throughout the study. Adverse effects of research interventions and any research-related injuries should be treated. Private information collected about subjects should be handled confidentially, and participants should be informed about the limits of confidentiality. Research subjects should be reminded of their right to withdraw from the research at any time without penalty. A change in clinical status or life circumstances, as well as new information from the study or other studies, may be relevant to a person's willingness to continue participation. Investigators should make plans regarding the end of the trial, including participants' continued access to successful interventions when indicated, and access to study results after the study is finished.


In summary, ethical clinical research is conducted according to the seven principles delineated in Table 2.2. Application of these principles to specific cases will always involve judgment and specification on the part of investigators, sponsors, review boards, and others involved in clinical research.

ETHICAL CONSIDERATIONS IN RANDOMIZED CONTROLLED TRIALS

RCTs remain the principal method, and the "gold standard," for demonstrating safety and efficacy in the development of new drugs, biologics, and other interventions. An RCT has several characteristic features: RCTs are controlled, randomized, and usually blinded, and the significance of the results is determined statistically according to a predetermined algorithm. An RCT typically involves comparison of two or more interventions (e.g., Drug A vs. Drug B) to demonstrate that they are similar or that one is superior in the treatment, diagnosis, or prevention of a specific disorder. RCTs present a spectrum of unique ethical problems (Table 2.4). "In considering the RCT, the average IRB member must be baffled by its complexity and by the manifold problems it represents."44

The ethical justification to begin an RCT is usually described as the existence of "an honest null hypothesis," also often referred to as equipoise or clinical equipoise.45 In an RCT comparing interventions A and B, clinical equipoise is satisfied if there is no convincing evidence about the relative merits of A and B (e.g., evidence that A is more effective than, or less toxic than, B). The goal of an RCT is to provide credible evidence about the relative value of each intervention. Equipoise rests on a therapeutic commitment that patients should not receive a treatment known in advance to be inferior, nor should they be denied effective treatment that is otherwise available. Doubt, based on lack of evidence, about which intervention is superior justifies giving subjects an equal chance to receive either one, and makes it ethically acceptable to assign some portion of subjects to each of the treatments provided in an RCT. There remains some disagreement about the meaning, justification, and application of equipoise in clinical research.
TABLE 2.4 Selected Ethical Considerations in Randomized Controlled Trials (RCTs)

Equipoise
  Description: No convincing evidence that one intervention is better (i.e., more effective or less toxic) than another.
  Considerations: How should early evidence be factored in? Does a requirement for equipoise conflate research and therapy?

Choice of control
  Description: Appropriate choice of control is necessary for scientific validity and generalizability.
  Considerations: Choice of control is not simply a scientific decision. Placebos as controls require ethical justification.

Randomization
  Description: Random assignment decreases bias and controls for many factors.
  Considerations: Random assignment does not allow for autonomous preferences.

Blinding
  Description: Single or double blinding is often used to decrease bias.
  Considerations: Research participants consent to temporarily suspend knowledge of which intervention they are receiving. In rare cases, a blind may need to be broken to manage certain clinical problems.

Sharing preliminary information
  Description: As evidence accumulates, information about risks and benefits may change, and equipoise may be disturbed.
  Considerations: Study monitors and independent data and safety monitoring committees review accumulating data to help determine when a study should be stopped or altered, or when information should be shared with participants.

Some argue that equipoise is based on a mistaken conflation of research with therapy and therefore should be abandoned.46 Another controversy in RCTs involves what should count as "convincing" evidence. Some worry that the common acceptance of statistical significance at the P = .05 level potentially discounts clinically significant observations. Statisticians have recently criticized overreliance on, and misuse of, the P value, reiterating that it cannot tell you the probability that results are true or due to random chance, but only the probability of seeing the observed results given a particular hypothetical explanation.47 People also disagree about the extent to which preliminary data, data from previous studies, data from uncontrolled and pilot studies, and historical data do or should influence the balance of evidence. In some cases, the existence of these other types of data may make equipoise impossible. However, data from small, uncontrolled, or observational studies can lead to false or inconclusive impressions about safety or efficacy. RCTs are usually monitored by data and safety monitoring committees, who see data at specified time points during the trial and can recommend altering or stopping the trial based on prespecified boundaries for safety, efficacy, or futility.48
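The point about what a P value can and cannot tell you lends itself to a small numerical illustration. The sketch below uses hypothetical outcome data and a simple permutation test (neither comes from this chapter): it estimates a P value as the probability of seeing a difference at least as extreme as the observed one if the null hypothesis of no treatment difference were true.

```python
import random

random.seed(0)

# Hypothetical outcome scores for the two arms of a small trial
# (illustrative numbers only; nothing here comes from the chapter).
drug_a = [5.1, 6.0, 6.8, 5.5, 7.2, 6.3, 5.9, 6.7]
drug_b = [4.8, 5.2, 5.0, 5.6, 4.9, 5.4, 5.8, 5.1]

observed = sum(drug_a) / len(drug_a) - sum(drug_b) / len(drug_b)

# Permutation test: under the null hypothesis of no treatment difference,
# the arm labels are arbitrary, so shuffle them repeatedly and count how
# often a mean difference at least as extreme as the observed one arises.
pooled = drug_a + drug_b
n_perm = 10_000
n_extreme = 0
for _ in range(n_perm):
    random.shuffle(pooled)
    diff = sum(pooled[:8]) / 8 - sum(pooled[8:]) / 8
    if abs(diff) >= abs(observed):
        n_extreme += 1

p_value = n_extreme / n_perm
# p_value estimates P(data this extreme | null hypothesis). It is NOT the
# probability that the null hypothesis is true or that the result is "real".
print(f"observed difference = {observed:.2f}, p = {p_value:.4f}")
```

Whatever its size, this number says nothing about the probability that the null hypothesis itself is true, which is precisely the misinterpretation the statisticians' critique warns against.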

Another important scientific and ethical consideration in RCTs is the selection of the outcome variables by which the relative merits of an intervention will be judged. Different conclusions may be reached depending on whether the efficacy of an intervention is measured by survival, tumor shrinkage, symptoms, surrogate end points, quality of life, or some composite measure. The choice of end points in a clinical trial is never simply a scientific decision.

In an RCT, subjects are assigned to treatment through a process of randomization rather than on the basis of individual needs and characteristics. The goal of random assignment is to control for confounding variables by keeping two or more treatment arms similar in relevant and otherwise uncontrollable respects. RCTs are also often single blind (the subject does not know which intervention he or she is receiving) or double blind (both subject and investigator are blinded to the intervention). Random assignment and blinding are methods used in clinical trials to reduce bias and enhance study validity. Although compatible with the goals of an RCT, random assignment to treatment and blinding to treatment assignment may seem incompatible with the best interests or the autonomy interests of the patient-subject. In some placebo-controlled blinded studies, both subjects and investigators can guess (often because of side effects) whether they are receiving active drug or placebo, potentially thwarting the goal of reducing bias.49 The necessity and adequacy of blinding and randomization should be assessed in the design and review of each proposed research protocol.
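In practice, the random assignment described above is often implemented with schemes such as permuted-block randomization, which keeps arm sizes balanced as enrollment proceeds. A minimal sketch follows; the arm labels, block size, and seed are illustrative choices of this example, not anything specified in the chapter.

```python
import random

def permuted_block_randomization(n_subjects, arms=("A", "B"), block_size=4, seed=42):
    """Assign subjects to arms in shuffled blocks so that arm sizes stay
    balanced after every completed block (a common, simple scheme)."""
    assert block_size % len(arms) == 0, "block must divide evenly among arms"
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_subjects:
        block = list(arms) * (block_size // len(arms))  # e.g., A, B, A, B
        rng.shuffle(block)                              # the randomization step
        assignments.extend(block)
    return assignments[:n_subjects]

schedule = permuted_block_randomization(12)
print(schedule)
# After every completed block of 4, arms A and B have been used equally often.
```

Real trials typically generate and conceal such schedules centrally, and may vary the block size so that investigators cannot predict upcoming assignments.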
When randomization and blinding are deemed useful and appropriate for a particular protocol, two ethical concerns remain: (1) preferences for an intervention and information about which intervention a subject is receiving may be relevant to autonomous decisions, and (2) information about which intervention the subject is receiving may be important in managing an adverse event or a medical emergency.

With respect to the first concern, subjects should be informed about the purpose of the research and should be asked to consent to random assignment and to a temporary suspension of knowledge about which intervention they are receiving. To balance the need for scientific objectivity with respect for a research subject's need for information to make autonomous decisions, investigators should provide subjects with adequate information about the purpose and methods of randomization and blinding. Subjects are asked to consent to a suspension of knowledge about their treatment assignment until completion of the protocol or some other predetermined point, at which time they should be informed about which intervention they received in the clinical trial. In some cases, knowledge of which medications a subject is receiving may be important in the treatment
of adverse events or other medical emergencies. To balance the need for scientific objectivity with concern for subject safety, investigators should consider in advance the conditions under which a blind may be broken to treat an adverse event. Specifically, the protocol should specify where the code will be located; the circumstances (if any) under which the code will be broken; who will break it; how the information will be handled (i.e., will the investigator, the subject, the IRB, and the treating physician be informed?); and how breaking of a blind will influence the analysis of the data. Research subjects should always have information about whom to contact in the event of an emergency. The IRB should be satisfied that these plans provide adequate protection for patient safety.

Plans also should be made for what will happen at the end of a trial. Some argue that those who volunteer for RCTs, especially in externally sponsored international research, deserve assurance in advance about access to interventions proven to be beneficial in the RCT. Investigators should plan for whether and how subjects randomized to an intervention that is benefiting them will continue to receive that intervention, and how those randomized to the inferior intervention might be given an opportunity to receive the better one. Considerable disagreement remains regarding the extent of the obligation of researchers or sponsors to ensure posttrial access.

A participant may be concerned about enrolling in an RCT if one of the potential treatment assignments is placebo. Some people perceive randomization to placebo in clinical trials as problematic because it potentially deprives the individual of treatment that he or she may need. On the other hand, without proof of the safety and efficacy of an experimental treatment, it is possible that those randomized to placebo are simply spared potentially toxic side effects or a useless substance.50 Scientifically, comparison of an experimental drug with placebo can allow efficient and rigorous establishment of efficacy. The alternative is an RCT that compares the investigational drug with an already established therapy, if one exists, which can be designed to test the superiority or noninferiority of the two agents (i.e., whether the experimental drug is similar to the standard-therapy control within a noninferiority margin). Some authors suggest that both scientific design and possible risk to subjects should determine the acceptability of placebo.51

Most accept that the use of a placebo control in research is justified when (1) there is no proven effective treatment for the condition under study; (2) withholding treatment poses negligible risks to participants; (3) there are compelling methodological reasons for using placebo, and withholding treatment does not pose a risk of serious harm to participants; and, more controversially, (4) there are compelling methodological reasons for using placebo, the research is intended to develop interventions that can be implemented in the population from which trial participants are drawn, and the trial does not require participants to forgo treatment they would otherwise receive.52 Most agree, however, that if the outcome of no treatment or placebo treatment for the patient would be death, disability, or serious morbidity, a placebo control should not be used.53
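The noninferiority logic mentioned above can be made concrete with a small calculation. In this sketch the response counts and the margin are hypothetical, and a crude Wald confidence interval stands in for the prespecified methods a real trial would use: the experimental drug is judged noninferior if the lower confidence bound on the difference in response rates does not fall below the negative of the margin.

```python
import math

def noninferiority_check(events_new, n_new, events_std, n_std, margin, z=1.96):
    """Crude Wald 95% CI for the difference in response rates (new - standard).
    Noninferiority is concluded if the lower bound stays above -margin.
    Illustrative only -- real trials prespecify more careful methods."""
    p_new, p_std = events_new / n_new, events_std / n_std
    diff = p_new - p_std
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    lower = diff - z * se
    return diff, lower, lower > -margin

# Hypothetical trial: 80% response in both arms (160/200 each), with a
# prespecified noninferiority margin of 10 percentage points.
diff, lower, noninferior = noninferiority_check(160, 200, 160, 200, margin=0.10)
print(f"difference = {diff:.3f}, lower 95% bound = {lower:.3f}, noninferior: {noninferior}")
```

Note that the conclusion depends on the margin as much as on the data: with the same results, a margin of 5 percentage points would not be met here, which is why the margin must be justified and fixed in advance.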

CONCLUSION

Ethical principles and guidance related to the conduct of clinical research with human participants help to minimize the possibility of exploitation and promote respect for, and protection of, the rights and welfare of the individuals who serve as human subjects of research. This chapter has reviewed the historical evolution of research ethics, a systematic ethical framework for the conduct of clinical research, and ethical considerations raised by some of the unique features of RCTs. In addition to adherence to principles, codes of ethics, and regulations, the ethical conduct of human clinical research depends on the thoughtfulness, integrity, and sagacity of all involved.

SUMMARY QUESTIONS

1. Scientific validity is important to evaluating the ethics of clinical research: without rigorous scientific validity, the research outcomes are not reliable, so persons are unnecessarily asked to accept risk and burden. Assessing scientific validity includes consideration of:
   a. Sample size and study design
   b. Costs and budget
   c. Informed consent
   d. Amount of compensation to participants

2. Disclosure of which of the following items is necessary for an informed consent document?
   a. A statement that the study involves research, and the study's purpose
   b. An explanation of the proposed treatment or intervention and procedures
   c. The foreseeable risks and benefits of study participation
   d. All of the above

3. Although research participants are often exposed to some risk in clinical research, there are limits to the amount of acceptable risk. In evaluating risk in clinical research, it is commonly accepted that:
   a. Only known risks to participants are permitted
   b. Risks should be minimized and justified by the benefits or value of the study
   c. To compensate for possible risks, individuals must receive therapeutic benefit
   d. Risks are acceptable only when competent adults consent

4. In the proposed ethical framework of seven principles for evaluating clinical research, the final principle, "respect for enrolled subjects," is understood to include at least:
   a. Establishing a contract between the subject and the researcher
   b. Monitoring the subject's welfare and protecting confidentiality of information
   c. Keeping the financial costs of participation reasonable
   d. Informing the subject of new information only after the study is published

5. Most ethical research calls for the voluntary informed consent of the research participant. In informed consent, the research participant's decision to participate in research is considered "voluntary" if it is free from:
   a. Any outside influences
   b. Payment or other inducements
   c. Coercion or undue influence
   d. Misunderstanding or cognitive impairment

6. Multiple codes and regulations provide guidance about the ethical conduct of research. One influential "living" document, written by the World Medical Association, has had considerable influence on the development of national and local guidance. That document is:
   a. The Bill of Rights
   b. The Declaration of Helsinki
   c. The Nuremberg Code
   d. The Declaration of Lisbon

References

1. Brody H, Miller FG. The clinician-investigator: unavoidable but manageable tension. Kennedy Inst Ethics J 2003;13(4):329–45.
2. Eisenberg L. The social imperatives of medical research. Science 1977;198:1105–10.
3. Jonas H. Philosophical reflections on experimenting with human subjects. In: Freund P, editor. Experimentation with human subjects. New York: Braziller; 1970.
4. Kant, as quoted in Beauchamp T, Childress J. Principles of biomedical ethics. 4th ed. New York: Oxford University Press; 1994. p. 351.
5. Emanuel E, Grady C. Four paradigms of clinical research and research oversight. In: Emanuel, Grady, Crouch, Lie, Miller, Wendler, editors. Oxford textbook of clinical research ethics. New York: Oxford University Press; 2008. p. 222–30 [Chapter 22].
6. US FDA. Significant dates in U.S. food and drug law history. Available at: http://www.fda.gov/AboutFDA/WhatWeDo/History/Milestones/ucm128305.htm.
7. Rothman D. Ethics and human experimentation – Henry Beecher revisited. N Engl J Med 1987;317:1195–9.
8. Weindling P. The Nazi medical experiments; and Annas G, Grodin M. The Nuremberg Code. In: Emanuel, Grady, Crouch, Lie, Miller, Wendler, editors. Oxford textbook of clinical research ethics. New York: Oxford University Press; 2008. p. 18–30 and 136–40 [Chapters 2 and 12, respectively].
9. Beecher HK. Ethics and clinical research. N Engl J Med 1966;274:1354–60.
10. Porter J, Koski G. Regulations for the protection of humans in research in the United States. In: Emanuel, Grady, Crouch, Lie, Miller, Wendler, editors. Oxford textbook of clinical research ethics. New York: Oxford University Press; 2008. p. 156–67 [Chapter 15].
11. National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. The Belmont report: ethical principles and guidelines for the protection of human subjects of research. Washington, DC: U.S. Government Printing Office; 1979.
12. U.S. code of federal regulations, title 45, part 46. Available at: www.hhs.gov/ohrp/humansubjects/guidance/45cfr46.htm.
13. National Research Council. The social impact of AIDS in the United States. Washington, DC: National Academy Press; 1993.
14. Herbert-Croteau N, Brisson J, Lemaire J, Latreille J. The benefit of participating to clinical research. Breast Cancer Res Treat 2005;91(3):279–81.
15. Bleyer A, Montello M, Budd T, Saxman S. National survival trends of young adults with sarcoma: lack of progress is associated with lack of clinical trial participation. Cancer 2005;103(9):1891–7.
16. Dresser R. Wanted: single, white male for medical research. Hastings Cent Rep 1992;22(1):21–9.
17. National Institutes of Health. Guidelines for the inclusion of women and minorities as subjects in clinical research. NIH guide for grants and contracts. Bethesda, MD: National Institutes of Health; March 18, 1994.
18. National Institutes of Health. NIH policy and guidelines on the inclusion of children as participants in research involving human subjects. NIH guide for grants and contracts. Bethesda, MD: National Institutes of Health; March 6, 1998.
19. The Nuremberg Code. Available at: www.hhs.gov/ohrp/references/nurcode.htm.
20. World Medical Assembly. Declaration of Helsinki. Available at: www.wma.net/e/ethicsunit/helsinki.htm.
21. Council for International Organizations of Medical Sciences. International ethical guidelines for biomedical research involving human subjects. Geneva: CIOMS/WHO; 2002. Available at: www.cioms.ch.
22. U.S. code of federal regulations, title 21, part 50 ("protection of human subjects") and part 56.
23. US FDA. Comparison of FDA and DHHS human subject protection regulations. Available at: http://www.fda.gov/ScienceResearch/SpecialTopics/RunningClinicalTrials/EducationalMaterials/ucm112910.htm.
24. Department of Homeland Security, et al. Final rule: federal policy for the protection of human subjects. Fed Regist 2017;82(12):7149–274. Available at: https://www.gpo.gov/fdsys/pkg/FR-2017-01-19/pdf/2017-01058.pdf.
25. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use: guidelines for good clinical practice 1996: E6(R1). Available at: http://www.ich.org/products/guidelines/efficacy/article/efficacy-guidelines.html.
26. World Health Organization (WHO) handbook for good clinical research practice (GCP): guidance for implementation. Available at: http://apps.who.int/medicinedocs/documents/s14084e/s14084e.pdf.
27. Emanuel E. Researching a bioethical question. In: Gallin JI, Ognibene FP, editors. Principles and practice of clinical research. 3rd ed. London: Elsevier Inc.; 2012. p. 31–42 [Chapter 3].


28. Largent E, Grady C, Miller F, Wertheimer A. Misconceptions about coercion and undue influence: reflections on the views of IRB members. Bioethics 2013;27(9):500–7.
29. Emanuel E, Wendler D, Grady C. What makes clinical research ethical? J Am Med Assoc 2000;283(20):2701–11.
30. Emanuel E, Wendler D, Grady C. An ethical framework for biomedical research. In: Emanuel, Grady, Crouch, Lie, Miller, Wendler, editors. Oxford textbook of clinical research ethics. New York: Oxford University Press; 2008. p. 123–35 [Chapter 11].
31. Freedman B. Scientific value and validity as ethical requirements for research: a proposed explication. IRB Rev Hum Subjects Res 1987;9(5):7–10.
32. Levine C, Faden R, Grady C, Hammerschmidt D, Eckenwiler L, Sugarman J; Consortium to Examine Clinical Research Ethics. The limitations of "vulnerability" as a protection for human research participants. Am J Bioeth 2004;4(3):44–9.
33. U.S. code of federal regulations, title 45, part 46, subpart D.
34. Meltzer L, Childress J. Fair participant selection. In: Emanuel, Grady, Crouch, Lie, Miller, Wendler, editors. Oxford textbook of clinical research ethics. New York: Oxford University Press; 2008. p. 377–85 [Chapter 35].
35. Emanuel E. Benefits to host countries. In: Emanuel, Grady, Crouch, Lie, Miller, Wendler, editors. Oxford textbook of clinical research ethics. New York: Oxford University Press; 2008. p. 719–28 [Chapter 65].
36. Participants in the 2001 Conference on Ethical Aspects of Research in Developing Countries. Fair benefits for research in developing countries. Science 2002;298:2133–4.
37. National Bioethics Advisory Commission. Ethical and policy issues in research involving human participants: vol. 1. Report and recommendations. Available at: www.bioethics.gov/reports/past_commissions/nbac_human_part.pdf.
38. Emanuel E, Wood A, Fleischman A, et al. Oversight of human participants research: identifying problems to evaluate reform proposals. Ann Intern Med 2004;141(4):282–91.
39. National Institutes of Health. Final NIH policy on the use of a single institutional review board for multi-site research; 2016. Available at: http://grants.nih.gov/grants/guide/notice-files/NOT-OD-16-094.html.
40. Faden R, Beauchamp T. A history and theory of informed consent. New York: Oxford University Press; 1986.
41. Mandava A, Pace C, Campbell B, Emanuel E, Grady C. The quality of informed consent: mapping the landscape. A review of empirical data from developing and developed countries. J Med Ethics 2012;38:356–65.
42. Flory J, Emanuel E. Interventions to improve research participants' understanding in informed consent for research: a systematic review. J Am Med Assoc 2004;292:1593–601.
43. Nishimura A, Carey J, Erwin PJ, Tilburt JC, Murad MH, McCormick JB. Improving understanding in the research informed consent process: a systematic review of 54 interventions tested in randomized control trials. BMC Med Ethics 2013;14:28.
44. Levine R. Ethics and regulation of clinical research. 2nd ed. Baltimore: Urban & Schwarzenberg; 1986.
45. Freedman B. Equipoise and the ethics of clinical research. N Engl J Med 1987;317(3):141–5.
46. Miller F, Brody H. A critique of clinical equipoise: therapeutic misconception in the context of clinical trials. Hastings Cent Rep 2003;33(3):20–8.
47. Wasserstein RL, Lazar NA. The ASA's statement on p-values: context, process, and purpose. Am Stat 2016;70(2):129–33. http://dx.doi.org/10.1080/00031305.2016.1154108.
48. FDA. The establishment and operation of clinical trial data monitoring committees for clinical trial sponsors. Available at: http://www.fda.gov/regulatoryinformation/guidances/ucm127069.htm.
49. Fisher S, Greenberg R. How sound is the double-blind design for evaluating psychotropic drugs? J Nerv Ment Dis 1993;181(6):345–50.
50. Levine R. The use of placebos in randomized clinical trials. IRB Rev Hum Subjects Res 1985;7(2):1–4.
51. Emanuel EJ, Miller FG. The ethics of placebo-controlled trials – a middle ground. N Engl J Med 2001;345(12):915–9.
52. Millum J, Grady C. The ethics of placebo-controlled trials: methodological justifications. Contemp Clin Trials 2013;36(2):510–4.
53. Miller F, Brody H. What makes placebo-controlled trials unethical? Am J Bioeth 2002;2(2):3e9.

I. ETHICAL, REGULATORY, AND LEGAL ISSUES

C H A P T E R

3

Integrity in Research: Principles for the Conduct of Research

Melissa C. Colbert, Robert B. Nussenblatt†, Michael M. Gottesman
National Institutes of Health, Bethesda, MD, United States

O U T L I N E

Guidelines and Principles for the Conduct of Research
Scientific Integrity and Research Misconduct
Responsibilities of Research Supervisors and Trainees
Data Management, Archiving, and Sharing
  Data Management
  Archiving
  Data Sharing
Research Involving Human and Animal Subjects
Collaborative and Team Science
Conflict of Interest and Commitment
Peer Review
Publication Practices, Responsible Authorship, and Results Reproducibility
  Publication Practices
  Authorship
  Reproducibility
Study Questions
Acknowledgments
References
Further Reading

GUIDELINES AND PRINCIPLES FOR THE CONDUCT OF RESEARCH

In the late 1980s, the leadership of the National Institutes of Health (NIH) Intramural Research Program (IRP) decided to develop a set of guidelines for the conduct of research at NIH that could be used as a basis of discussion, as well as education, of all scientific staff, including those in training. The Guidelines for the Conduct of Research in the Intramural Research Program at NIH (referred to as the Guidelines) were “developed to promote the highest ethical standards in the conduct of research by intramural scientists at NIH.” The intent was to provide a framework for the ethical conduct of research without inhibiting scientific freedom and creativity. The writers of the Guidelines tried to take into account the major differences in commonly accepted behaviors among different scientific disciplines. The initial version was issued in 1990, and it has subsequently been revised and reissued several times.1 In the latest version, important NIH policies have been added to what is now called the “Guidelines and Policies for the Conduct of Research in the Intramural Research Program at NIH,” including requirements pertaining to embryonic and fetal tissue research, creating a diverse and inclusive workforce, and human subject research protections. The Guidelines serve as a framework for the education of NIH scientific staff in research conduct issues,

† Deceased.

Principles and Practice of Clinical Research
http://dx.doi.org/10.1016/B978-0-12-849905-4.00003-4
Copyright © 2018. Published by Elsevier Inc.

3. INTEGRITY IN RESEARCH: INDIVIDUAL AND INSTITUTIONAL RESPONSIBILITY

through discussion sessions and more formal courses, as well as a reference book. In 1995, the NIH Committee on Scientific Conduct and Ethics (CSCE) was established for the IRP to help set policies on these issues, to put in place mechanisms for teaching the principles of scientific conduct, and to establish mechanisms for resolving specific cases. The CSCE has been responsible for the last four versions of the Guidelines. It also created a computer-based Research Ethics course, available to the public, that all new scientific staff must complete to ensure that everyone has the same basic understanding of the policies and regulations governing the responsible conduct of research.2 Finally, the CSCE selects the topic, and interesting case studies, for yearly research ethics discussions in which all scientific staff participate.3

In addition to the Guidelines, NIH has other guides, such as Sharing Research Resources; Standards for Clinical Research within the IRP; Human Biospecimen Storage and Tracking; Scientific Record Keeping; Training and Mentoring; Handling Research Misconduct Allegations; and Food and Drug Administration (FDA) Amendments Act (FDAAA) Reporting of Research Results, which have been collected in a convenient public location: the NIH Sourcebook.4 Other institutions develop policies for the conduct of research for their investigators that are specific to their needs. Books, textbooks, and symposia or colloquia proceedings5–7 that address scientific conduct and/or misconduct, as well as internet-based learning programs at many institutions,8–12 have increased in number.

As a result of the mandate from the Office of Science and Technology Policy in the White House for the Office of Research Integrity (ORI), Department of Health and Human Services, to become primarily an educational office, ORI has been funding grants to support institutions in the development of research conduct materials and courses that can be made widely available to any institution interested in using them.13

The NIH Guidelines cover research integrity; mentor–trainee relationships; data management, sharing, and archiving; research involving human and animal subjects; collaborations and team science; conflict of interest and commitment; peer review and privileged information; publication practices; responsible authorship and results reproducibility; social responsibilities; and dual use research, among other issues and policies. These enumerated topics form the basis of the remainder of the chapter.

SCIENTIFIC INTEGRITY AND RESEARCH MISCONDUCT

Scientists at the NIH, like scientists everywhere, should be committed to the responsible conduct of research and to the scientific method in seeking new knowledge. We expect that all research staff in the NIH IRP will maintain exemplary standards of intellectual honesty in designing, conducting, and presenting research, as befits the leadership role of the NIH. The principles of the scientific method include the formulation and testing of hypotheses, controlled observations or experiments, analysis and interpretation of data, and oral and written presentation of all of these components to scientific colleagues for discussion and further conclusions. The scientific community and the general public rightly expect adherence to exemplary standards of intellectual honesty in the formulation, conduct, and reporting of scientific research. Research integrity must form the foundation for the conduct of science, which underpins the reputation of the scientific community and supports the confidence of the general public. Research misconduct undermines this basic foundation and erodes the public’s trust.

The issue of research misconduct became one of public interest in the 1980s as a result of several cases involving high-profile scientists. In response, the Institute of Medicine (IOM) convened a committee, under the chairmanship of Dr. Arthur Rubenstein, to examine the issues. In 1989, this committee drafted and published “The Responsible Conduct of Research in the Health Sciences.”14 The IOM revisited the topic in 2001, again chaired by Dr. Rubenstein, resulting in a second report the following year, “Integrity in Scientific Research: Creating an Environment that Promotes Responsible Conduct.”15 These reports recognize the dangers of research misconduct and other egregious behaviors. They proposed that institutions as well as scientists develop standards for the ethical conduct of research, focusing on promoting a research environment that values integrity and reproducibility in high-quality research and diminishes research misconduct.

In 2015, the NIH made a marked change in its responsible conduct of research training requirements, emphasizing that research ethics and integrity are the foundations of all good science and form a natural part of the culture and daily life at NIH. In addition to the required online training modules (available to the general public for many years), in-person training and an introduction to research ethics were extended to our junior scientists and students. Courses and interactive presentations for postdoctoral and visiting fellows now provide more options, and several different media modalities, including video vignettes and recorded workshops covering the potentials and pitfalls of modern technologies for cell and structural biology as well as genome technology, emphasize reproducibility.16 In spite of genuine efforts and the good examples of investigators, cases of research misconduct persist.


Of concern, the trend has been increasing nationwide. The number of retractions of scientific papers, whether for findings of research misconduct or for error, has risen sharply. Reports in 2012 suggest that research misconduct was associated with more than 60% of retractions.17 This raises the question: has research misconduct actually increased, or are we just more aware of the problem?

In the past several years, the internet has become a platform for watchdogs, anonymous allegations, and widespread discussion of suspected research misconduct. Retraction Watch, a forum started in 2010 by Ivan Oransky and Adam Marcus,18 makes an effort to widely publicize cases of research misconduct. Their work over the past 6 years has shown that many more retractions occur annually than was once thought, although the site does not always distinguish misconduct from research error. A recent publication examining retractions over several decades analyzed their effects on the scholarly impact of the authors and institutions involved, and whether these effects extended to the disciplines under investigation. The results indicated a negative effect on author stature, particularly if the retraction was due to research misconduct, but only a limited effect on the areas of scientific study.19 PubPeer, another web blog, invites discussion and comment on recent publications.20 While a worthy undertaking, it appears to have devolved into what often seems an overzealous analysis of figures, particularly Western blots and other gel-based analyses, similar to the suspended website Science Fraud. Many of the postings are anonymous, highly critical, and obliquely allege misconduct rather than promoting thoughtful comment and critique.

In the late 1990s and in 2000, the federal policy defining research misconduct was developed and released. This policy defined certain standards to be followed when reporting a finding of misconduct and described a three-step process for assessing and then establishing whether research misconduct occurred (Table 3.1). The most recent NIH Intramural Policies and Procedures for Research Misconduct Proceedings is consistent with U.S. Public Health Service (PHS) regulations at 42 CFR part 93. The PHS policy was the first universally applicable guidance for federally supported research. The IRP also provides an abridged Guide to the Handling of Research Misconduct Allegations as a PDF, available in the NIH Sourcebook.21

One of the most difficult and controversial aspects of finding research misconduct involves intent. Misconduct must be found to have been committed intentionally, knowingly, or recklessly. The ORI and other individuals and organizations are beginning to deal critically with how to define reckless behavior in scientific research and its role in research misconduct. “Recklessness” lies along a continuum of intent, moving from negligent to reckless to knowing and, finally, intentional. An important consideration is the standard of what a reasonable person would do under similar circumstances and what is acceptable to the relevant scientific community. Evaluation of context and of risk is at the heart of the process. ORI is planning discussions with legal experts and research integrity officers to establish guidance on interpreting a standard for a decision of recklessness. Institutions may handle the imposition of sanctions and appeal processes within certain guidelines. For the Department of Health and Human Services, the policy provides for ORI oversight of completed investigations.

TABLE 3.1 Federal Definition of Scientific Misconduct, Standards, and Process by Which It Is Assessed

I. RESEARCH MISCONDUCT DEFINED
Research misconduct is defined as fabrication, falsification, or plagiarism in proposing, performing, or reviewing research, or in reporting research results.
Fabrication is making up data or results and recording or reporting them.
Falsification is manipulating research materials, equipment, or processes, or changing or omitting data or results such that the research is not accurately represented in the research record.
Plagiarism is the appropriation of another person’s ideas, processes, results, or words without giving appropriate credit.
Research misconduct does not include honest error or differences of opinion.

II. FINDINGS OF RESEARCH MISCONDUCT
A finding of research misconduct requires that:
There be a significant departure from accepted practices of the relevant research community;
The misconduct be committed intentionally, or knowingly, or recklessly; and
The allegation be proven by a preponderance of the evidence.

III. PROCESS FOR ASSESSING OCCURRENCE OF RESEARCH MISCONDUCT
Allegation Assessment: determination of whether the allegations of misconduct, if true, would constitute misconduct and whether the information is sufficiently specific to warrant and enable an inquiry.
Inquiry: the process of gathering information and initial fact-finding to determine whether an allegation of misconduct warrants an investigation.
Investigation: the formal examination and evaluation of all relevant facts to determine if scientific misconduct has occurred and, if so, to determine the person(s) who committed it and the seriousness of the misconduct.


RESPONSIBILITIES OF RESEARCH SUPERVISORS AND TRAINEES

NIH’s mission is to improve the health of the public through support of biomedical research as well as the training of biomedical scientists. The quality of the research, as well as of the training provided to students, depends in large part on the relationship between the mentor and the trainees in each NIH laboratory. The goals of a mentor–trainee relationship are to ensure that fellows receive the best possible training in how to conduct research and how to develop and achieve career goals. Mentoring and being mentored are essential lifelong components of professional life. Research supervisors should always be mentors, but trainees should also be encouraged to seek out other mentors who may provide additional expertise; together these form the basis of a professional network. Characteristics of a good mentor include an interest in contributing to the career development of another scientist, research accomplishments, professional networking, accessibility, and past success in cultivating the professional development of fellows. The trainees themselves must be committed to the work of the laboratory and the institution, to the achievement of their research and career goals, and to active participation in their training.

It is the responsibility of the mentor to provide a rich research environment in which the trainee has the opportunity to acquire both the conceptual and the technical skills of the field. In this setting, the trainee should be provided with clear expectations and should undertake a significant piece of research, usually chosen as the result of discussion between supervisor and trainee. Good communication is critical to success, including time spent by the mentor reviewing primary data with the trainee. Mentors should consider the overall size of the research group and perhaps limit the number of trainees for whom they can provide an appropriate and productive training experience. The use of a “Welcome Letter” or Lab Compact is a useful tool to introduce trainees to the specific expectations and responsibilities of both trainees and mentors in the lab.22 Such documents are gaining acceptance in many institutions.23–25

Among the skills that trainees should acquire during their fellowship are training in scientific investigation: how to choose a first-rate research project, how to carry out the necessary experiments and analyses in an appropriate and rigorous way, and how to incorporate knowledge of the research field and published literature, with the ultimate goal of developing increasing independence throughout the training period; training in communication skills, both written and oral; training in personal interactions, including negotiation, persuasion, diplomatic skills, and networking; and training in scientific responsibility, including the legal and ethical aspects of carrying out research. In addition, fellows should be considering career pathways, in consultation with their mentors, being sure to survey the many options available to scientists these days.

To ensure a rich and stimulating laboratory experience, mentors should strive to establish a diverse, talented research group. Attention should be given to ensuring that all trainees and employees (at NIH) are valued and included as respected members of the scientific community. While cultural differences within the community should be respected, at the laboratory level this includes preparing records and conducting laboratory business in English, the common language of science.

DATA MANAGEMENT, ARCHIVING, AND SHARING

Data Management

Scientific data may be divided into three categories: experimental protocols; primary data, which include instrument setup and output, raw and processed data, statistical calculations, photographic images, electronic files, and patient records; and procedures of reduction and analysis. Any individual involved in the design or execution of an experiment and the subsequent data processing is responsible for the accuracy of the resultant scientific data and must be meticulous in acquiring and maintaining them. These individuals may include, in addition to the person responsible for actually carrying out the experiment, the principal investigator, postdoctoral fellows, students, research assistants, and other support staff such as research nurses. Research results should be recorded in a form that allows continuous access for analysis and review, whether via an annotated bound notebook or computerized records. All research data must be made available to the supervisor, as well as to collaborators, for immediate review. Data management, including the decision to publish, is ultimately the responsibility of the principal investigator.

Martinson et al.26 carried out a survey that asked respondents to report which, if any, questionable research practices they had engaged in over the previous 3 years. Among those who responded (46% of those surveyed), 27.5% reported that they had kept inadequate research records, suggesting that lack of appropriate record keeping is a serious problem. A follow-up study in 2012 confirmed the surprisingly high prevalence of questionable practices such as poor record keeping and suggested that, unfortunately, they may be becoming the norm.27


Good science requires keeping good records, both in the research laboratory and in clinical research. Good record keeping facilitates communication within the research team and the preparation of data for publication or the submission of intellectual property for patents. Properly maintained research records are required by many federal agencies, particularly those that regulate radioactivity, animal use, and FDA-regulated products. The importance of record keeping and data quality in clinical research, where protocols using drugs, devices, or biologics are frequently audited by federal agencies, cannot be overstated (see also Chapter 30 on Data Management in Clinical Trials).

In any clinical setting, there are clear distinctions between patient care and clinical research. Patient care records are part of the patient’s medical record and usually accompany the patient wherever they go. In clinical research, the investigator must follow the clinical protocol and document the experience of the research participant. These records are kept at the research site, need to be well ordered, and are often organized in a research or regulatory binder. Documentation usually includes records of institutional review board (IRB) actions, drug/device accountability, and other materials and records that demonstrate the site follows Good Clinical Practice. Clinical data should be retained as directed by federal regulations (Table 3.2).

TABLE 3.2 Scientific Record Keeping

I. REASONS WHY GOOD RECORDS ARE IMPORTANT IN SCIENTIFIC RESEARCH
1. Good record keeping is necessary for data analysis, publication, collaboration, peer review, and other research activities.
2. Good record keeping is required by the NIH to meet the accepted policies and standards for the conduct of good science.
3. Good record keeping is necessary to support intellectual property claims.
4. Good record keeping can help defend you against false allegations of research misconduct.
5. Good record keeping is important in the care of human subjects.

II. RESEARCH RECORDS SHOULD DESCRIBE OR EXPLAIN THE FOLLOWING
1. Name of the person making the record
2. What was done
3. When the record was made (month, day, year)
4. The purpose of the research
5. The project associated with the research
6. The methodology involved
7. The materials used
8. The findings
9. Interpretation of the findings
10. Future plans

III. CLINICAL RECORDS
1. Medical record of patient care documents
   a. Why the patient is here (history/diagnosis)
   b. What was done (treatments, procedures, tests)
   c. When was the action performed
   d. Who performed the action or activity
   e. What was the outcome of care (response, prognosis)
2. Clinical research records for a regulatory binder include
   a. IRB-approved documentation
   b. Participant information
   c. Technology agreements
   d. Personnel documents (FDA study)
   e. Site monitoring (FDA study)
   f. Laboratory information (FDA study)
   g. Pharmacy documentation (FDA study)
   h. Study documentation (FDA study)
   i. FDA regulatory documentation

IV. RECORDS RETENTION
1. Research data:
   a. Records of basic research: 7 years
   b. Records that support patent or invention rights: 30 years after the patent is filed
   c. Records of historical significance: transferred to the National Archives and maintained permanently
2. Clinical research data:
   a. Data subject to FDAAA:
      i. Results deposited in ClinicalTrials.gov within 12 months after the primary completion date
      ii. 2 years following the date a marketing application is approved
      iii. 2 years after the investigation is discontinued and the FDA is notified that no application is to be filed
   b. Data subject to NIH regulations:
      i. Results deposited in ClinicalTrials.gov within 12 months after the primary completion date

IRB, institutional review board; FDA, Food and Drug Administration; FDAAA, FDA Amendments Act; NIH, National Institutes of Health.

Archiving

At the NIH, all data collected, as well as laboratory notebooks, research records, and other supporting materials such as unique reagents, belong to the government and must be retained for a period of time sufficient to allow further analysis of the results, as well as repetition by others of published material. Intramural research records are the property of the NIH and must be maintained for the periods dictated by policy: all records must be maintained for at least 7 years after completion of the project; records supporting intellectual property rights (patents or inventions) must be maintained for 30 years after the patent is filed; and records of historical significance are transferred to the National Archives and maintained permanently.


Data Sharing

Once publications have appeared, supporting materials must, when possible, be made available to all responsible scientists seeking further information or planning additional experiments. For example, aliquots of any monoclonal antibody that derives from a continuously available cell line must be provided, whereas the final aliquots of a polyclonal antibody, needed by the original lab to finish additional experiments, need not be. The NIH IRP, in line with other research institutions, has required that transgenic or knockout mouse lines be made available, preferably through deposition in a commercial mouse facility. Requests for human samples require IRB review and approval prior to sharing, to ensure that confidentiality issues are covered. All primary research data in the IRP are subject to the Freedom of Information Act (FOIA).

In 2014, the NIH instituted a policy that requires all grants and contracts to include a plan to share genomic data.28 This policy covers large-scale human and nonhuman genomic data, including genome-wide association studies, single-nucleotide polymorphism arrays, genome sequence, transcriptomic, metagenomic, epigenomic, and gene expression data. This facilitates the opportunity for further understanding of the factors that influence health and disease. NIH established a dedicated website with information on data repositories, NIH-funded databases, and NIH database collaborators.29

The FDAAA of 2007 was designed to improve public access to information about clinical trials of FDA-regulated products and devices. In 2015, a Notice of Proposed Rule Making expanded FDAAA rules regarding the registration of trials and the reporting of results. Concurrently, NIH issued a draft policy to promote the broad and responsible dissemination of information on clinical trials funded by the NIH through registration and submission of summary results information to ClinicalTrials.gov. This policy expands the scope of studies covered, identifies which studies require results reporting, and includes a more cogent definition of what constitutes a clinical trial covered by the policy. As of this writing, these policy changes are still under review (see also Chapter 9). Information is available at a number of websites.30,31

RESEARCH INVOLVING HUMAN AND ANIMAL SUBJECTS

The use of humans and animals in research is essential to the NIH mission of improving human health, but such research entails special ethical and legal considerations. Many chapters in this textbook address the issues related to carrying out human subject research, and readers may wish to consult the Office of Human Subjects Research Protections, Policies and Procedures.32

While research ethics and training have traditionally focused on bench scientists, educational efforts are increasingly targeted to clinical and translational scientists. Translational science involving human subjects raises unique ethical issues that differ substantially from those of basic research. Given that clinical research is highly regulated by both the Office for Human Research Protections and the FDA, it is of concern that the survey by Martinson et al.26 reported that 0.3% of those responding said they had ignored major aspects of human subject requirements, while 7.6% had circumvented certain minor aspects. Organizations such as Public Responsibility in Medicine and Research (PRIM&R) and the Collaborative Institutional Training Initiative (CITI), once primarily concerned with education in protecting human subjects, have increasingly sponsored workshops, webinars, and online courses in the responsible conduct of research for clinical scientists. CITI created a discipline-specific public access course with a module for biomedical research.33 Other available online resources include a research networking tool for clinical researchers supported through the Clinical and Translational Science Awards program.34 Several institutions also maintain online case libraries of issues related to ethical research, with examples taken from clinical scenarios.34

The collection of human biospecimens (Table 3.3), a valuable and unique resource, must be handled under the highest ethical and scientific standards. Under the Notice of Proposed Rule Making, the Common Rule (45 CFR 46) may soon require a broad consent for secondary use of collected biospecimens, in which persons give consent to future unspecified research uses.
NIH developed specific Guidelines for Human Biospecimen Storage and Tracking within the NIH IRP, updated in 2013.35 All samples, whether obtained during standard of care or for research, and regardless of whether the individuals are still living, are covered by a new tracking system. The guidelines cover legal and ethical considerations for the collection, storage, use, sharing, and disposal of all human materials.

TABLE 3.3 Human Biospecimens

Biological materials or derivatives thereof include:
1. DNA
2. Cells or cell lines
3. Tissue (bone, muscle, connective tissue, skin)
4. Organs (heart, liver, bladder, kidney, etc.)
5. Blood
6. Gametes (sperm, ova)
7. Embryos and fetal tissue
8. Waste (urine, feces, sweat, hair and nail clippings, shed epithelial cells, placenta)

A major ethical and legal issue related to biospecimen banking and custodianship involves the ownership and disposition of these materials. Several prominent lawsuits dealing with privacy, and with tissues and DNA as property and as potential sources of income, have been reviewed and adjudicated recently.36,37 Although often contradictory, federal rulings have generally stated that once tissues are removed, they are no longer the property of the donor but belong to the research institute where the study originated.38 Human biospecimens obtained by NIH researchers are considered federal property and must remain in the custody of NIH, although materials are made available for use by specific written agreement.

The use of laboratory animals is often essential in biomedical research, but in using animals, a number of important points must be kept in mind. Animals must always be cared for and used in a humane and effective way, with procedures conducted as specified in an approved protocol. The use of animals in research must be reviewed by an Animal Care and Use Committee (ACUC), in accordance with the Association for Assessment and Accreditation of Laboratory Animal Care International (AAALAC) guidelines. ACUC committees perform the following functions: review and approve protocols for animal research; review the institute’s program for humane care and use of animals; inspect all of the institution’s animal facilities every 6 months; and review any concerns raised by individuals regarding the care and use of animals in the institute.

The NIH phased out all laboratory experiments with chimpanzees in 2015.39 In the 2016 spending bill, Congress strongly urged specific review of the use of nonhuman primates in NIH-funded biomedical research.
A workshop is planned for the summer of 2016 to develop policies and procedures for research with nonhuman primates. An investigator’s responsibilities in using animals for research include humane treatment of the animals; following all procedures specified in the approved protocol; following the institution’s general requirements for animal care and use; and reporting concerns related to the care and use of laboratory animals. The policies and regulations for the use and care of laboratory animals are primarily concerned with minimizing or alleviating the animals’ pain and with using appropriate alternatives to animal testing when possible. In recent years great emphasis has been placed on the three R’s: reduction, refinement, and replacement (Table 3.4). However, experiments with animals should always consider sex as a biological variable, with appropriate experimental design to evaluate this important variable.40

TABLE 3.4 The Three R’s in Animal Research

Reduction: Reduction in the number of animals used to obtain information of a given amount and precision.
Refinement: Decrease in the incidence or severity of pain and distress in those animals that are used.
Replacement: Use of other materials, such as cell lines or eggs, or substitution of a lower species, which might be less sensitive to pain and distress, for a higher species.

COLLABORATIVE AND TEAM SCIENCE

Research collaborations facilitate progress and should be encouraged. As research methods become more specialized and resources diminish, team science is not only attractive but in some cases necessary. The ground rules for collaborations, including eventual authorship issues, should be discussed openly among all participants from the beginning. Research data should be made available to all scientific collaborators on a project upon request. Although each research project has unique features, certain core issues are common to most of them. Successful collaborations are characterized by a strong sense of direction, a willingness to commit time and effort, an efficient communication strategy for discussion among the group members, a system for reevaluation as the project progresses, and a clear definition of roles and responsibilities. The NIH Ombudsman Office has developed a useful set of criteria to consider when establishing collaborations, as well as a Field Guide for Team Science, shown in Table 3.5.41 Whenever collaborations involve the exchange of biological materials, they are routinely formalized by written agreements:

Material Transfer Agreement (MTA): Used for the simple transfer of proprietary research material without collaboration, for example, if you request a reagent from, or give one to, a colleague outside of NIH.

Cooperative Research and Development Agreement (CRADA): An agreement between one or more NIH laboratories and at least one nonfederal party (private sector, university, not-for-profit, nonfederal government). CRADAs provide a protected environment for long-term collaboration; they confer intellectual


3. INTEGRITY IN RESEARCH: INDIVIDUAL AND INSTITUTIONAL RESPONSIBILITY

TABLE 3.5 Questions for Scientific Collaborators

Although each research project has unique features, certain core issues are common to most of them and can be addressed by collaborators posing the following questions:

OVERALL GOALS
1. What are the scientific issues, goals, and anticipated outcomes or products of the collaboration?
2. When is the project over?

WHO WILL DO WHAT?
1. What are the expected contributions of each participant?
2. Who will write any progress reports and final reports?
3. How, and by whom, will personnel decisions be made? How, and by whom, will personnel be supervised?
4. How, and by whom, will data be managed? How will access to data be managed? How will long-term storage and access to data be handled after the project is complete?

AUTHORSHIP AND CREDIT
1. What will be the criteria and the process for assigning authorship and credit?
2. How will credit be attributed to each collaborator’s institution for public presentations, abstracts, and written articles?
3. How, and by whom, will public presentations be made?
4. How, and by whom, will media inquiries be handled?
5. When and how will intellectual property and patent applications be handled?

CONTINGENCIES AND COMMUNICATING
1. What will be the mechanism for routine communications among members of the research team (to ensure that all appropriate members of the team are kept fully informed of relevant issues)?
2. How will decisions about redirecting the research agenda as discoveries are made be reached?
3. How will the development of new collaborations and spin-off projects, if any, be negotiated?
4. Should one of the principals of the research team move to another institution or leave the project, how will data, specimens, lab books, and authorship and credit be handled?

property rights to NIH inventions and are handled by the Technology Transfer Office in each Institute. The NIH Office of Technology Transfer has developed a set of FAQs to help investigators determine which instrument is most appropriate.42

CONFLICT OF INTEREST AND COMMITMENT

Conflict of interest is a legal term that encompasses a wide spectrum of behaviors or actions involving personal gain or financial interest. According to Frank Macrina, “a conflict of interest arises when a person exploits, or appears to exploit, his or her position for personal gain or for the profit of a member of his or her immediate family or household.”43 The existence

of a conflict of interest may adversely affect the ability to carry out scientific studies objectively and to report their results. Potential conflicts of interest may not be recognized by others unless disclosed; disclosure should include all relevant financial relationships. Disclosure is made to the appropriate organization depending on the activity: to one’s research institution while carrying out the research; to the funding agency when involved in peer review of grants; to meeting organizers when giving an invited presentation; and to journal editors when asked to referee articles, or when submitting one’s own manuscripts for consideration. Three-tenths of a percent of respondents to the survey26 on inappropriate research behaviors reported “not properly disclosing involvement in firms whose products were based on their own research,” suggesting that this is an issue that needs to be addressed further.

The personal integrity of the physician has been a paramount concern of society since the beginning of written history. Because clinical research involves a somewhat different relationship between investigators (many of whom are not physicians) and patients, it has been necessary to develop a new set of community standards to assure the integrity of the clinical research process. One set of ethical standards relates to the need to protect human subjects involved in clinical research. Another concern relates to the way in which real or perceived conflicts of interest may affect the integrity of clinical research. A conflict of interest occurs when other interests that the physician may have undermine, or appear to undermine, his or her objectivity and conduct in meeting those goals. A chronic concern is the interaction between industry and clinical researchers in the conduct of clinical trials, and the potential for conflict of interest.

Given the enormous costs of clinical trials, combined with the desire of clinical investigators to try the latest drugs, which are often available only from drug companies, companies increasingly serve as the sponsors of clinical trials (providing 70% of the funding for such trials) and, as such, may seek control over the research protocol and publication of the results. Even the appearance of such a conflict, without intent on the part of the investigator, is corrosive to the integrity of clinical investigation. Although most clinical investigators will deny vehemently that their financial interests would affect their research and clinical activities, studies have shown that interactions with pharmaceutical firms can have an effect on decision-making by physicians.44,45 Those who were receiving remuneration of some kind from pharmaceutical firms were more likely to support the safety of those companies’ drugs, and we can presume that research activities would be similarly affected. In 2005 federal restrictions on relations with pharmaceutical companies and the



biotechnology industry were enacted. These restrictions are not a total ban, however, and collaborations and interactions remain strong. Agreements with industry, including CRADAs and Clinical Trials Agreements, are reviewed thoroughly by the NIH Technology Transfer and Ethics offices, demonstrating that industry interactions can still flourish, and conflicts be avoided, within ethical guidelines that strengthen public trust. Preventing conflicts of interest (COI) in clinical studies is of particular concern to NIH, where conflicts have the potential to skew enrollment, data acquisition and analysis, and outcomes. NIH has provided advice and guidance on COI to its employees for many years and updated its “Guide to Avoiding Financial and Non-Financial Conflicts or Perceived Conflicts of Interest in Clinical Research at NIH” (Guide) in 2012.46 The Guide is intended to assist those engaged in clinical research, as well as IRB and Data and Safety Monitoring Board (DSMB) members, in avoiding real or perceived financial and nonfinancial conflicts of interest.

Another concerning conflict involves the propensity of academic scientists to establish start-up companies sponsored or supported by their institutions as a result of the Bayh-Dole Act (1980), which allows investigators to profit from their intellectual property. A classic example of such a conflict involved the patenting by Myriad Genetics of the BRCA1 gene associated with hereditary susceptibility to breast cancer. Myriad held seven patents that prohibited others from using the genetic sequence to test patients for the likely presence of this high-risk gene mutation. In a class action lawsuit filed by the American Civil Liberties Union on behalf of patients, the courts in 2010 overturned Myriad’s exclusive right to commercialize genetic testing, on a showing that profit had motivated the patents without concern for medical advancement in the detection and treatment of breast and ovarian cancer.47

While conflict of interest has been the main concern in public policy, conflict of commitment can be equally important. The term refers to a situation in which someone has agreed to do more than is possible, especially activities that have no direct bearing on his or her employment responsibilities. These could be compensated or uncompensated activities, such as work with professional or nonprofit organizations in off-duty hours, that take away from primary responsibilities. Examples include excessive commitments of time for work on behalf of scientific societies or nongovernmental organizations, or participation in outside private clinical practice. Similarly, overcommitment can lead to ethical problems. For example, when someone takes on too many trainees, or oversees too many clinical trials as principal investigator, they become incapable of giving their best effort to all of them. If an investigator cannot find the

time to meet with a fellow, to review data and results, to critique the first draft of a manuscript within a few days or a week, or to personally supervise the running of a clinical trial, that is a strong sign of overcommitment. Failure to personally oversee (1) basic research projects, a common occurrence in many research misconduct cases, or (2) clinical research requirements, such as adequate monitoring of FDA-regulated products, is among the most common findings cited in audit reports and FDA warning letters.

PEER REVIEW

Peer review is defined as a critical evaluation, conducted by one or more experts in the relevant field, of either a scientific document (such as a research article submitted for publication, a grant proposal, or a study protocol) or a research program. One requisite element for peer review is that reviewers be experts in the relevant subject areas. At the same time, real or perceived conflict of interest arising from a direct competitive, collaborative, or other close relationship with one of the authors of the material under review should be avoided. All evaluations should be thorough and objective, fair and timely, and based solely on the material under review: information not yet publicly available cannot be taken into consideration. The use of multiple reviewers mitigates to some extent the effect of one inappropriate review, but reviewers should nevertheless strive to provide constructive advice and avoid pejorative comments. Because reviews are usually conducted anonymously, it is incumbent on the reviewer to protect the privileged information to which he or she becomes privy. No reviewer should share any material with others unless permission has been requested and obtained from those managing the review process. One of the marks of a good mentor is teaching trainees how to handle peer review by asking them to review a submitted manuscript, but it is incumbent on the mentor to notify the journal and obtain explicit permission before doing so. Sadly, peer review fraud has begun to appear with alarming regularity. Between 2012 and 2015, several prominent publishers reported being scammed by fraudulent reviewers, many times the authors of the papers themselves.
In 2014, a Nature article reported that over 110 papers had been retracted after being implicated in peer review fraud.45 In 2015 Hindawi, which publishes 437 academic journals, retracted 32 articles; BioMedCentral, publisher of 277 journals, and Springer, which owns BioMedCentral, together retracted 74 of their articles when confronted with evidence of reviewer fraud by both authors and editors.48


These retractions cover a variety of disciplines and authors from several countries. The ease with which this took place was astounding, and it relates mostly to the common practice of permitting authors to recommend reviewers. Often the author established a separate email account under an assumed name and put this fictitious individual forward as a reviewer. Journals seldom checked the legitimacy of the proposed reviewers, who were unknown and unaffiliated with an academic institution. Because of the avalanche of papers that inundate publishers and editors, more and more reliance is placed on publishing software through which reviews are distributed and processed. In the cases cited, the evidence points to the ease with which vulnerabilities, such as passwords, in publishing software such as ScholarOne or Editorial Manager were exploited. A computer security expert from Harvard commented, “As you make the system more technical and more automated, there are more ways to game it.”49 And as long as the currency of scientific achievement remains the number of papers published, new and more inventive ways to game the system will continue to appear with regularity.

PUBLICATION PRACTICES, RESPONSIBLE AUTHORSHIP, AND RESULTS REPRODUCIBILITY

Publication Practices

Publication of results fulfills a scientist’s responsibility to communicate research findings to the scientific community, a responsibility that derives from the fact that much research is funded by the federal government using taxpayers’ money. Publication of clinical studies also fulfills the responsibility to provide scientific benefit in return for putting human subjects at risk. Other than presentations at scientific meetings, publication in a scientific journal should normally be the mechanism for the first public disclosure of new findings. An exception may be appropriate when serious public health or safety issues are involved. Timely publication of new and significant results is important for the progress of science, but each publication should make a substantial contribution to its field. Fragmentary publication of the results of a scientific investigation, or multiple publications of the same or similar data, is not appropriate. Publications share findings that benefit society and promote human health, but they also establish scientific principles and precedence. Credit for a discovery belongs to the first to publish, and reputations and research funding are based on the number and impact of publications, as are improved opportunities for prestigious positions.

The recent proliferation of so-called predatory journals should be a cautionary note when considering a journal for submission of results.50 Starting around 2008, concerns about predatory publishing have been growing, and the number of dubious articles published in these journals reached 420,000 in 2014. These journals attract authors with false or misleading information and solicit articles from well-known authors to add prestige to their pages. Characteristics of a predatory journal may include publishing with little or no peer review (rapid turnaround time), aggressive solicitation of submissions, splashy websites, imitation of established journal names with subtle changes or misrepresented titles, misleading claims, and fake impact factors. If in doubt, it is best to refer to an indexed list of journals, such as that found in PubMed.

Authorship

Authorship is the primary mechanism for determining the allocation of credit for scientific advances and is thus the primary basis for assessing a scientist’s contributions to developing new knowledge. As such, it conveys not only great benefit but also significant responsibility. Authorship involves the listing of the names of participants in all communications (oral or written) concerning experimental results and their interpretation, as well as making decisions about who will be the first author, the senior author, and the corresponding author. Yet authorship on publications often generates some of the most difficult disputes among scientists, because of its importance for careers. The NIH established benchmarks for authorship credit, developed by the Committee on Scientific Conduct and Ethics (CSCE). Furthermore, it established a procedure for adjudicating authorship disputes. Recommendations range from mediation through the NIH Ombudsman (note: authorship disputes constitute the single largest group of scientific complaints with which the NIH Ombuds office deals), to establishing a peer review panel empowered to make binding recommendations, to a final decision by the Scientific Director of the institute or the Deputy Director for Intramural Research.51 Authorship is justified by a significant contribution to the conceptualization, design, execution, and/or interpretation of the research study, together with a willingness to assume responsibility for the study. Other ways to establish credit for contributions besides authorship include acknowledgments and references. Acknowledgments provide recognition of individuals who have assisted the research through encouragement and advice about the study, editorial assistance, technical support, or provision of space, financial support, reagents, or specimens. References acknowledge others’ discoveries, words, ideas, data, or analyses and must be cited in


a way that allows others to find the reference and see the contribution. According to results from the 2005 survey on questionable practices, 1.4% of respondents reported that they had used others’ ideas without obtaining permission or giving credit.26

When should authorship issues be discussed? Although there is no universal set of standards for authorship, each research group should freely discuss and resolve questions of authorship before and during the course of a study. Each author should fully review material that is to be presented in a public forum or submitted (originally or in revision) for publication. Each author should indicate a willingness to support the general conclusions of the study before its presentation or submission. Authorship issues should be settled as early as possible to avoid conflicts over credit for scientific work. With the recent increase in the number of authors on publications, the problem has grown in magnitude. A number of studies in the late 1990s and early 2000s defined several categories of irresponsible authorship.52,53 These include honorary authorship (an author who does not meet the criteria); ghost authorship (failure to include as an author someone who made substantial contributions to the article); refusal to accept responsibility for an article despite ready acceptance of credit; and duplicate and redundant publication. Rennie and colleagues52 carried out a study based on the following hypotheses: research articles in large-circulation prestigious medical journals would be more likely to have honorary authors, while review articles in smaller-circulation journals that publish symposia proceedings would be more likely to have ghost authors. Although it disproved these hypotheses, the study showed a significant misuse of authorship in biomedical journals that ultimately led to a number of changes in authorship criteria.
The International Committee of Medical Journal Editors (ICMJE) issued a set of Uniform Requirements for Manuscripts Submitted to Biomedical Journals54 to address standards for authorship and more recently established “ethical principles related to publication in biomedical journals.” The ICMJE defines an author as someone “who has made substantive intellectual contributions to a published study” and provides a set of criteria for authorship, as shown in Table 3.6. In a follow-up survey, Wislar et al. noted that although overall inappropriate authorship (e.g., ghost authorship) declined, there was little to no change in the level of honorary authorship.53,55 The Journal of the American Medical Association authorship policy more specifically states that all authors must describe their specific contributions, as well as the contributions of those included in the acknowledgments.56 Authors must determine who qualifies for authorship and who should simply be acknowledged. Authors should

TABLE 3.6 International Committee of Medical Journal Editors: Criteria for Authorship

Authorship should be based on:
• Substantial contributions to conception and design, or acquisition of data, or analysis and interpretation of data
• Drafting the article or revising it critically for important intellectual content
• Final approval of the version to be published
Authors should meet all three conditions. Furthermore, all persons designated as authors should qualify for authorship, and all those who qualify should be listed.

be listed in order of actual degree of contribution, based on discussions among the authors. The NIH CSCE developed a visual graphic of general guidelines, based on a sliding scale developed by Evelyn Ralston at the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS), to help authors work through these issues (Table 3.7). The Annals of Internal Medicine, in concordance with the general principles illustrated in Tables 3.6 and 3.7, further notes that the following, by themselves, are not criteria for authorship: holding a position of administrative leadership; contributing patients or reagents; or collecting and assembling data. Adhering to these criteria will result in a significant change to the way authorship is determined for clinical studies and may require a culture change.57

Reproducibility

Although each paper should contain sufficient information for the informed reader to assess its validity, the principal method of scientific verification is not simply review of submitted or published papers but the ability of others to replicate the results. Concerns are mounting about the current system for ensuring reproducibility in biomedical research.58 Poor training in experimental design, publications missing basic elements of design, and increased pressure to publish in high-impact journals with space limitations all affect methodological completeness. A rush to publish driven by promotion and tenure decisions also contributes to lack of reproducibility, as do the dearth of publications on negative data and the failure to correct flawed methodology in previously published work. NIH has proposed several recommendations for enhancing reproducibility.58 Each paper should contain all the information necessary for other scientists to repeat the work. When considering reproducibility in research, two specific issues should be addressed: potential biases and rigorous experimental design (Table 3.8). Experimental bias in reporting results can be unconscious


TABLE 3.7 General Guidelines for Authorship Contributions

TABLE 3.8 Experimental Design and Reproducibility

Issues that affect experimental design and reproducibility of research results:
Bias: Prejudice in favor of or against one idea, thing, person, or group compared with another, usually in a way considered to be unfair. Bias in research may be unconscious or unintentional.
Scientific Rigor: The strict application of the scientific method to ensure robust and unbiased experimental design, methodology, analysis, interpretation, and reporting of results.
Links to NIH training videos focusing on issues critical for reproducibility (discussion documents are provided for each vignette): https://oir.nih.gov/sourcebook/ethical-conduct/responsibleconduct-research-training/instruction-responsible-conduct-researchpostdoc-irta-crta-vf-research-0
1. Lack of transparency
2. Blinding and randomization
3. Biologic and technical replicates
4. Sample size and exclusion criteria

and unintentional. Causes may include unknown or unavoidable differences between comparison groups, less than ideal experimental designs, systematic errors introduced between test groups, and poor methodology, analysis, interpretation, or reporting of results. Bias can be minimized through blinding, randomization or stratification, reporting of all data (both positive and negative), and establishment of criteria for identifying outliers. Rigor in scientific research includes consideration of experimental design, the rationale for selecting endpoints or model systems, use of both positive and negative controls, consistent experimental conditions, sample size, power calculations, the statistical methods used for analysis and interpretation of results, and maintenance of rigorous, detailed laboratory records.59 Good experimental design is a critical factor for reliable reproducibility of research results. A clearly stated question is important, as is the choice of appropriate tools that allow clear answers to the question.


Working with a statistician in advance of starting experiments will help to determine the model of analysis and the number of data points needed to reach a valid conclusion.60 Judicious publication of new and significant results is important for the progress of science, but each publication should make a substantial contribution to its field. Fragmentary publication of the results of a scientific investigation, or multiple publications of the same or similar data, is not appropriate.

STUDY QUESTIONS

1. Which of the following is not a criterion for research misconduct?
a. Making up data
b. Appropriating ideas from someone else’s research application
c. Failure to retain research records
d. Changing results to match the hypothesis

2. A finding of research misconduct requires that
a. The allegations must be proven by a preponderance of the evidence
b. The misconduct is committed knowingly, recklessly, or intentionally
c. There has been a significant departure from accepted practices of the relevant research community
d. All of the above

3. Authorship for a publication should be based upon
a. Substantial contributions to conception and design
b. Financial interest of the sponsor of the study
c. Personal relationship with the PI of the study
d. Reputation of a colleague

4. Which factor(s) do/does not contribute to lack of reproducibility?
a. Increased emphasis on provocative statements
b. Publication of negative data
c. Pressure to publish in high-impact journals
d. Promotion and tenure incentives for publications

Acknowledgments

We are grateful to the many colleagues who have contributed to the ideas presented in this chapter. In particular, we thank the members of the NIH Committee on Scientific Conduct and Ethics (CSCE) for their contributions to the latest revision of the Guidelines and Policies for the Conduct of Research in the Intramural Research Program at NIH. Since the inception of the CSCE, its many members have contributed to the development and continued refinement of the NIH Responsible Conduct of Research education program. Finally, we thank Robert Nussenblatt, our esteemed and respected collaborator on issues related to research integrity. Bob passed away during the preparation of this revision of the chapter, and we dedicate it to his memory.


References

1. Guidelines for the conduct of research in the intramural program at NIH. https://oir.nih.gov/sites/default/files/uploads/sourcebook/documents/ethical_conduct/guidelines-conduct_research.pdf.
2. NIH responsible conduct of research. http://researchethics.od.nih.gov.
3. NIH ethics cases. https://oir.nih.gov/sourcebook/ethical-conduct/responsible-conduct-research-training/annual-review-ethics-casestudies.
4. NIH sourcebook. https://oir.nih.gov/sourcebook.
5. National Academy of Sciences. On being a scientist. 3rd ed. Washington, DC: National Academy Press; 2009. http://www.nap.edu/read/12192/chapter/1.
6. Resnik DB, Rasmussen LM, Kissling GE. An international study of research misconduct policies. Account Res 2015;22:249-66. http://dx.doi.org/10.1080/08989621.2014.958218.
7. Anderson E, Solomon S, Heitman E, DuBois JM, Fisher CB, Kost RG, Lawless ME, Ramsey C, Jones B, Ammerman A, Friedman-Ross L. Research ethics education for community-engaged research: a review and research agenda. J Empir Res Hum Res Ethics 2012;7:3-19. http://dx.doi.org/10.1525/jer.2012.7.2.3.
8. Epigeum, Imperial College London. https://www.epigeum.com/epigeum/.
9. Center for research ethics and bioethics, Uppsala University, Sweden. http://www.crb.uu.se/education/ethics-training/.
10. European Network of Research Ethics Committees. http://www.eurecnet.org/materials/index.html-m.
11. Responsible Conduct of Research Education Consortium. http://www.indiana.edu/~appe/rcrec.html.
12. Poynter Center for the Study of Ethics and American Institutions. http://poynter.indiana.edu/index.shtml.
13. Office of Research Integrity. https://ori.hhs.gov/rcr-casebookstories-about-researchers-worth-discussing.
14. Institute of Medicine. The responsible conduct of research in the health sciences. Washington, DC: National Academy Press; 1989.
15. Institute of Medicine. Integrity in scientific research: creating an environment that promotes responsible conduct. Washington, DC: National Academy Press; 2002.
16. NIH Responsible Conduct of Research Training. https://oir.nih.gov/sourcebook/ethical-conduct/responsible-conduct-researchtraining.
17. Fang FC, Steen RG, Casadevall A. Misconduct accounts for the majority of retracted scientific publications. PNAS 2012:17029-33. http://dx.doi.org/10.1073/pnas.1212247109.
18. Retraction Watch. http://retractionwatch.com.
19. Shuai X, Moulinier I, Rollins J, Custis T, Schilder F, Edmunds M. A multi-dimensional investigation of the effects of publication retraction on scholarly impact. 2016. http://arxiv.org/abs/1602.09123.
20. PubPeer. https://pubpeer.com.
21. National Institutes of Health Intramural Research Program Policies & Procedures for Research Misconduct Proceedings. https://oir.nih.gov/sites/default/files/uploads/sourcebook/documents/ethical_conduct/policy-nih_irp_research_misconduct_proceedings.pdf.
22. Bennett LM, Marais R, Gadlin H. The ‘Welcome letter’: a useful tool for laboratories and teams. J Transl Med Epidemiol 2012;2:1035.
23. McMahon T. Aligning expectations. In: Pfund C, House S, Asquith P, Spencer K, Silet K, Sorkness C, editors. Mentor training for clinical and translational researchers. Basingstoke, England: W. H. Freeman; 2012. p. 43-7. https://mentoringresources.ictr.wisc.edu/sites/default/files/McMahon_UW_Compact_Example.pdf.
24. Ramsey N. Working with Norman Ramsey: a guide for research students. Tufts University; 2014. https://www.cs.tufts.edu/~nr/students/guide.pdf.

I. ETHICAL, REGULATORY, AND LEGAL ISSUES


3. INTEGRITY IN RESEARCH: INDIVIDUAL AND INSTITUTIONAL RESPONSIBILITY

25. AAMC. https://www.aamc.org/initiatives/research/gradcompact.
26. Martinson BC, Anderson MS, de Vries R. Scientists behaving badly. Nature 2005;435:737–8.
27. John LK, Loewenstein G, Prelec D. Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol Sci 2012;23:524–32. http://dx.doi.org/10.1177/0956797611430953.
28. Genomic data sharing. http://grants.nih.gov/grants/guide/notice-files/NOT-OD-14-124.html.
29. NIH databases. https://gds.nih.gov/02dr2.html.
30. Clinical Trials website. https://clinicaltrials.gov.
31. NIH guide notice NOT-OD-15-019. https://grants.nih.gov/grants/guide/notice-files/NOT-OD-15-019.html.
32. Office of Human Subjects Research Protections. http://ohsr.od.nih.gov/OHSR/pnppublic.php.
33. Responsible Conduct of Research for Clinical Scientists. https://www.citiprogram.org/rcrpage.asp?language=english&affiliation=100.
34. Research Networking for Clinical and Translational Science Awards. https://ctsacentral.org/consortium/best-practices/research-networking/.
35. Guidelines for Human Biospecimen Storage and Tracking within the NIH Intramural Research Program. http://sourcebook.od.nih.gov/oversight/BiospecimenGuidelines.pdf.
36. Charo RA. Body of research: ownership and use of human tissue. N Engl J Med 2006;355:1517–9.
37. Roche PA, Annas GJ. New genetic privacy concerns. GeneWatch 2007;20(1). http://www.councilforresponsiblegenetics.org/GeneWatch/GeneWatchPage.aspx?pageId=196&archive=yes.
38. Mascalzoni D, Dove ES, Rubinstein Y, Dawkins HJS, Kole A, McCormack P, Woods S, Riess O, Schaefer F, Lochmuller H, Knoppers BM, Hansson M. International charter of principles for sharing biospecimens and data. Eur J Hum Genet 2015;23:712–28. http://dx.doi.org/10.1038/ejhg.2014.197.
39. Kaiser J. NIH to end all support for chimpanzee research. Science 2015. http://dx.doi.org/10.1126/science.aad7458.
40. Clayton JA, Collins FS. Policy: NIH to balance sex in cell and animal studies. Nature 2014;509:282–3. http://dx.doi.org/10.1038/509282a.
41. Bennett LM, Gadlin H, Levine-Finley S. Collaboration & team science: a field guide. 2010. http://TeamScience.nih.gov.
42. Technology transfer FAQs. http://www.ott.nih.gov/crada-mta-faqs.
43. Macrina FL. Scientific integrity: an introductory text with cases. Washington, DC: ASM Press; 2000.
44. Stelfox HT, Chua G, O'Rourke K, Detsky AS. Conflict of interest in the debate over calcium-channel antagonists. N Engl J Med 1998;338:101–6.
45. Blumenthal D, Campbell EG, Anderson MS, Causino N, Louis KS. Withholding research results in academic life science. Evidence from a national survey of faculty. JAMA 1997;277:1224–8.

46. A guide to avoiding financial and non-financial conflicts or perceived conflicts of interest in clinical research at NIH. https://oir.nih.gov/sites/default/files/uploads/sourcebook/documents/ethical_conduct/guide-avoiding_conflict_interest_clinical_research.pdf.
47. Siegal D. Myriad settles with Quest in cancer gene test patent MDL. Law360. 2015. http://www.law360.com/articles/619520/myriad-settles-with-quest-in-cancer-gene-test-patent-mdl.
48. Ferguson C, Marcus A, Oransky I. Publishing: the peer-review scam. Nature 2014;515:480–2.
49. Bohannon J. Who's afraid of peer review? Science 2013;342:60–5. http://dx.doi.org/10.1126/science.342.6154.60.
50. Beware of predatory publishers. NIH Catalyst. January–February 2015.
51. Procedure for authorship resolution. https://oir.nih.gov/sourcebook/ethical-conduct/responsible-conduct-research-training/processes-authorship-dispute-resolution.
52. Flanagin A, Carey LA, Fontanarosa PB, Phillips SG, Pace BP, Lundberg GD, Rennie D. Prevalence of articles with honorary authors and ghost authors in peer-reviewed medical journals. JAMA 1998;280:222–4.
53. Baskin PK, Gross RA. Honorary and ghost authorship. BMJ 2011;343:d6223. http://dx.doi.org/10.1136/bmj.d6223.
54. International Committee of Medical Journal Editors Uniform Requirements. http://www.ICMJE.org.
55. Wislar JS, Flanagin A, Fontanarosa PB, DeAngelis CD. Honorary and ghost authorship in high impact biomedical journals: a cross sectional survey. BMJ 2011;343:d6128. http://dx.doi.org/10.1136/bmj.d6128.
56. JAMA. http://jama.jamanetwork.com/public/InstructionsForAuthors.aspx#dvTopNav; http://www.icmje.org/recommendations/.
57. Prinz F, Schlange T, Asadullah K. Nat Rev Drug Discov 2011;10:712–3.
58. Collins FS, Tabak LA. Policy: NIH plans to enhance reproducibility. Nature 2014;505:612–3. http://dx.doi.org/10.1038/505612a. http://www.nature.com/news/policy-nih-plans-to-enhance-reproducibility-1.14586.
59. Landis SC, et al. A call for transparent reporting to optimize the predictive value of preclinical research. Nature 2012;490:187–91. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3511845/.
60. Kilkenny C, Parsons N, Kadyszewski E, Festing MFW, Cuthill ID, Fry D, Hutton J, Altman DG. Survey of the quality of experimental design, statistical analysis and reporting of research using animals. PLoS One 2009;4:e7824. http://dx.doi.org/10.1371/journal.pone.0007824.

Further Reading

1. Zinner DE, DesRoches CM, Bristol SJ, Clarridge B, Campbell EG. Tightening conflict-of-interest policies: the impact of 2005 ethics rules at the NIH. Acad Med 2010;85:1685–91.


C H A P T E R

4 Institutional Review Boards

Julia Slutsman1, Lynnette Nieman2

1National Institutes of Health, Washington, DC, United States; 2National Institutes of Health, Bethesda, MD, United States

O U T L I N E

Historical, Ethical, and Regulatory Foundations of Current Requirements for Research Involving Human Subjects 47
  Historical Foundations 47
  Ethical Foundations 49
  Regulatory Foundations 49
Institutional Review Boards 50
  Key Concepts and Definitions From the Common Rule 50
  Research 51
  Exempt Research Activities 51
  Minimal Risk and Expedited Review Procedures 51
  Institutional Review Board's Review of Research 51
  Institutional Review Board Membership 51
  Criteria for Institutional Review Board Approval of Research 52
Continuing Review of Research 56
Clinical Researchers and Institutional Review Boards 57
Evaluation and Evolution of the Current System of Research Oversight and Institutional Review Boards 57
  Proposed Changes to Current Oversight of Research With Human Subjects 57
  Critique and Proposed Changes to Institutional Review Board Operations 58
Conclusion 59
Summary Questions 59
References 59

HISTORICAL, ETHICAL, AND REGULATORY FOUNDATIONS OF CURRENT REQUIREMENTS FOR RESEARCH INVOLVING HUMAN SUBJECTS

In the United States, the rights and welfare of human research subjects take precedence over the advance of scientific knowledge. Ethical guidelines, federal regulations, local institutional policies and procedures, and the knowledge and integrity of researchers and research staff all contribute to promoting the protection of human subjects. Our society has decided by law that objective, ongoing review of research activities by a group of diverse individuals is most likely to protect human subjects and promote ethically sound research. Prospective review of research by institutional review boards (IRBs) provides an important assurance that the rights and welfare of human subjects are given serious consideration. This chapter focuses on the development of US federal regulations concerning research involving human subjects and the roles and responsibilities of IRBs.

Principles and Practice of Clinical Research. http://dx.doi.org/10.1016/B978-0-12-849905-4.00004-6


Copyright © 2018. Published by Elsevier Inc.

Historical Foundations

Concerns about the ethics of the practice of medicine have a long history, but until the mid-20th century they centered mostly on the practice of therapeutic medicine, not research medicine. These concerns are summarized in Chapter 2 and are restated here in a slightly different context for added emphasis. In 1946, 23 Nazi physicians went on trial at Nuremberg for crimes committed against prisoners of war and inmates of concentration camps. These crimes included exposure of humans to extremes of temperature, performance of mutilating surgery, and deliberate infection with lethal pathogens. During the trial, fundamental ethical standards for the conduct of research involving humans were codified into the Nuremberg Code, which sets forth 10 conditions that must be met to justify research involving human subjects.1 Two important conditions are (1) the need for voluntary informed consent of subjects and (2) a scientifically valid research design that can produce fruitful results for the good of society. The Nuremberg Code was accepted in principle by 48 of 58 original signatory nations of the Charter of the United Nations as part of the Declaration of Human Rights. (Others abstained or did not vote.) However, in the United States, the existence of the Nuremberg Code was not widely appreciated. Researchers and physicians who were familiar with it generally believed that its requirements applied narrowly to research conducted by German researchers, and that it had little applicability or relevance to research conducted in the United States.2 In fact, full implementation of the first condition of the code in the United States (the voluntary consent of subjects who are able to exercise free power of choice) would have severely curtailed, if not eliminated, research involving prisoners, minors, and other individuals determined to lack capacity for providing informed consent. In the United States during the 1950s through the mid-1970s, many chemotherapeutic agents for cancer and other diseases/disorders were tested initially in healthy prisoners; in fact, some pharmaceutical companies had research buildings located on or near prisons to facilitate their research activities. Therefore, implementation of the code would have had major, dramatic effects on the conduct of research in the United States, and in fact many significant changes to these practices did occur later.
Most countries accepting the principles of the code, including the United States, had no mechanism for implementing its provisions. In 1953, the National Institutes of Health (NIH) opened the Clinical Center (CC), its major research hospital in Bethesda, Maryland, which subsequently developed the first US public policy for the protection of human subjects. The policy, which was applicable to intramural research at the NIH CC, required peer review of research protocols enrolling healthy subjects and was consistent with the Nuremberg Code in that it gave special emphasis to the protection of healthy, adult research volunteers who had little to gain directly from participation in research.3 The CC policy was innovative not only for its adoption but also for providing a mechanism for prospective review of research by individuals who had no direct involvement or intellectual investment in the research.

This was the beginning of the research review mechanism, the IRB, that is now fundamental to the current system of human subject protection throughout the United States. In fact, the first two research protocols submitted to the institutional review committee of the CC were disapproved because the committee judged that research-related risks to healthy volunteers were too high.4 However, the initial CC requirements for prospective review of research and obtaining subjects' informed consent were applicable only to research involving healthy volunteers, not patients. In excluding research involving patients from these requirements, the policy was consistent with contemporaneous thinking of US physicians/researchers; most were reluctant to set forth explicit rules for the conduct of research involving patients, arguing that such rules would impede research and undermine trust in the physician.2 In the 1960s, pharmaceutical industry-sponsored funding for clinical research expanded with the passage of the Kefauver–Harris bill, which required manufacturers to perform research establishing safety and efficacy of drugs prior to marketing. The bill also required that subjects participating in research under US Food and Drug Administration (FDA) purview provide their informed consent prior to participation; this was the first federal statute requiring protections for human subjects.3 Interest in the rights of research subjects grew not only because of a general increase in attention to human rights in the United States but also because of a number of highly publicized clinical research abuses. Indeed, the current laws that govern the conduct of federally funded research with human subjects were formulated in part in reaction to research abuses and scandals.
In 1966, Henry Beecher, a highly respected physician/investigator from Harvard University, shocked the medical community when he reported that unethical and questionably ethical practices were common in the conduct of human subjects research at many of the premier research institutions in the United States.5 One of 22 examples of ethically problematic research that Beecher described involved investigators at the Jewish Chronic Disease Hospital in New York who injected elderly, indigent people with live cancer cells, without their consent, to learn more about the human immune system. Although no apparent harm to subjects was documented, the investigators were cited for fraud, deceit, and unprofessional conduct.6 In 1964 the World Medical Association recognized the need for research ethics guidelines that were broader in scope than the Nuremberg Code by adopting the "Declaration of Helsinki: Recommendations Guiding Medical Doctors in Biomedical Research Involving Human Subjects."7 In the ensuing years these guidelines have been revised numerous times and are currently in use throughout the world.


In 1966, the NIH, under the directorship of Dr. James Shannon, issued the first Public Health Service Policy on the Protection of Human Subjects.3 This policy, which applied to research conducted or supported by the then Department of Health, Education, and Welfare (HEW), including the NIH, required prospective review of human subjects research, taking into account the rights and welfare of involved subjects, the appropriateness of methods used to secure informed consent, and the risks and potential benefits of the research. The policy included the requirement that consent should be documented by the signatures of subjects or their representative(s). Several events in the early 1970s led to renewed and intense efforts in the United States to enhance protections for research participants by putting in place regulations governing human subjects research. Most notable was the revelation that, since the 1930s, 400 syphilitic and 200 healthy black men in Tuskegee, Alabama, had been involved, without their knowledge, in the Tuskegee Syphilis Study, a decades-long study of the natural history of syphilis sponsored by the US Public Health Service.8 These men were systematically denied penicillin even after its introduction as standard treatment for the disease.
Beginning in 1971, the Health Subcommittee of the Senate Committee on Labor and Human Resources held hearings on this study and on other alleged health-care abuses of prisoners and children.3 Outcomes of these and additional hearings included (1) enactment of the National Research Act of 1974, requiring HEW to codify its policy for the protection of human subjects into federal regulations, which it did in 1974; (2) formation of the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research; and (3) imposition of a moratorium on research conducted or supported by HEW involving live human fetuses until the National Commission could study and make recommendations on this activity.4 The National Commission, which functioned from 1974 to 1978, evaluated existing HEW regulations, recommended improvements to the Secretary of HEW, and issued reports on research involving pregnant women, live human fetuses, prisoners, children, the mentally disabled, and the use of psychosurgery. The Commission issued The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research in 1979.5 This major advancement in the development of public policy provided guidance for distinguishing therapeutic medicine from research, identified an analytic framework of three fundamental ethical principles for the protection of human subjects, and illustrated how ethical principles should be applied to the conduct of human subjects research. In 1979, HEW began to revise the 1974 regulations, and in January 1981 the (renamed) Department


of Health and Human Services (DHHS) gave final approval to Title 45, Part 46, of the Code of Federal Regulations (CFR) governing protection of human subjects (45 CFR 46).6 Initially, these regulations were applicable only when research was conducted or supported by DHHS, but in June of 1991, the core of the regulations (Subpart A), referred to as the Common Rule, was adopted by 16 other federal departments and agencies.7

Ethical Foundations

The ethical framework for US laws governing human subjects research protection is articulated in The Belmont Report. This document establishes three fundamental ethical principles that are relevant to all research involving human subjects: (1) respect for persons, (2) beneficence, and (3) justice; it also demonstrates how they apply to the conduct of research involving human subjects.6 Respect for persons acknowledges the dignity and autonomy of individuals and requires that subjects give informed consent to participation in research. However, not all individuals are capable of self-determination, and The Belmont Report acknowledges that people with diminished autonomy are entitled to additional protections. For example, some individuals may need extensive protection, even to the point of excluding them from research activities that may harm them, whereas others require little protection beyond making sure that they undertake research freely, with awareness of possible adverse consequences.5 Beneficence requires that the benefits of research be maximized, that possible harms be minimized, and that risks be assessed as reasonable in relation to potential benefits. This principle underlies careful analysis by researchers and IRBs of the risks and benefits of research protocols.6 Justice requires fair selection and treatment of research subjects and a fair distribution of the risks and benefits of research. For example, subjects should be equitably chosen to ensure that certain individuals or classes of individuals are not systematically selected for or excluded from research, unless there are scientifically or ethically valid reasons for doing so. Also, unless there is careful justification for an exception, research should not involve people from groups that are unlikely to benefit either directly or from subsequent applications of the research.6 These three principles are not mutually exclusive. Each principle carries strong moral force, and difficult ethical questions arise with regard to balancing the principles when they come into conflict.

Regulatory Foundations

Biomedical and behavioral human subjects research funded or supported by the DHHS, including the NIH,


is under the purview of regulations for the protection of human subjects included in 45 CFR 46.7 These regulations embody the principles of The Belmont Report. Taken together, The Belmont Report and 45 CFR 46 articulate the minimal ethical standards and legal obligations of those who conduct, review, and oversee research. Also, regardless of the funding source, all clinical trials in the United States involving investigational drugs or devices are under the regulatory purview of the FDA, which endorses 45 CFR 46. Additional FDA regulations contained in Title 21, Parts 50 and 56, of the CFR govern the development and approval of drugs, biologics, and devices, regardless of the funding source.8 FDA and DHHS regulations on the protection of human subjects and IRBs are generally consistent, although some differences have been noted.9 The regulatory apparatus of the DHHS for overseeing the protection of human subjects involved in the research that it funds consists of two major tiers of review: one at the federal level and the other at the institutional level. For example, as a condition for receipt of NIH research funds, institutions must assure in writing that personnel will abide by the ethical principles of The Belmont Report and the requirements of 45 CFR 46. These attestations are referred to as federalwide assurances (FWAs) of compliance. FWAs are negotiated and approved by the Office for Human Research Protections (OHRP) on behalf of the Secretary of the DHHS. In March of 2011, OHRP held 8557 active assurances with entities in the United States and 2457 assurances with international entities (personal communication with OHRP). All assurances set forth the institution's policies and procedures for review and monitoring of human subjects research activities, including IRB membership requirements and review and record-keeping procedures.
A variety of administrative actions can be taken by OHRP for violation of the requirements of 45 CFR 46 or the terms and conditions of an institution's assurance of compliance. OHRP has published analyses of compliance oversight investigations. Compliance oversight investigations conducted from 1990 through early 2000 resulted in restrictions of clinical research activities or corrective measures in 38 US research institutions. Corrective actions included temporary suspension of all DHHS-funded clinical research in some institutions, the requirement that some or all investigators conducting research in these institutions receive appropriate additional education concerning the protection of human subjects, and quarterly reports to the DHHS on the institution's progress in correcting identified deficiencies.10 Compliance oversight investigations conducted from 2005 to the end of 2010 resulted in restrictions or suspension of research activities in three US research institutions. Actions included temporary suspension of all federally funded research in some institutions,

assignment of a new Authorized Institutional Official on the Assurance, and the requirement to rereview all DHHS- or US federally supported human subject protocols that had not undergone adequate continuing review.11 From 2011 to May of 2016, OHRP sent 71 compliance oversight determination letters. OHRP's website provides letters of compliance oversight determination organized by date as well as by the type of noncompliance action addressed (for more recent years).12 Some clinical research conducted in the United States does not fall under federal regulations because it is not funded by the federal government, or because it does not involve compounds under the jurisdiction of the FDA. This includes some research conducted in educational environments (such as colleges and universities) and other nonfederally funded research. The quantity of such research and the settings in which it is being conducted are not known. Efforts were made to bring all US clinical research under the purview of federal regulations; however, such provisions were not added to the Final Rule published on January 19, 2017. When filing an FWA, individual US institutions can voluntarily indicate to DHHS that they will apply the rule to all human subjects research that they conduct, regardless of the funding source. It is important to note that in addition to the regulation at 45 CFR 46, many states and institutions have additional laws and requirements that apply to human subjects research.

INSTITUTIONAL REVIEW BOARDS

Key Concepts and Definitions From the Common Rule

Because clinical investigators seek generalizable knowledge applicable to persons other than their individual patients, the pursuit of this goal may not always promote the welfare of individual patients. Accordingly, DHHS and FDA regulations require most proposed clinical research to undergo prospective independent review by an IRB as a mechanism for promoting ethically sound research. Although the IRB system is not perfect, conscientious IRBs reassure the US public that people not directly involved in the research consider seriously the rights and welfare of human subjects before research may begin. It is through this process of research review and approval that investigators, research institutions, IRB members, and others are held publicly accountable for their decisions and actions. The federal regulations at 45 CFR 46 include definitions of terms, recordkeeping requirements, composition of IRBs, and responsibilities and requirements for IRB review and approval of research involving human subjects.

I. ETHICAL, REGULATORY, AND LEGAL ISSUES

INSTITUTIONAL REVIEW BOARDS

Research

Research is any systematic investigation designed to develop or contribute to generalizable knowledge (45 CFR 46.102[d]). A human subject is a living individual about whom an investigator obtains (1) data through interaction or intervention with the individual or (2) identifiable private information (45 CFR 46.102[f]). For example, consider the situation in which a physician asks the hospital medical records department to make available for review the medical records of all patients with a diagnosis of human immunodeficiency virus (HIV) infection. The physician wants to learn about the medical management of these patients treated in the hospital and its clinics during the past 5 years and to publish an analysis of that management. According to the preceding definitions, if the physician reviews medical records of patients who are no longer living, he or she is conducting research, but it does not involve human subjects (defined as living individuals). However, if the physician reviews medical records of patients who are still living, he or she is conducting research involving human subjects.

Exempt Research Activities

Not all research involving human subjects requires prospective IRB review and approval. Although they involve human subjects, six categories of research are exempt from the requirements of 45 CFR 46 for IRB review in the version of the Common Rule in effect as of the writing of this chapter. Changes and additional exemption categories are described in the changes to the Common Rule published on January 19, 2017.12a The general rationale behind the six categories of exemptions is that although the research involves human subjects, it does not expose them to physical, social, psychological, or other risks beyond those encountered in daily life. One example of exempt research is the study of existing records (e.g., pathologic samples, medical records) if these sources are publicly available, or if the investigator records the information in such a way that subjects cannot be identified directly or through identifiers linked to the subjects. Therefore, in the previous example, in which the researcher wants to study existing medical records, the research may be exempt from the requirement for IRB review and approval if the researcher records information from the medical charts in an anonymous fashion (no links or codes identifying patients). However, many hospitals have more restrictive policies concerning the research use of medical records and pathologic samples, and researchers should be familiar with relevant institutional policies. Survey and questionnaire research may be exempt unless the information elicited, if disclosed outside the research, could reasonably place subjects at risk for criminal or civil liability or could be damaging to subjects'


financial standing, employability, or reputation. Therefore, a questionnaire or survey should not be exempted if, for example, it elicits information about illegal behaviors, such as drug use, child or spousal abuse, or other sensitive issues such as sexual and other private behaviors. Institutional procedures vary for making determinations about whether proposed research activities are exempt. Investigators are not authorized to make final determinations about whether their proposed research activities are exempt from the requirement for prospective IRB review and approval. In some institutions, the IRB makes these determinations; in others, an office for research regulation or its equivalent makes these determinations.13 Thus, researchers should be familiar with their institution’s procedures for requesting exemptions.

Minimal Risk and Expedited Review Procedures

Minimal risk means that "the probability and magnitude of harm or discomfort anticipated in the research are not greater in and of themselves than those ordinarily encountered in daily life or the performance of routine physical or psychological examinations or tests" (45 CFR 46.102[i]). Some minimal risk research activities are eligible for IRB review through expedited review procedures. This means that the IRB chair and/or other experienced IRB members designated by the chair may approve (but not disapprove) the research on behalf of the IRB. The expedited review process was put into place to streamline and facilitate IRB review of certain minimal risk research activities. The changes to the Common Rule published on January 19, 2017, known as the Final Rule, also include an additional category of IRB review, called limited IRB review, which can apply to certain exempt and nonexempt research.

Institutional Review Board's Review of Research

When a researcher proposes to do research that is not exempt, he or she submits a research protocol for review. If the research does not qualify for expedited review, it must be reviewed by a convened IRB. A protocol is the researcher's written description of the planned research, as well as a discussion of issues related to the protection of subjects. The following sections provide some of the regulatory requirements for IRB composition and criteria for IRB review and approval of research involving human subjects.

Institutional Review Board Membership

Federal regulations set minimal IRB membership standards for review of research under the Common Rule. All IRBs must have at least five members (45 CFR 46.107): at least one whose primary concerns are in scientific areas, one whose primary concerns are in nonscientific areas, and one who is not otherwise affiliated with the institution. Also, when in its judgment


the IRB requires expertise beyond or in addition to that available through its members, it may invite individuals with competence in special areas to assist in its reviews. These requirements for membership are grounded in the belief that the protection of human subjects is promoted by an objective review of research activities by a group of diverse individuals who have no direct involvement in the research. Because IRBs are often situated at or near the site of the research, members are expected to have knowledge of and sensitivity to specific conditions affecting the conduct of the research and the protection of participants in geographic communities proximal to the research institution. For example, if a US-sponsored research protocol will enroll subjects in other countries, review by IRBs in those countries, rather than, or in addition to, US IRBs may be important to help identify and resolve particular cultural, religious, or other matters related to the research.

Criteria for Institutional Review Board Approval of Research

To approve research, an IRB must determine that it meets minimal requirements. Table 4.1 includes the seven minimal regulatory criteria for IRB review and approval (with citations to their sources in 45 CFR 46.111) along with questions that IRBs in the Intramural Research Program (IRP) at the NIH often consider when reviewing research protocols. All clinical researchers, particularly principal investigators (PIs), must be familiar with these requirements and must understand how they apply to their research protocols. In addition, the human subjects protection regulation at 45 CFR 46, state laws, guidance documents from OHRP, and published frameworks for ethical review of research provide a broad range of resources for investigators and IRBs.

1. The Proposed Research Design Is Scientifically Sound and Will Not Unnecessarily Expose Subjects to Risk (Table 4.1, #1).
At a minimum, the IRB should determine that the hypothesis is clear, and that the study design is appropriate. If a research protocol is poorly designed and investigators are not likely to obtain meaningful information, it is not ethically justifiable to expose subjects to any risk, discomfort, or inconvenience.18 However, although IRBs have some members with scientific expertise, they are not constituted to act as primary scientific review committees. In many institutions, protocols undergo pre-IRB scientific review to ensure that protocols sent to the IRB are likely to yield scientifically meaningful results. This is a desirable approach because it allows the IRB to focus its major attention on the protection of subjects. In any event, an IRB should not approve a research protocol that it does not believe to be scientifically sound.

2. Risks are Reasonable in Relation to Anticipated Benefits (Table 4.1, #2).

The IRB is required to assess the risks, discomforts, burdens, and benefits of participation in the protocol under consideration. Investigators should include the informed consent form with the protocol and should describe the risks and benefits of the research in both the protocol and the informed consent form. Risk is the probability of harm or injury (physical, psychological, social, or economic) occurring as a result of participation in a research study. Risk varies in magnitude, but only minimal risk is defined by the federal regulations. IRBs are also required to assess research-related benefits. The term benefit is not defined in the regulations. Generally, the benefits of research fall into two major categories: (1) direct benefits to individual subjects, for example, in the form of health improvements from the intervention being studied (cure or diminution of symptoms of a disease/disorder), and (2) benefits to others (e.g., society at large and future patients) through the advancement of knowledge.19 To approve research, an IRB must determine that "risks to subjects are reasonable in relation to anticipated benefits, if any, to subjects and the importance of the knowledge that may reasonably be expected to result" (45 CFR 46.111(a)(2)). If research subjects stand to benefit directly from participation in research, because they are receiving treatment or diagnostic procedures, higher risks and discomforts may be justifiable. On the other hand, in research for which there is no prospect of direct benefit to individual subjects, such as research involving healthy volunteers, the IRB must evaluate whether the risks presented by procedures/interventions performed solely to obtain generalizable knowledge are ethically acceptable.
For example, in the IRP of the NIH, IRBs are expected to categorize research-related benefits and risks according to the criteria in Table 4.2.

3. Risks to Subjects are Minimized (Table 4.1, #6).

Efforts of the IRB to minimize risks are closely related to, and most likely will be discussed along with, criteria #1 and #2, above. Even when research risks are justifiable and unavoidable, they often can be reduced or managed effectively. IRBs are responsible for assuring that risks are minimized to the extent possible. Ways to minimize risks may include, but are not limited to, assuring that (1) adequate safeguards are incorporated into the protocol to reduce the probability and/or severity of harm(s), (2) monitoring of participant safety and data integrity is appropriate, and (3) investigators are competent in the area(s) being studied. Data and safety monitoring by an independent person or committee (DSMC) may be appropriate. It is important to assure that IRBs communicate effectively with DSMCs.19

TABLE 4.1 Institutional Review Board (IRB) Protocol Review Standards: Regulatory Requirements for IRB Review and Documentation in the Minutes

For each regulatory review requirement (45 CFR 46.111), possible questions for IRB discussion are listed.

1. The proposed research design is scientifically sound and will not unnecessarily expose subjects to risk. (45 CFR 46.111(a)(1)(i))
a. Is the hypothesis clearly stated?
b. Is the study design appropriate?
c. Will the research contribute to generalizable knowledge? Is it ethically permissible to expose subjects to risk?

2. Risks to subjects are reasonable in relation to anticipated benefits, if any, to subjects and the importance of the knowledge that may reasonably be expected to result. (45 CFR 46.111(a)(2))
a. What does the IRB consider the level of risk to be? (See risk assessment in Table 4.2.)
b. What does the principal investigator consider the level of risk/discomfort/inconvenience to be?
c. Is there a prospect of direct benefit to subjects? (See benefit assessment in Table 4.2.)

3. Subject selection is equitable. (45 CFR 46.111(a)(3))
a. Who is to be enrolled? Men? Women? Ethnic minorities? Children (rationale for inclusion/exclusion addressed)? Seriously ill persons? Healthy volunteers?
b. Are these subjects appropriate for the protocol?

4. Additional safeguards are required for subjects likely to be vulnerable to coercion or undue influence. (45 CFR 46.111(b))
a. Are appropriate protections in place for vulnerable subjects (e.g., pregnant women, fetuses, socially or economically disadvantaged, decisionally impaired)?

5. Informed consent is obtained from research subjects or their legally authorized representative(s) (45 CFR 46.111(a)(4)), and informed consent will be appropriately documented (45 CFR 46.111(a)(5)).
a. Does the informed consent document include the eight required elements?
b. Is the consent document understandable to subjects?
c. Who will obtain informed consent (principal investigator, nurse, or other)? In what setting?
d. If appropriate, is there a children's assent?
e. Is the IRB requested to waive or alter any informed consent requirement?

6. Risks to subjects are minimized (45 CFR 46.111(a)(1)), and "when appropriate, the research plan makes adequate provision for monitoring the data collected to ensure the safety of subjects" (45 CFR 46.111(a)(6)).
a. Does the research design minimize risks to subjects?
b. Would use of a data and safety monitoring board or other research oversight process enhance subject safety?

7. Subject privacy and data confidentiality are maximized. (45 CFR 46.111(a)(7))
a. Will personally identifiable research data be protected to the extent possible from unauthorized access or use?
b. Are any special privacy and confidentiality issues (e.g., use of genetic information) properly addressed?

This table describes regulatory requirements for IRB review of research studies and suggested discussion questions for each requirement.

4. Subject Selection is Equitable (Table 4.1, #3).

The ethical principle of justice, which requires fair distribution of both the burdens and the benefits of research, underlies the requirement for equitable selection of research subjects. On the one hand, when the NIH funds research, it expects the findings to be of benefit to all persons at risk for the disease, disorder, or condition under study. The NIH also requires the participation of women and minorities (see Chapter 13), both because these populations have historically been underrepresented in clinical research20 and because there are relevant physiological differences and differences in the prevalence of conditions between the sexes and across ethnic groups "that can affect how disease and treatment manifest themselves."21

On the other hand, IRBs are required to ensure that subjects (e.g., indigent persons, racial and ethnic minorities, persons confined to institutions, and individuals who are socially or politically disenfranchised) are not systematically selected merely because of their vulnerability, easy availability, susceptibility to undue influence, compromised position, or manipulability, rather than for reasons directly related to the goals and questions of the research. When defining the appropriate group of subjects to be studied in a research protocol, researchers take into account the scientific design, the susceptibility of potential subjects to risk (selecting subjects in a way that minimizes risk), the likelihood of direct benefits to subjects and society, and considerations of practicability and fairness. If vulnerable groups of subjects are included in the research, the IRB should determine that protections for these groups are adequate; such safeguards could include consent monitoring, capacity assessment, and adequate data and safety monitoring. IRBs are expected to determine that subject selection as proposed by the researcher in his or her research protocol is scientifically justified and ethically appropriate.

TABLE 4.2 Template Institutional Review Board (IRB) Assessment of Research-Related Risks and Benefits

RISK
Definition of minimal risk: Minimal risk means that the probability and magnitude of harm or discomfort anticipated in the research are not greater in and of themselves than those ordinarily encountered in daily life or during the performance of routine physical or psychological examinations or tests (45 CFR 46.102(i)).
What is the appropriate risk category for the protocol under consideration?
• The research involves no more than minimal risk to subjects.
• The research involves more than minimal risk to subjects.
• The risk(s) represents a minor increase over minimal risk.
• The risk(s) represents more than a minor increase over minimal risk.

BENEFIT
Definition: A research benefit is considered to be something of health-related, psychosocial, or other value to an individual research subject, or something that will contribute to the acquisition of generalizable knowledge. Money or other compensation for participation in research is not considered to be a benefit but, rather, compensation for research-related inconveniences.
What is the appropriate benefit category for the protocol under consideration?
• The research involves no prospect of direct benefit to individual subjects but is likely to yield generalizable knowledge about the subject's disorder or condition.
• The research involves the prospect of direct benefit to individual subjects.

This template provides a framework for IRBs to assess the risks and benefits associated with a given protocol.

5. Informed Consent is Provided by Research Subjects or Their Legally Authorized Representative(s) (Table 4.1, #5).

Although the requirement to obtain informed consent has substantial foundations in law, it is essentially an ethical imperative. It is through informed consent that researchers make operational their duty to respect the rights of prospective subjects to be self-determining: for example, to be left alone, to make free choices consistent with the values that are meaningful to them, and to have private information about them shared only in ways for which they give permission.22 In practical terms, however, signing the consent document is only one element in a subject's decision-making process about participating in a research protocol. The decision-making process of
prospective subjects can be influenced by various factors, including (1) the written consent document and the discussion about the study with the investigator; (2) the knowledge and skills of the professionals involved in the process and their relationship to potential subjects (e.g., researchers, nurses); (3) the prospective research subject (e.g., his or her medical and emotional state, primary language, ethnic/cultural background, financial considerations, and other personal factors); and (4) the circumstances in which the process takes place (e.g., an emergency room, private practice setting, or academic institution).

IRBs spend considerable time reviewing the written informed consent document(s). The role of the IRB is to ensure that the consent document contains the required elements of consent (Table 4.3) and that it is written at a reading level, and in a format, understandable to prospective subjects. The Final Rule adds further requirements for informed consent. In addition to reviewing the consent document, IRBs can influence the informed consent process by ensuring that the individuals obtaining consent are qualified to take on this important responsibility. For example, an IRB should take into consideration who will obtain informed consent to participate in the protocol and under what circumstances. Depending on the complexity and risks associated with a research study, an experienced senior researcher, rather than a junior person, may be required to obtain consent. Also, IRBs may exercise their authority to observe, or request a third party to observe, the consent process and the conduct of the research (45 CFR 46.109(e)).

TABLE 4.3 General Requirements for Informed Consent (45 CFR 46.116)

In seeking informed consent, the following information shall be provided to each subject:
1. A statement that the study involves research; an explanation of the purposes of the research; the expected duration of the subject's participation; a description of the procedures to be followed; and identification of any procedures that are experimental.
2. A description of any foreseeable risks or discomforts to the subject.
3. A description of any benefits to subjects or to others that may reasonably be expected from the research.
4. A disclosure of appropriate alternative procedures or courses of treatment, if any, that might be advantageous to the subject.
5. A statement describing the extent, if any, to which confidentiality of records identifying the subject will be maintained.
6. For research involving greater than minimal risk, an explanation as to whether any compensation and an explanation as to whether any medical treatments are available if injury occurs and, if so, what they consist of and where further information may be obtained.
7. An explanation of whom to contact for answers to pertinent questions about the research and research subjects' rights, and whom to contact in the event of a research-related injury to the subject.
8. A statement that participation is voluntary, that refusal to participate will involve no penalty or loss of benefits to which the subject is otherwise entitled, and that the subject may discontinue participation at any time without penalty or loss of benefits to which the subject is otherwise entitled.

This table summarizes regulatory requirements for informed consent forms.

Obtaining informed consent for research participation is a complex process; consequently, it has been a topic of interest, discussion, and publication for many years. In 1966, Dr. Henry Beecher (see Chapters 2 and 3) wrote that the two most important elements in ethical research involving human subjects are informed consent (which he acknowledged in some cases was difficult, if not impossible, to obtain) and the "presence of an intelligent, informed, conscientious, compassionate, responsible investigator."15 His ideas still ring true today. Even though the role of the IRB in promoting subjects' informed consent is important, it is primarily the responsibility of the investigator obtaining the consent to ensure that it is, in fact, informed and valid.

6. Additional Safeguards are Required for Subjects Likely to be Vulnerable to Coercion or Undue Influence (Table 4.1, #4).

Vulnerable research subjects are individuals who are relatively or absolutely incapable of protecting their own interests. In other words, "they have insufficient power, prowess, intelligence, resources, strength, or needed attributes to protect their own interests through negotiations for informed consent."23 Vulnerable subjects do not represent a homogeneous group but rather fall into heterogeneous groups whose participation in research may require additional protections. Table 4.4 is a noninclusive list of vulnerable or potentially vulnerable research subjects. It lists individuals who have no, or limited, capacity to provide informed consent, as well as those who may be particularly susceptible to undue influence or coercion.

TABLE 4.4 Vulnerable (or Potentially Vulnerable) Research Subjects

This is a noninclusive list of common categories of research subjects who have limitations to their capacity to provide informed consent and/or who may be susceptible to coercion or undue influence in decisions about research participation:
Comatose people
Critically ill people
People with intellectual disabilities/people with dementias/some psychiatric diseases
Children
Non-English-speaking people
Educationally/economically deprived people
Prisoners
Seriously/terminally ill people
Paid research volunteers

This table enumerates common categories of vulnerable subjects (e.g., subjects who have limitations to their capacity to provide informed consent and/or who may be susceptible to coercion).

Federal regulations direct IRBs to ensure that when some or all subjects are likely to be vulnerable to coercion or undue influence, such as children, prisoners, pregnant women, mentally disabled persons, or economically or educationally disadvantaged persons, additional safeguards have been included in the study to protect the rights and welfare of these subjects (45 CFR 46.111(b)). However, little additional practical guidance is provided, except when the subjects of research are pregnant women (45 CFR 46, Subpart B), prisoners (Subpart C), or children (Subpart D). Recent guidance on research involving prisoners and children has been useful to IRBs in reviewing such research.24,25 Otherwise, IRBs, in consultation with investigators, are expected to determine when subjects are likely to be vulnerable to coercion or undue influence, and therefore at increased risk of acting against their own best interests, and to provide additional safeguards appropriate to the particular research protocol under consideration.

For example, persons suffering from prolonged or serious illnesses that are refractory to standard therapies, or for which no standard therapies are available, should be considered vulnerable. Although these sick individuals may have the intellectual capacity to provide informed consent, attention must be paid to the validity of the consent. Because of limited choices, they may, out of desperation, be willing to take serious risks even for a highly remote prospect of direct benefit. Although this is not necessarily inappropriate, researchers and IRBs need to give careful attention to the informed consent process in protocols studying terminally ill or very sick people. To evaluate the validity of the consent, the IRB may ask that a "disinterested" individual, such as a social worker, a physician not involved in the research, or a research subject advocate, discuss the research study and other clinical or research alternatives with prospective subjects.26 Attention has also been given to additional protections for research involving individuals with mental disorders,27 research involving cognitively impaired subjects,28 and research conducted in emergency circumstances.29

7. Subject Privacy and Confidentiality Are Maximized (Table 4.1, #7).

Confidentiality refers to the management of information that an individual has disclosed in a relationship of trust, with the expectation that it will not be divulged to others, without the person's permission, in ways that are inconsistent with the understanding of the original disclosure. Privacy is defined in terms of having control over the extent, timing, and
circumstances of sharing information about oneself (physical, intellectual, or behavioral) with others. Biomedical and behavioral research may invade the privacy of individuals or may result in a breach of data confidentiality. In certain circumstances, a breach of confidentiality may present a risk of serious harm to subjects, for example, when a researcher obtains information about subjects that, if disclosed, would jeopardize their jobs or lead to their prosecution for criminal behavior. In other circumstances, such as observation and recording of public behavior, the invasion of privacy may present little or no harm. However, the need for confidentiality exists in virtually all studies in which data are collected about identified subjects.30

In most research, ensuring confidentiality is a matter of following best practices for handling paper and electronic records, as well as biological samples. Although it is not possible to absolutely protect privacy or maintain confidentiality, researchers should be aware of evolving best practices and implement up-to-date security measures to protect data confidentiality. Common practices include identifying private locations for conducting subject assessments and interviews, substituting codes for personal identifiers, properly disposing of computer sheets and other papers containing confidential identified information, deidentifying data, limiting access to identifiable data in a manner consistent with applicable regulations and laws (such as the Privacy Act of 1974 and the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule), and storing research records in locked cabinets and on secure, encrypted electronic media. Most researchers are familiar with these routine precautions taken to maintain the confidentiality of data. At a minimum, IRBs should assure themselves that adequate protections will be taken to safeguard the confidentiality of research information to the extent possible.

The types and stringency of measures depend on the type of information to be gathered in the study. In any case, guarantees of "absolute" confidentiality should be avoided; instead, the limits of confidentiality should be clarified. For example, federal officials have the right to inspect research records, including informed consent documents and individual medical records, to ensure compliance with the rules and standards of their programs (e.g., FDA inspections of clinical trial records). More elaborate procedures may be needed in studies in which data are collected on sensitive matters such as sexual behavior, criminal activities, and genetic predisposition to disease, and in studies where social media or Web-based tools are used for data collection. Other federal, state, or local laws address the confidentiality and maintenance of protected health information (PHI). For example, in the research context the HIPAA Privacy Rule gives patients certain rights over their health information and sets rules and limits on who can look at and receive this information (such as the right to provide an authorization for sharing of data unless a waiver of authorization has been approved by a Privacy Board or IRB). The Privacy Rule, promulgated under HIPAA, was a federal regulatory response to public concern over potential abuses of the privacy of health information in medical care. The Privacy Rule establishes a category of health information, referred to as PHI, which may be used or disclosed to others only under certain conditions. PHI includes what health-care professionals typically regard as a patient's personal health information, such as information in a patient's medical chart. The rule applies to identifiable health information about subjects of clinical research gathered by researchers who qualify as "covered health-care providers." Therefore, researchers associated with entities covered under HIPAA must understand its requirements to protect the confidentiality of research subjects.31

Continuing Review of Research

IRBs are required to conduct continuing review of approved research at least annually, or more often if they determine that the research presents significant physical, social, or psychological risks to subjects (45 CFR 46.109(e)). Continuing review is required to assure IRBs, investigators, research subjects, and the public that ongoing assessment will protect the rights and welfare of subjects. The information investigators must submit to an IRB at the time of continuing review varies according to institutional requirements and whether any subjects continue to be seen. For example, in the IRP of the NIH, investigators are required to submit materials including a copy of the currently approved protocol and consent document; a concise summary of the protocol's progress to date; the reason(s) for continuing the study; the gender/ethnic breakdown of subjects recruited to date; and any scientific developments that bear on the protocol, especially those that deal with risks, burdens, or benefits to individual subjects. Also, at the time of continuing review, protocol investigators must report any new equity, consultative, or other relationships with non-NIH entities that might present a real or apparent conflict of interest in the conduct of the protocol. By contrast, if no subjects continue to be seen, a more abbreviated and expedited process is possible. At its continuing review, or at any other time, an IRB may suspend, modify, or terminate approval of research that has been associated with serious harm to subjects or that is not being conducted in accord with federal regulatory requirements, ethical guidelines, and/or institutional policies. The Final Rule allows IRBs to forgo continuing review for research that is no greater than minimal risk.

CLINICAL RESEARCHERS AND INSTITUTIONAL REVIEW BOARDS

Successful clinical researchers know that strong ethical practices go hand in hand with high-quality, scientifically valid research involving human subjects. These researchers understand the IRB's mandate to protect human subjects and strive to work effectively with the IRB. Researchers' knowledge of and expertise in the ethical dimensions of their research activities are important to IRBs for several reasons. First, clinical researchers can help educate IRBs about the human subject protection issues related to their research protocols. IRBs are better able to understand and resolve such issues when PIs know the IRB review standards and are expert in applying them to their protocols. For example, when writing a protocol to test an investigational drug in persons with Alzheimer's disease, the researcher should provide clear scientific justification in the protocol for including individuals with cognitive impairment in the research. The investigator should describe procedures for (1) assessing whether subjects have the capacity to provide consent, (2) selecting a legally authorized representative to give permission for subjects who cannot provide consent, and (3) any additional protections afforded to subjects. The PI may propose that a person otherwise not involved in the research monitor the informed consent process to ensure that subjects and/or their representatives understand the investigational nature of the study and its risks. This approach greatly assists the IRB by providing it with a thorough overview of the human subject protection issues specific to the protocol under review, along with a description of the measures proposed by the PI to resolve these issues.
Second, in the early phases of scientifically innovative research, ethical and human subject protection issues may be unique and/or unclear; researchers who are experts in the scientific and ethical aspects of their research can provide IRBs with invaluable guidance in areas of uncertainty. IRB decisions are matters of judgment, and when highly innovative research is reviewed, it is particularly important that such judgments take into account relevant ethical thinking and scientific knowledge. Increasingly, institutions conducting biomedical research have processes for research ethics consultation, similar to clinical ethics consultation, to support stakeholders in research studies. The NIH IRP was among the first institutions in the country to establish a clinical research ethics consultation service and has published approaches and cases in such consultation.32


EVALUATION AND EVOLUTION OF THE CURRENT SYSTEM OF RESEARCH OVERSIGHT AND INSTITUTIONAL REVIEW BOARDS

Proposed Changes to Current Oversight of Research With Human Subjects

In the 30-plus years since the current research oversight system was put into place, the landscape of human subjects research has changed considerably. Funding sources have changed: for example, the pharmaceutical industry's share of total funding of biomedical research increased from 32% in 1980 to 62% in the early 2000s, whereas the federal government's share has fallen.33 There has been an increase in multisite studies, as opposed to the single-institution studies that were the norm when the research oversight system was conceptualized. Because of globalization and advances in technology, there is a greater diversity of research, including more health services and comparative effectiveness research, international research, biospecimen research utilizing ever-advancing analytic tools, and "big data" research leveraging data from across studies and sources.34,35 Concerns about the sufficiency of oversight and the potential for institutional and investigator conflicts of interest were voiced in the wake of the deaths of a research subject in a gene therapy trial and of a healthy volunteer in an asthma challenge study.36,37 The US Government Accountability Office (GAO) and the Institute of Medicine (IOM) conducted formal evaluations of the human subject protections oversight system, including the function of IRBs, and the efficiency and effectiveness of the current system have come under scrutiny and critique by academicians.32,38 Concerns raised included "(1) structural problems deriving from the organization of the system as established by federal regulations, (2) procedural problems stemming from the ways in which IRBs operate, and (3) performance assessment problems from the systematic assessment of current protections."39 Recommendations made by the GAO and IOM were aimed at improving the education of
researchers, IRB members, and institutional officials overseeing research involving human subjects; ensuring that IRBs have sufficient time and resources for adequate review; and strengthening federal oversight of research. Others have called for more significant changes, including the establishment of a single regulatory office for all human subjects research conducted in the United States.39 In response to numerous proposals for reform of the system, OHRP promulgated an Advance Notice of Proposed Rulemaking (ANPRM), "Human Subjects Research Protections: Enhancing Protections for Research Subjects and Reducing Burden, Delay and Ambiguity for Investigators."40 OHRP subsequently published a Notice of Proposed Rulemaking (NPRM) in 2014.41 The NPRM was revised based on the 2186 comments received, and the Final Rule was published on January 19, 2017. Its provisions go into effect on January 19, 2018, except for the requirement for single IRB review of multisite studies, which goes into effect one year later.

The Final Rule makes changes in eight broad areas: (1) adding requirements for informed consent forms and processes; (2) allowing the use of broad informed consent for secondary analyses of stored biospecimens; (3) creating a category of IRB review called limited IRB review, which can take multiple forms; (4) adding exemption categories and allowing exemption determinations to be made without administrative, institutional, or IRB review; (5) modifying the requirements for waiver or alteration of consent for biospecimen research to make such mechanisms less common; (6) mandating the use of single IRBs for review of domestic multisite studies (with provision for certain exceptions); (7) eliminating the requirement for continuing review for studies that are limited to data analysis or limited observational follow-up activities, where review is thought not to contribute meaningfully to the protection of human subjects; and (8) adding requirements, designed to increase subject understanding, to informed consent forms.44 One of the most immediate and dramatic changes is the requirement for single IRB review of multisite studies. Review of multisite research studies by IRBs at each site has been cited as a common impediment to the efficient conduct of multisite research.
Some have argued that this review is redundant, impedes the timely conduct of research, increases the complexity of informed consent forms, and does not enhance the protection of human subjects.16,45–48 Another disadvantage of institutionally based IRBs is the potential for IRBs to focus on the interests of the institution rather than those of participants, thereby generating a potential conflict of interest.39 The regulations at 45 CFR 46.114 allow an institution to cede IRB review for a study conducted under its FWA to another institution, and reliance on a single or central IRB for multisite research is becoming increasingly common. NIH has published a policy requiring single IRB review for most NIH-funded multisite studies.49 There remains resistance among institutions, IRB members, and investigators to the use of a single IRB, especially one that is not local to the community being studied.50,51 Experience with several models of central IRB review is accumulating, and although data are limited, preliminary evidence suggests that mature, well-structured central IRB operations can improve review times and potentially reduce costs.52–55

Critique and Proposed Changes to Institutional Review Board Operations

The function of the IRB system, as distinct from the overall research oversight system, has come under considerable criticism.56 Some claim “that IRBs should be radically overhauled.”57 The current IRB system deserves serious reevaluation; its strengths should be acknowledged and supported, and its weaknesses should be addressed.58–60 One challenge to broader reform of IRB operations is a lack of empirical studies and established performance measures for the quality of IRB review. Despite IRBs’ long-standing central role in the protection of human subjects, relatively little published research has explored their deliberations.61,62 Most studies have examined only IRB records and procedures and IRB members’ knowledge and attitudes; little published work has evaluated the protocol review activities of IRBs as conducted in their convened meetings. One approach to achieving consistent and potentially high-quality IRB review is to undergo review by an outside entity according to published standards. The IOM encouraged the development of such institutional accreditation standards that build on the federal regulations and urged that accrediting organizations be nongovernmental entities.39 The Association for the Accreditation of Human Research Protection Programs (AAHRPP) is the main accrediting organization in the United States (see Chapter 5). Its standards indicate that IRBs are one of several important elements (or domains) in an institution’s overall Human Research Protection Program (HRPP). Other critical domains addressed by these standards are the roles and responsibilities of the institution, institutional/organizational officials, researchers, research staff, and research subjects. The process of accreditation includes organizational self-evaluation and site visits by independent human subject protection experts.
As of June 2014, more than 60% of US research-intensive institutions and 65% of US medical schools, as well as pharmaceutical companies, independent IRBs, and international institutions, had received or had begun application for full accreditation by AAHRPP.63 Interestingly, some of the strengths of the IRB system also contribute to its potential weaknesses. For example, having IRBs situated at the site of the research provides the advantage that research is reviewed by people most likely to be familiar with the researchers and with institutional and other local factors relevant to the protection of research participants. However, if IRB members are predominantly employees of the research institution, there is the potential for conflict of interest, particularly when reviewing research protocols involving large amounts of grant or other support money for the institution. Many organizations take steps to address and


minimize these real or potential conflicts of interest of IRB members.38 One strength of IRBs situated in research institutions is that an IRB can have an important educational role within the organization.32 For example, the NIH IRP has 12 IRBs consisting of approximately 200 members, who provide a significant educational resource to the NIH research community. The ability of an IRB to fulfill its mandate is influenced by several factors, including the knowledge and experience of members and institutional resources and commitment. IRB decisions are matters of judgment and therefore depend on an understanding and wise application of ethical guidelines and regulatory requirements, as well as an appreciation of local influences such as cultural considerations, though this last function need not be performed by a local IRB. Efforts to improve the abilities and procedures of IRBs should promote independence of IRB review; identify which functions commonly performed by IRBs may be better situated elsewhere in an institution; promote a culture of awareness of conflict of interest among institutions and investigators and implement strong policies for identifying and managing conflict of interest; minimize duplicative IRB review; conduct research to develop measures of IRB consistency and thoroughness; and identify the costs associated with IRB review and other functions that IRBs are required to take on, to ensure adequate funding.

CONCLUSION

Research involving human subjects, even when subjects may benefit directly from participation, is a different kind of enterprise from the routine practice of medicine. The goals of research include not only the welfare of individual subjects but also the gathering of scientific data for future application. Our society has granted a conditional privilege to perform research with human subjects: the research must be scientifically sound and must be conducted in a manner that protects the rights and safeguards the welfare of participants. The IRB system is well developed but ever evolving. The current US system for protecting human research subjects, including the role of IRBs, is undergoing serious evaluation and change. Successful evolution of the system depends on learning from the past, understanding current and future needs, and applying that knowledge to implement meaningful improvements. Researchers, research participants and institutions, and others, particularly the American people, who bear the burdens of research and to whom its benefits accrue, all have a stake in the process.

SUMMARY QUESTIONS

1. The mandate of IRBs provided in federal regulations (45 CFR 46) is to
   a. Conduct primary scientific review of research protocols to ensure that they are scientifically sound
   b. Protect the rights and safeguard the welfare of human research subjects by conducting independent ethical review
   c. Ensure that research protocols are consistent with contemporaneous national public policies
   d. Approve research protocols quickly to support acquisition of research funding

2. Which of the following statements is/are correct?
   a. Institutional officials may choose not to allow an IRB-approved research protocol to go forward, but they are not permitted to override IRB disapproval
   b. IRBs are required to conduct continuing review of active research protocols at least once per year
   c. IRBs are expected to identify when potential research subjects may be vulnerable to coercion or undue influence and to provide additional appropriate protections
   d. Vulnerable research subjects include, but are not limited to, children, prisoners, and pregnant women
   e. All of the above

3. Our society has decided that research involving human subjects should undergo review by IRBs because
   a. Clinical researchers have an inherent conflict of interest when balancing their roles as researchers and health-care professionals
   b. Review of research by persons who do not have a role in the research is one way to promote ethically sound research
   c. IRB review is one way research institutions, researchers, IRBs, and others are held publicly accountable for their decisions and actions regarding their clinical research activities
   d. IRBs are made up of diverse persons with varied backgrounds to promote a comprehensive approach to protecting the rights and safeguarding the welfare of research subjects
   e. All of the above

References

1. Levine J. The Nuremberg Code. Ethics and regulation of clinical research. New Haven, CT: Yale University Press; 1988.
2. Katz J. The Nuremberg Code and the Nuremberg trial: a reappraisal. JAMA 1996;276:1662–6.
3. McCarthy C. The origins and policies that govern institutional review boards. In: Emanuel E, Grady C, Crouch R, Lie R, Miller F, Wendler D, editors. The Oxford textbook of clinical research ethics. New York: Oxford University Press; 2008 [Chapter 50].


4. U.S. Congress. National Research Act (PL 93-348). July 12, 1974. https://history.nih.gov/research/downloads/PL93-348.pdf.
5. National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. The Belmont report: ethical principles and guidelines for the protection of human subjects of research. Publication No. 887–809. Washington, DC: Government Printing Office; 1979. p. 4–8.
6. DHHS. Regulations for protection of human subjects, 45 CFR, Part 46. http://www.hhs.gov/ohrp/regulations-and-policy/regulations/45-cfr-46/index.html; 1981.
7. Porter J, Koski G. Regulations for the protection of humans in research in the United States. In: Emanuel E, Grady C, Crouch R, Lie R, Miller F, Wendler D, editors. The Oxford textbook of clinical research ethics. New York: Oxford University Press; 2008. p. 156–67 [Chapter 15].
8. FDA regulations for the protection of human subjects, 21 CFR Parts 50 and 56. 1996.
9. Food and Drug Administration. Guidance for institutional review boards and clinical researchers. Information sheet on “Significant Differences in FDA and HHS Regulations”. 2000. Retrieved from: http://www.fda.gov/ScienceResearch/SpecialTopics/RunningClinicalTrials/EducationalMaterials/ucm112910.htm.
10. Borror K, Carome M, McNeilly P, et al. A review of OHRP compliance oversight letters. IRB: Ethics Hum Res September–October 2003;25(5):1–4.
11. Weil C, Rooney L, McNeilly P, Cooper K, Borror K, Andreason P. OHRP compliance oversight letters: an update. IRB: Ethics Hum Res 2010;32:1–6.
12. Office for Human Research Protections. http://www.hhs.gov/ohrp/compliance-and-reporting/index.html.
12a. HHS. https://www.gpo.gov/fdsys/pkg/FR-2017-01-19/html/2017-01058.htm.
13. Wichman A, Mills D, Sandler AL. Exempt research: procedures in the intramural research program of the National Institutes of Health. Rev Hum Subjects Res March–April 1996:3–5.
14. Faden RR, Beauchamp TL. A history and theory of informed consent. New York: Oxford University Press; 1986.
15. Beecher HK. Ethics and clinical research. N Engl J Med 1966;274:1354–60.
16. Pogorzelska M, Stone PW, Gross Cohn E, et al. Changes in the institutional review board submission process for multicenter research over 6 years. Nurs Outlook 2010;58:181–7.
17. Levine J. The Nuremberg Code. Ethics and regulation of clinical research. New Haven, CT: Yale University Press; 1988. p. 427–9.
18. Emanuel E, Wendler D, Grady C. An ethical framework for biomedical research. In: Emanuel E, Grady C, Crouch R, Lie R, Miller F, Wendler D, editors. The Oxford textbook of clinical research ethics. New York: Oxford University Press; 2008. p. 123–40.
19. Taylor HA, Chaisson L, Sugarman J. Enhancing communication among data monitoring committees and institutional review boards. Clin Trials 2008;5:277.
20. NIH policy and guidelines on the inclusion of women and minorities as subjects in clinical research, amended. October 2001. Retrieved from: http://grants.nih.gov/grants/funding/women_min/guidelines_amended_10_2001.htm.
21. Dresser R. Wanted: single, white male for medical research. Hastings Cent Rep 1992;22(1):24–9.
22. Levine J. The Nuremberg Code. Ethics and regulation of clinical research. New Haven, CT: Yale University Press; 1988. p. 96.
23. Levine J. The Nuremberg Code. Ethics and regulation of clinical research. New Haven, CT: Yale University Press; 1988. p. 72.
24. Institute of Medicine. Ethical considerations for research involving prisoners. Washington, DC: IOM; 2006.
25. Institute of Medicine. Ethical conduct of clinical research involving children. Washington, DC: IOM; 2004.

26. American Academy of Neurology. Position statement: ethical issues in clinical research in neurology. Neurology 1998;50:592–5.
27. National Bioethics Advisory Commission. Research involving persons with mental disorders that may affect decision-making capacity. Washington, DC: Government Printing Office; 1998.
28. Alzheimer’s Association. Research consent for cognitively impaired adults: recommendations for institutional review boards and investigators. Alzheimer Dis Assoc Disord 2004;18:171–5.
29. Food and Drug Administration. Protection of human subjects: exception from informed consent requirements for emergency research. 21 CFR 50.24; revised 2010.
30. Standards for privacy of individually identifiable health information. 45 CFR Parts 160, 164. Fed Regist 2000;65:82462–82829. (See also HHS Office for Civil Rights, http://www.hhs.gov/ocr/hipaa. For the effect of the Privacy Rule on clinical research, see http://privacyruleandresearch.nih.gov/clin_research.asp.)
31. U.S. Government Accountability Office, Health, Education, and Human Services Division. Report to the ranking minority member, Senate Committee on Governmental Affairs. Continued vigilance critical to protecting human subjects. Publication No. 96–72. 1996.
32. Danis M, Largent E, Wendler D, et al. Research ethics consultation: a casebook. New York: Oxford University Press; 2012.
33. Bekelman JE, Li Y, Gross CP. Scope and impact of financial conflicts of interest in biomedical research. JAMA 2003;289:454–65.
34. Emanuel E, Menikoff J. Reforming the regulations governing research with human subjects. N Engl J Med 2011;365(12):1145–50.
35. Lo B, Barnes M. Federal research regulations for the 21st century. N Engl J Med 2016;374:1205–7.
36. Savulescu J, Spriggs M. The hexamethonium asthma study and the death of a normal volunteer in research. J Med Ethics 2002;28:3–4.
37. Resnik D, Ariansen J, et al. Institutional conflict of interest policies at U.S. academic research institutions. Acad Med 2016;91(2):242–6.
38. Institute of Medicine. Responsible research: a systems approach to protecting research participants. Federman D, Hanna K, Rodriguez L, editors. Washington, DC: National Academies Press; 2002. Retrieved from: http://www.nap.edu/search/?term=Responsible+Research%3A+A+Systems+Approach+to+Protecting+Research+Participants.
39. Emanuel E, Wood A, Fleischman A, et al. Oversight of human participants research: identifying problems to evaluate reform proposals. Ann Intern Med 2004;141:282–91.
40. Department of Health and Human Services. Human subjects research protections: enhancing protections for research subjects and reducing burden, delay, and ambiguity for investigators. Fed Regist 2011;76(143):44512–31. Retrieved from: http://www.gpo.gov/fdsys/pkg/FR-2011-07-26/html/2011-18792.htm.
41. Department of Health and Human Services. Federal policy for the protection of human subjects. Fed Regist 2015;80(173):53933–54061.
42. Katz J, Capron A, Glass E. Experimentation with human beings: the authority of the investigator, subject, professions, and state in the human experimentation process. New York: Russell Sage Foundation; 1972. p. 9–44.
43. Brandt A. Racism and research: the case of the Tuskegee Syphilis Study. Hastings Cent Rep 1978;8(6):21–9.
44. Menikoff J, Kaneshiro J, Pritchard I. The Common Rule, updated. N Engl J Med January 19, 2017;376(3).
45. Green LA, Lowery JC, Kowalski CP, et al. Impact of institutional review board practice variation on observational health services research. Health Serv Res 2006:214–30.
46. Beardsmore CS, Westaway JA. The shifting sands of research ethics and governance: effects on research in paediatrics. Arch Dis Child 2007;92(1):80–1.
47. Greene SM, Geiger AM. A review finds that multicenter studies face substantial challenges but strategies exist to achieve Institutional Review Board approval. J Clin Epidemiol 2006;59:784–90.


48. McWilliams R, Hoover-Fong J, Hamosh A, et al. Problematic variation in local institutional review of a multicenter genetic epidemiology study. JAMA 2003;290:360–6.
49. National Institutes of Health. Final NIH policy on the use of a single institutional review board for multi-site research. Fed Regist 2016;81:40325–33. Retrieved from: https://www.federalregister.gov/articles/2016/06/21/2016-14513/final-nih-policy-on-the-use-of-a-single-institutional-review-board-for-multi-site-research.
50. Loh ED, Meyer RE. Medical schools’ attitudes and perceptions regarding the use of central institutional review boards. Acad Med 2004;79:644–51.
51. Klitzman R. How local IRBs view central IRBs in the US. BMC Med Ethics 2011;12:13.
52. Christian MC, Goldberg JL, Killen J, et al. A central institutional review board for multi-institutional trials. N Engl J Med 2002;346:1405–8.
53. Wagner TH, Murray C, Goldberg J, et al. Costs and benefits of the National Cancer Institute central institutional review board. J Clin Oncol 2010;28(4):662–6.
54. Kaufmann P, O’Rourke PP. Central institutional review board review for an academic trial network. Acad Med March 2015;90(3):321–3.
55. Slutsman J, Hirschfeld S. A federated model of IRB review for multisite studies: a report on the National Children’s Study federated IRB initiative. IRB: Ethics Hum Res 2014;36(6):1–6.


56. Edgar H, Rothman D. The institutional review board and beyond: future challenges to the ethics of human experimentation. Milbank Q 1995;73:489–506.
57. Annas G. Research censorship on campus. Review of The censor’s hand: the misregulation of human-subject research, by Carl E. Schneider. The New Rambler Review: an online review of books; 2015. Retrieved from: http://newramblerreview.com/component/content/article?id=102:research-censorship-on-campus.
58. Chadwick GL, Dunn CM. Institutional review boards: changing with the times? J Public Health Manag Pract 2000;6:19–27.
59. Grady C. Do IRBs protect human research participants? JAMA 2010;304:1122–3.
60. Beh HG. The role of institutional review boards in protecting human subjects: are we really ready to fix a broken system? Law Psychol Rev 2002;26:1–47.
61. Abbott L, Grady C. A systematic review of the empirical literature evaluating IRBs: what we know and what we still need to learn. J Empir Res Hum Res Ethics 2011;6:3–20.
62. Candilis PJ, Lidz CW, Arnold RM. The need to understand IRB deliberations. IRB: Ethics Hum Res 2006;28:1–5.
63. Association for the Accreditation of Human Research Protection Programs (AAHRPP). News release: latest AAHRPP accreditations include NY State Department of Health and India’s national comprehensive cancer center. June 12, 2014. Retrieved from: http://www.aahrpp.org/learn/news-releases.


CHAPTER 5

Accreditation of Human Research Protection Programs

Elyse I. Summers, Michelle Feige
Association for the Accreditation of Human Research Protection Programs, Inc., Washington, DC, United States

OUTLINE

A Brief History
Principles of Accreditation
    What AAHRPP Expects From Organizations
    What Organizations Can Expect From AAHRPP
Human Research Protection Programs: The Shift to Shared Responsibility
The Accreditation Standards
    Domain I: Organization
    Domain II: Institutional Review Board or Ethics Committee
    Domain III: Researcher and Research Staff
Steps to Accreditation
Value of Accreditation
Summary Questions
References

A BRIEF HISTORY

A global nonprofit organization, the Association for the Accreditation of Human Research Protection Programs, Inc. (AAHRPP), was founded in 2001 to accredit high-quality human research protection programs (HRPPs) in every sector of the human research enterprise. Accreditation-eligible organizations include academic institutions, contract research organizations, government agencies, hospitals, independent institutional review boards (IRBs), private entities, research institutes, and dedicated research sites. AAHRPP accreditation demonstrates to the public, research participants, other research organizations, and government and industry sponsors that an organization meets rigorous standards for protecting research participants and is committed to high-quality, ethical research. The AAHRPP accreditation model is voluntary, peer-driven, and educationally based, and it has the complementary outcome of raising research participant protection to an organizational priority.

Principles and Practice of Clinical Research http://dx.doi.org/10.1016/B978-0-12-849905-4.00005-8


For the US research community, the late 1990s and early 2000s will long be remembered for a number of high-profile research protection deficiencies, followed by the temporary government shutdown of several prestigious research programs. One of the most serious failures resulted in the death on September 17, 1999 of Jesse Gelsinger, a student enrolled in a gene-transfer trial at the University of Pennsylvania. The Gelsinger case shined a spotlight on issues including deficiencies in the informed consent process, investigator conflict of interest, and the reporting of adverse events. The case also prompted congressional hearings on the safety of US clinical trials and contributed to calls for fundamental improvements to safeguard participants and restore public confidence in research. Two entities, the nonprofit Institute of Medicine (IOM)1 (now the National Academy of Medicine) and the National

Copyright © 2018. Published by Elsevier Inc.

Bioethics Advisory Commission,2 issued reports acknowledging that accreditation offered promise as part of a multipronged solution. Seven highly respected organizations led the charge to establish AAHRPP: the Association of American Universities (AAU), the Association of American Medical Colleges, the Association of Public and Land-grant Universities, the Consortium of Social Science Associations, the Federation of American Societies for Experimental Biology, the National Health Council, and Public Responsibility in Medicine and Research. These “Founding Members” incorporated AAHRPP in April 2001, the same month the IOM issued its report, Preserving Public Trust: Accreditation and Human Research Participant Protection Programs. Six months later, AAHRPP began developing its accreditation standards, which were released in February 2002. The first accreditations followed 14 months later. As of September 2017, 247 organizations were AAHRPP accredited, including 46 located outside the United States. AAHRPP has accredited organizations in 47 US states and in Belgium, Brazil, Canada, China, India, Mexico, the Republic of Korea, Saudi Arabia, Singapore, South Africa, Taiwan, and Thailand. All major independent US IRBs have earned AAHRPP accreditation. In addition, 70% of US medical colleges and 84% of the top National Institutes of Health (NIH)-funded academic medical centers are either AAHRPP accredited or have begun the accreditation process. The intramural research program of the US NIH, the world’s largest public funder of research, has earned accreditation, as has Pfizer, Inc., the largest industry sponsor of clinical research. Although AAHRPP accreditation is substantively grounded in the US federal regulations, the AAHRPP accreditation standards and process also reflect the local requirements of any non-US jurisdiction where accreditation is sought.
Perhaps most important, AAHRPP accreditation has proved to promote high-quality research and to help organizations worldwide strengthen their HRPPs.

PRINCIPLES OF ACCREDITATION

AAHRPP has adopted nine principles that serve as the foundation for accreditation and the AAHRPP accreditation standards.3 Together, these principles also set forth what AAHRPP expects from organizations and what organizations can expect from AAHRPP.

What AAHRPP Expects From Organizations

1. Protecting the rights and welfare of research participants must be an organization’s first priority. An organization should promote a research environment where ethical, productive investigation is valued.
2. Protecting research participants is the responsibility of everyone within an organization and is not limited to the IRB or ethics committee (EC). Accreditation examines whether the policies and procedures of the organization as a whole result in a coherent, effective system to protect research participants and whether every role-player understands his or her responsibilities.
3. Striving to meet or exceed the federal requirements, and to continually seek new safeguards for protecting research participants while advancing scientific progress, must be integrated into an organization’s mission. Organizations can rely on AAHRPP standards to fill a void in instances where there is little or no regulatory guidance on research protections. Generally, where there is clear regulatory language and guidance that spells out a government agency’s view, AAHRPP standards should not be interpreted to require more than what the regulations require.

What Organizations Can Expect From AAHRPP

4. The standards for protecting participants in human research will be clear, specific, and applicable to research across the full range of settings (e.g., university-based biomedical, behavioral and social science research; IRBs; hospitals; government agencies; and others). Standards will address any special concerns (e.g., the use of vulnerable populations or heightened risk to privacy and confidentiality) that may arise in each setting.
5. The standards will identify outcome measures that organizations can use to assess and demonstrate quality improvement over time.
6. The standards will be performance based, using objective criteria and measurable outcomes to evaluate whether an HRPP effectively implements the standards. The application for accreditation will be reviewed by the Council on Accreditation, which will determine whether standards are met and, where appropriate, will include commendations for “areas of distinction” for organizations that demonstrate exemplary or innovative practices. When organizations struggle to meet specific standards, the Council will recommend various methods of satisfying the unmet requirements.
7. The accreditation process will provide a clear, understandable pathway to help organizations achieve accreditation, and AAHRPP staff will be available with assistance throughout the process.


8. The accreditation process will be collegial, educational, and interactive and will include discussion with and constructive feedback from AAHRPP staff. The accreditation process will identify areas in which the HRPP does not yet meet the standards and give organizations the opportunity to discuss potential program improvements.
9. The accreditation process will be responsive to changes in federal regulations and to standards that will evolve based on the changing landscape of the research enterprise.

HUMAN RESEARCH PROTECTION PROGRAMS: THE SHIFT TO SHARED RESPONSIBILITY

For decades, research oversight was considered the purview of IRBs and ECs. Calls for broader, organizational responsibility came after the same series of US research incidents that helped give rise to AAHRPP accreditation. In 2000, the year before AAHRPP’s founding, AAU issued a report4 recommending increased vigilance by senior university management and training for all personnel involved in research with human participants. In essence, AAU was making a case for broader responsibility for research protections beyond the IRB. The IOM took an even stronger position. In its April 17, 2001 report, Preserving Public Trust: Accreditation and Human Research Participant Protection Programs, the IOM argued for a more comprehensive approach to human research protections. Specifically, the report called for “a broader human research participant protection system than just the IRB, with multiple functional elements that in total are referred to as human research participant protection programs, or HRPPPs.” Under this approach, as shown in Fig. 5.1, responsibilities

FIGURE 5.1 Human research protection program.

are shared among numerous players, including senior organization officials, conflict of interest committees, education programs, auditing and compliance oversight functions, offices of sponsored programs, pharmacy services, researchers and research staff, and the IRB or EC.

As its name indicates, from the beginning AAHRPP recognized and advocated for a systematic, organization-wide approach to research protections. Although AAHRPP does not mandate that organizations label their protection programs as HRPPs, the accreditation standards do require a comprehensive, integrated program that affords protections for all research participants. Furthermore, the AAHRPP accreditation standards acknowledge the responsibilities of all those involved in research, including IRB members and IRB staff, researchers and research staff, sponsors, and others across the research enterprise. The AAHRPP requirement for shared responsibility and an integrated HRPP applies even to an organization that outsources all research review to external IRBs. Under these circumstances, the research organization still has an obligation to protect research participants and ensure the integrity of the research.

To help organizations understand and meet the accreditation standards, in 2009 the AAHRPP Board of Directors identified the following characteristics of high-quality HRPPs:

• The commitment to human research permeates the entire organization, starting with senior leadership, who set the example by promoting ethical and productive human research.
• Researchers, IRB professionals, and others involved in protecting research participants communicate and collaborate to achieve a shared vision that emphasizes the importance of ethical human research.
• The organization sets as a high priority the protection of human research participants and the safeguarding of their well-being.
• The organization advances discovery by publishing or sharing new knowledge.
• The consent process is ethical, comprehensive, and informative.
• The organization manages conflicts of interest to preserve the integrity of human research.
• The organization recognizes and fulfills its responsibility to the public by promoting community outreach and education efforts that help build public trust and support for human research.
• Researchers and IRBs view the HRPP as efficient and effective.


THE ACCREDITATION STANDARDS

AAHRPP accreditation standards are divided into three domains: Organization, IRB or EC, and Researcher and Research Staff. These domains represent the three primary spheres of responsibility within an HRPP. Each domain contains practice-based standards; elements further specify what is required to meet a standard.

Domain I: Organization

Domain I focuses on the overarching organization, including policies for financial disclosures, clinical trials, education and training in research ethics, scientific review, community engagement, and quality improvement.

Standard I-1: The organization has a systematic and comprehensive HRPP that affords protections for all research participants. Individuals within the organization are knowledgeable about and follow the policies and procedures of the HRPP.
Element I.1.A. The organization has and follows written policies and procedures for determining when activities are overseen by the HRPP.
Element I.1.B. The organization delegates responsibility for the HRPP to an official with sufficient standing, authority, and independence to ensure implementation and maintenance of the program.
Element I.1.C. The organization has and follows written policies and procedures that allow the IRB or EC to function independently of other organizational entities in protecting research participants.
Element I.1.D. The organization has and follows written policies and procedures setting forth the ethical standards and practices of the HRPP. Relevant policies and procedures are made available to sponsors, researchers, research staff, research participants, and the IRB or EC, as appropriate.
Element I.1.E. The organization has an education program that contributes to the improvement of the qualifications and expertise of individuals responsible for protecting the rights and welfare of research participants.
Element I.1.F. The organization has and follows written policies and procedures for reviewing the scientific or scholarly validity of a proposed research study. Such procedures are coordinated with the ethics review process.
Element I.1.G. The organization has and follows written policies and procedures that identify applicable laws in the localities where it conducts human research, takes them into account in the review and conduct of research, and resolves differences between federal or national law and local laws.

Standard I-2: The organization ensures that the HRPP has resources sufficient to protect the rights and welfare of research participants for the research activities that the organization conducts or oversees.

Standard I-3: The organization's transnational research activities are consistent with the ethical principles set forth in its HRPP and meet equivalent levels of participant protection as research conducted in the organization's principal location while complying with local laws and taking into account cultural context.

Standard I-4: The organization responds to the concerns of research participants.
Element I.4.A. The organization has and follows written policies and procedures that establish a safe, confidential, and reliable channel for current, prospective, or past research participants or their designated representatives that permit them to discuss problems, concerns, and questions; obtain information; or offer input with an informed individual who is unaffiliated with the specific research protocol or plan.
Element I.4.B. The organization conducts activities designed to enhance understanding of human research by participants, prospective participants, or their communities, when appropriate. These activities are evaluated on a regular basis for improvement.
Element I.4.C. The organization promotes the involvement of community members, when appropriate, in the design and implementation of research and the dissemination of results.

Standard I-5: The organization measures and improves, when necessary, compliance with organizational policies and procedures and applicable laws, regulations, codes, and guidance. The organization also measures and improves, when necessary, the quality, effectiveness, and efficiency of the HRPP.
Element I.5.A. The organization conducts audits or surveys or uses other methods to assess compliance with organizational policies and procedures and applicable laws, regulations, codes, and guidance. The organization makes improvements to increase compliance, when necessary.
Element I.5.B. The organization conducts audits or surveys or uses other methods to assess the quality, efficiency, and effectiveness of the HRPP. The organization identifies strengths and weaknesses of the HRPP and makes improvements, when necessary, to increase the quality, efficiency, and effectiveness of the program.

I. ETHICAL, REGULATORY, AND LEGAL ISSUES


Element I.5.C. The organization has and follows written policies and procedures so that researchers and research staff may bring forward to the organization concerns or suggestions regarding the HRPP, including the ethics review process.
Element I.5.D. The organization has and follows written policies and procedures for addressing allegations and findings of noncompliance with HRPP requirements. The organization works with the IRB or EC, when appropriate, to ensure that participants are protected when noncompliance occurs. Such policies and procedures include reporting these actions, when appropriate.

Standard I-6: The organization has and follows written policies and procedures to ensure that research is conducted so that financial conflicts of interest are identified, managed, and minimized or eliminated.
Element I.6.A. The organization has and follows written policies and procedures to identify, manage, and minimize or eliminate financial conflicts of interest of the organization that could influence the conduct of the research or the integrity of the HRPP.
Element I.6.B. The organization has and follows written policies and procedures to identify, manage, and minimize or eliminate individual financial conflicts of interest of researchers and research staff that could influence the conduct of the research or the integrity of the HRPP. The organization works with the IRB or EC in ensuring that financial conflicts of interest are managed and minimized or eliminated, when appropriate.

Standard I-7: The organization has and follows written policies and procedures to ensure that the use of any investigational or unlicensed test article complies with all applicable legal and regulatory requirements.
Element I.7.A. When research involves investigational or unlicensed test articles, the organization confirms that the test articles have appropriate regulatory approval or meet exemptions for such approval.
Element I.7.B. The organization has and follows written policies and procedures to ensure that the handling of investigational or unlicensed test articles conforms to legal and regulatory requirements.
Element I.7.C. The organization has and follows written policies and procedures for compliance with legal and regulatory requirements governing emergency use of an investigational or unlicensed test article.

Standard I-8: The organization works with public, industry, and private sponsors to apply the requirements of the HRPP to all participants.


Element I.8.A. The organization has a written agreement with the sponsor that addresses medical care for research participants with a research-related injury, when appropriate.
Element I.8.B. In studies where sponsors conduct research site monitoring visits or conduct monitoring activities remotely, the organization has a written agreement with the sponsor that the sponsor promptly reports to the organization findings that could affect the safety of participants or influence the conduct of the study.
Element I.8.C. When the sponsor has the responsibility to conduct data and safety monitoring, the organization has a written agreement with the sponsor that addresses provisions for monitoring the data to ensure the safety of participants and for providing data and safety monitoring reports to the organization.
Element I.8.D. Before initiating research, the organization has a written agreement with the sponsor about plans for disseminating findings from the research and the roles that researchers and sponsors will play in the publication or disclosure of results.
Element I.8.E. When participant safety could be directly affected by study results after the study has ended, the organization has a written agreement with the sponsor that the researcher or organization will be notified of the results to consider informing participants.

Standard I-9: The organization has written policies and procedures to ensure that, when sharing oversight of research with another organization, the rights and welfare of research participants are protected.

Domain II: Institutional Review Board or Ethics Committee

Domain II covers the IRB or EC, including its composition, review practices, documentation, and policies, such as those for handling unanticipated problems and protecting vulnerable participants.

Standard II-1: The structure and composition of the IRB or EC are appropriate to the amount and nature of the research reviewed and in accordance with requirements of applicable laws, regulations, codes, and guidance.
Element II.1.A. The IRB or EC membership permits appropriate representation at the meeting for the types of research under review, and this is reflected on the IRB or EC roster. The IRB or EC has one or more unaffiliated members; one or more members



5. ACCREDITATION OF HUMAN RESEARCH PROTECTION PROGRAMS

who represent the general perspective of participants; one or more members who do not have scientific expertise; one or more members who have scientific or scholarly expertise; and, when the IRB or EC regularly reviews research that involves vulnerable participants, one or more members who are knowledgeable about or experienced in working with such participants.
Element II.1.B. The IRB or EC has qualified leadership (e.g., chair and vice chair) and qualified members and staff. Membership and composition of the IRB or EC are periodically reviewed and adjusted as appropriate.
Element II.1.C. The organization has and follows written policies and procedures to separate competing business interests from ethics review functions.
Element II.1.D. The IRB or EC has and follows written policies and procedures so that members and consultants do not participate in the review of research protocols or plans in which they have a conflict of interest, except to provide information requested by the IRB or EC.
Element II.1.E. The IRB or EC has and follows written policies and procedures requiring research protocols or plans to be reviewed by individuals with appropriate scientific or scholarly expertise and other expertise or knowledge as required to review the research protocol or plan.

Standard II-2: The IRB or EC evaluates each research protocol or plan to ensure the protection of participants.
Element II.2.A. The IRB or EC has and follows written policies and procedures for determining when activities are exempt from applicable laws and regulations, when permitted by law or regulation and exercised by the IRB or EC. Such policies and procedures indicate that exemption determinations are not to be made by researchers or others who might have a conflict of interest regarding the studies.
Element II.2.B. The IRB or EC has and follows written policies and procedures for addressing protection of participants in research that is exempt from applicable laws and regulations. These functions may be delegated to an entity other than the IRB or EC.
Element II.2.C. The IRB or EC has and follows written policies and procedures for conducting meetings by the convened IRB or EC.
Element II.2.D. The IRB or EC has and follows written policies and procedures to conduct reviews by the convened IRB or EC.
Element II.2.D.1. Initial review.
Element II.2.D.2. Continuing review.
Element II.2.D.3. Review of proposed modifications to previously approved research.
Element II.2.E. The IRB or EC has and follows written policies and procedures to conduct reviews by an expedited procedure, if such procedure is used.
Element II.2.E.1. Initial review.
Element II.2.E.2. Continuing review.
Element II.2.E.3. Review of proposed modifications to previously approved research.
Element II.2.F. The IRB or EC has and follows written policies and procedures for addressing unanticipated problems involving risks to participants or others, and for reporting these actions, when appropriate.
Element II.2.G. The IRB or EC has and follows written policies and procedures for suspending or terminating IRB or EC approval of research, if warranted, and for reporting these actions, when appropriate.
Element II.2.H. The IRB or EC has and follows policies and procedures for managing multisite research by defining the responsibilities of participating sites that are relevant to the protection of research participants, such as reporting of unanticipated problems or interim results.

Standard II-3: The IRB or EC approves each research protocol or plan according to criteria based on applicable laws, regulations, codes, and guidance.
Element II.3.A. The IRB or EC has and follows written policies and procedures for identifying and analyzing risks and identifying measures to minimize such risks. The analysis of risk includes a determination that the risks to participants are reasonable in relation to the potential benefits to participants and society.
Element II.3.B. The IRB or EC has and follows written policies and procedures for reviewing the plan for data and safety monitoring, when applicable, and determines that the data and safety monitoring plan provides adequate protection for participants.
Element II.3.C. The IRB or EC has and follows written policies and procedures to evaluate the equitable selection of participants.
Element II.3.C.1. The IRB or EC has and follows written policies and procedures to review proposed participant recruitment methods, advertising materials, and payment arrangements and determines whether such arrangements are fair, accurate, and appropriate.
Element II.3.D. The IRB or EC has and follows written policies and procedures to evaluate the proposed arrangements for protecting the privacy interests of research participants, when appropriate, during their involvement in the research.
Element II.3.E. The IRB or EC has and follows written policies and procedures to evaluate proposed



arrangements for maintaining the confidentiality of identifiable data, when appropriate, preliminary to the research, during the research, and after the conclusion of the research.
Element II.3.F. The IRB or EC has and follows written policies and procedures to evaluate the consent process and to require that the researcher appropriately document the consent process.
Element II.3.G. The IRB or EC has and follows written policies and procedures for approving waivers or alterations of the consent process and waivers of consent documentation.

Standard II-4: The IRB or EC provides additional protections for individuals who are vulnerable to coercion or undue influence and participate in research.
Element II.4.A. The IRB or EC has and follows written policies and procedures for determining the risks to prospective participants who are vulnerable to coercion or undue influence and ensuring that additional protections are provided as required by applicable laws, regulations, codes, and guidance.
Element II.4.B. The IRB or EC has and follows written policies and procedures requiring appropriate protections for prospective participants who cannot give consent or whose decision-making capacity is in question.
Element II.4.C. The IRB or EC has and follows written policies and procedures for making exceptions to consent requirements for planned emergency research and reviews such exceptions according to applicable laws, regulations, codes, and guidance.

Standard II-5: The IRB or EC maintains documentation of its activities.
Element II.5.A. The IRB or EC maintains a complete set of materials relevant to the review of the research protocol or plan for a period of time sufficient to comply with legal and regulatory requirements, sponsor requirements, and organizational policies and procedures.
Element II.5.B. The IRB or EC documents discussions and decisions on research studies and activities in accordance with legal and regulatory requirements, sponsor requirements, if any, and organizational policies and procedures.

Domain III: Researcher and Research Staff

Domain III applies to the researchers and research staff, including their knowledge of and adherence to ethical standards, government regulations, reporting requirements, the protocol, and organizational policies. Domain III standards also focus on how well the researcher and research staff oversee the research and whether they are responsive to the questions and concerns of research participants.

Standard III-1: In addition to following applicable laws and regulations, researchers and research staff adhere to ethical principles and standards appropriate for their discipline. In designing and conducting research studies, researchers and research staff have the protection of the rights and welfare of research participants as a primary concern.
Element III.1.A. Researchers and research staff know which of the activities they conduct are overseen by the HRPP, and they seek guidance when appropriate.
Element III.1.B. Researchers and research staff identify and disclose financial interests according to organizational policies and regulatory requirements and, with the organization, manage, minimize, or eliminate financial conflicts of interest.
Element III.1.C. Researchers employ sound study design in accordance with the standards of their discipline. Researchers design studies in a manner that minimizes risks to participants.
Element III.1.D. Researchers determine that the resources necessary to protect participants are present before conducting each research study.
Element III.1.E. Researchers and research staff recruit participants in a fair and equitable manner.
Element III.1.F. Researchers employ consent processes and methods of documentation appropriate to the type of research and the study population, emphasizing the importance of comprehension and voluntary participation to foster informed decision-making by participants.
Element III.1.G. Researchers and research staff have a process to address participants' concerns, complaints, or requests for information.
Standard III-2: Researchers and research staff meet requirements for conducting research with participants and comply with all applicable laws, regulations, codes, and guidance; the organization's policies and procedures for protecting research participants; and the IRB's or EC's determinations.
Element III.2.A. Researchers and research staff are qualified by training and experience for their research roles, including knowledge of applicable laws, regulations, codes, and guidance; relevant professional standards; and the organization's policies and procedures regarding the protection of research participants.
Element III.2.B. Researchers maintain appropriate oversight of each research study as well as research staff and trainees, and appropriately delegate research responsibilities and functions.
Element III.2.C. Researchers and research staff follow the requirements of the research protocol or plan and adhere to the policies and procedures of the organization and to the requirements or determinations of the IRB or EC.
Element III.2.D. Researchers and research staff follow reporting requirements in accordance with applicable laws, regulations, codes, and guidance; the organization's policies and procedures; and the IRB's or EC's requirements.

To earn full accreditation, an organization must meet all applicable standards. For example, if an organization does not conduct research involving test articles and therefore is not regulated by the US Food and Drug Administration, Standard I.7 is not applicable.

STEPS TO ACCREDITATION

The AAHRPP accreditation process is similar to those used by other accrediting bodies that evaluate specific competencies of education, animal care and research, and health-care organizations. The major steps include a self-assessment, an on-site evaluation, and review by the AAHRPP Council on Accreditation.

The first step for those seeking accreditation is to conduct an in-depth self-assessment, or gap analysis, during which the applying organization compares its HRPP to the AAHRPP accreditation standards. The organization uses the AAHRPP Evaluation Instrument for Accreditation to perform an element-by-element assessment of HRPP policies, procedures, and practices and make revisions or improvements, as needed. This self-assessment typically is the most time-consuming phase of the process and can take several months or longer depending on the status of the HRPP and the resources available to devote to accreditation. For many organizations, however, the self-assessment also is the most valuable accreditation-related activity because it often results in the most comprehensive evaluation ever conducted of the organization's entire HRPP. It is not unusual for organizations to identify unexpected areas of strength along with those in need of improvement.

After completing the self-assessment, the organization submits what is known as the Step 1 Application. This document, which includes a program overview and the organization's written policies and procedures, is reviewed by a trained, experienced peer from an AAHRPP-accredited organization or by a member of the AAHRPP staff. The Step 1 reviewer reads the application and works directly with the organization to make

the changes necessary to satisfy the accreditation standards. When revisions are complete, the organization submits a revised application, referred to as the Step 2 Application.

Once the Step 2 Application is received, AAHRPP schedules a site visit by a team of trained peers from AAHRPP-accredited organizations. Teams are composed of two to six site visitors depending on the size and complexity of the HRPP. Teams generally include an IRB or compliance professional and a researcher. Team members are chosen, in part, for their experience with the research setting involved (i.e., university, hospital, IRB, etc.). The team reviews the application and conducts an on-site evaluation of the organization's HRPP, assessing the program's performance with respect to the accreditation standards as well as in practice. Following the site visit, the team provides the organization with a draft report identifying any discrepancies between the organization's policies and practices and noting other areas of concern. The organization then has the opportunity to respond: to point out any errors of fact and describe corrective actions that have been taken since the site visit. This response, the application, and the draft site visit report are submitted to the Council on Accreditation.

The Council meets quarterly to determine the accreditation status of applying organizations. In general, organizations are awarded one of four designations: full accreditation, qualified accreditation, accreditation pending, or accreditation withheld. Organizations that achieve accreditation have their names listed on the AAHRPP website and earn the right to display the AAHRPP seal. To maintain accreditation, organizations must be reevaluated 3 years following their initial accreditation, and every 5 years thereafter.

VALUE OF ACCREDITATION

AAHRPP was founded on the belief that it would play a central role in strengthening protections for research participants and helping organizations improve regulatory compliance. Fifteen years later, AAHRPP accreditation has taken hold in the United States and made significant inroads overseas, and the results have echoed throughout the research enterprise. AAHRPP's emphasis on a comprehensive, systematic approach to research protections has played a key role in the fundamental shift to organization-wide responsibility for research ethics and oversight. In addition, increasing acceptance of AAHRPP standards as the world's standards is facilitating collaboration and laying



the foundation for a global infrastructure built on a shared commitment to ethical practices.

AAHRPP-accredited organizations value their accreditation and see the benefits to their HRPPs. In a 2015 survey of all AAHRPP-accredited organizations, a vast majority of respondents reported that achieving AAHRPP accreditation was "very important" to their organization (Fig. 5.2). In addition, respondents were almost unanimous (91%) in their conviction that their overall HRPP has improved as a result of having achieved AAHRPP accreditation (Fig. 5.3).

Since 2009, AAHRPP has been collecting metrics for HRPPs to help accredited and nonaccredited organizations benchmark their performance against others and target areas for improvement. Among the most telling results are those for IRB review times (Fig. 5.4). Although IRB review time has been criticized as a poor measure of quality, review times correlate with investigator and sponsor satisfaction. Among AAHRPP clients, in 2015 average review times from submission to initial review were as follows: 18 days for the convened IRB, 8 days by a reviewer using the expedited procedure, and 9 days for a determination of exempt. Other benefits of AAHRPP accreditation are less quantifiable but equally important to research organizations.

71

Accreditation status is a signal to sponsors and colleagues that an organization has a high-quality HRPP and, therefore, is a preferred partner. Pfizer [5] was the first pharmaceutical company to announce that it would endeavor to use only accredited independent IRBs for review of multisite clinical trials and would choose an accredited research site over an unaccredited site, if other selection factors were equal. In addition, most industry sponsors set AAHRPP accreditation as a requirement before engaging independent IRBs to review multisite clinical trials.

For the academic research community, accreditation has been helpful in facilitating collaborative relationships. Universities are confident in the standards of practice at other AAHRPP-accredited universities. Moreover, accreditation ensures that similar (or, sometimes, the same) policies and procedures are in place at each collaborating institution, that staff at collaborating institutions share similar levels of competence, and that each "speaks the same language." In an era of increasing collaboration and reliance, including required use of single IRB review for multisite research (for examples, see NeuroNEXT [6] and NIH StrokeNet [7]), the assurance of quality provided by AAHRPP accreditation has never been more critical.

FIGURE 5.2 Importance of achieving Association for the Accreditation of Human Research Protection Programs (AAHRPP) accreditation.

FIGURE 5.3 Results of Association for the Accreditation of Human Research Protection Programs (AAHRPP) accreditation.


FIGURE 5.4 Institutional review board (IRB) review times by type of review, in calendar days: time from submission to review by the convened IRB (18 days); time from submission to approval by the convened IRB (38 days); time from submission to review by the expedited procedure (8 days); time from submission to approval by the expedited procedure (19 days); and time from submission to exempt determination (9 days).

Today, AAHRPP is fulfilling its mission and promise as a leader in research protections and an advocate for the research volunteers whose participation makes the entire enterprise possible. AAHRPP accreditation has become the gold standard for HRPPs around the globe and the symbol of quality and an organization's commitment to safe, ethical research.

SUMMARY QUESTIONS

1. The AAHRPP accreditation standards apply to:
a. Responsibilities of IRBs only
b. Responsibilities of IRBs, researchers, and organizations
c. Responsibilities of IRBs and researchers
d. Responsibilities of researchers and organizations

2. The AAHRPP accreditation period for initial accreditation is:
a. Three years
b. Five years
c. Two years
d. Ten years

3. The benefits of accreditation are:
a. Improvement in regulatory compliance
b. Increased efficiency of the IRB
c. Less unnecessary variation in the oversight system
d. All of the above

References

1. Institute of Medicine. Preserving public trust: accreditation and human research participant protection programs. Washington, DC: The National Academies Press; 2001.
2. National Bioethics Advisory Commission. Ethical and policy issues in research involving human participants, vol. I. Bethesda, MD: U.S. Government Printing Office; 2001.
3. AAHRPP accreditation standards. 2009. Retrieved from: https://admin.share.aahrpp.org/Website%20Documents/AAHRPP_Accreditation_Standards.PDF.
4. Association of American Universities Task Force on Research Accountability. Report on university protections of human beings who are the subjects of research. New York, NY: AAU; 2000.
5. IRB Advisor. Pfizer sets standard to require IRB accreditation. June 2009. Retrieved from: http://www.ahcmedia.com/articles/113017-pfizer-sets-standard-to-require-irb-accreditation.
6. https://www.neuronext.org/about-us.
7. https://www.nihstrokenet.org.


CHAPTER 6

The Regulation of Drugs and Biological Products by the Food and Drug Administration

Molly M. Flannery, Amy E. McKee, Diane M. Maloney, Jonathan P. Jarow
U.S. Food and Drug Administration, Silver Spring, MD, United States

OUTLINE

Background
Mission and Terminology
Drug and Biological Product Life Cycle
Discovery/Nonclinical Investigation
Clinical Trials
Responsibilities and Documentation
Sponsors
Investigators
Clinical Protocol
Institutional Review Board
Food and Drug Administration
Investigator Brochure
Investigational New Drug Safety Reports
Marketing Approval/Licensure
Pre-New Drug Application/Biologics License Application Submission
Application
Food and Drug Administration Review
Postapproval
Compliance
Summary
Summary Questions

The mission of the US Food and Drug Administration (FDA) is to protect and enhance the public health through the regulation of medical products, food, and tobacco and to spur innovation to address unmet medical and public health needs. The commissioner of the FDA is nominated by the President and confirmed by the Senate. The FDA's organization consists of the Office of the Commissioner and four directorates overseeing the core functions of the agency: Medical Products and Tobacco, Foods and Veterinary Medicine, Global Regulatory Operations and Policy, and Operations. There are seven product review centers with oversight authority over specific types of products. In the United States, FDA-regulated products account for approximately 25% of spending by American consumers each year. This chapter provides an overview of the FDA and the regulation of human drug and biological products, primarily in the Center for Drug Evaluation and Research (CDER) and the Center for Biologics Evaluation and Research (CBER).

Principles and Practice of Clinical Research http://dx.doi.org/10.1016/B978-0-12-849905-4.00006-X


BACKGROUND

The quality and safety of medical products have been of major importance to the United States since the mid-1800s. It was then that the US Congress passed the Drug Importation Act, which for the first time required the inspection and prevention of entry of adulterated medicines from abroad. In 1902 and 1906, two laws were passed that form the foundation of the FDA: the Biologics Control Act and the Food and Drug Act. Since that time, Congress has passed additional legislation enhancing FDA's ability to protect the public health.


Copyright © 2018. Published by Elsevier Inc.



Many of the major laws that provide the FDA with the authority to regulate drugs and biological products came about in response to significant public health problems and national drug tragedies. For example, in 1901, a diphtheria antitoxin contaminated with tetanus led to several deaths, which prompted the passage of the Biologics Control Act of 1902 (the Virus, Serum, and Antitoxin Act), designed to ensure the purity, potency, and safety of these and other biological products. In 1906, Upton Sinclair published The Jungle, an indictment of the meatpacking industry. At the same time, Dr. Harvey Wiley, the chief chemist of the Bureau of Chemistry in the US Department of Agriculture (USDA), was pointing out that toxic adulterants could be found in foods and medicines. This led Congress to pass the Food and Drug Act, which was signed by President Theodore Roosevelt in 1906. This law prohibited interstate commerce in adulterated foods, drinks, and drugs.

By 1933 the FDA, which had started as the Bureau of Chemistry under the USDA, had been established and recommended a complete revision of the Food and Drug Act of 1906. The momentum to pass a new act was accelerated by the elixir sulfanilamide tragedy. A new solvent, diethylene glycol, was used to formulate elixir sulfanilamide, which was put on the market without any safety testing in 1937. The new formulation led to the deaths of more than 100 people, many of whom were children. This led to the passage of the Federal Food, Drug, and Cosmetic Act (FD&C Act) in 1938. The FD&C Act extended FDA authority from food and drugs to cosmetics and devices. It also required that new drugs be shown to be safe before they could be marketed and authorized inspections of factories engaged in the manufacture of regulated products.

In 1960, Dr. Frances Kelsey, an FDA medical officer, recommended that thalidomide not be approved for pregnancy-induced nausea in the United States because of insufficient safety data, despite its availability in much of Europe and Canada. It was subsequently discovered that thalidomide was responsible for severe birth defects in thousands of babies born in Europe and other countries. Dr. Kelsey became a national hero and was awarded the President's Award for Distinguished Federal Civilian Service for her work; in addition, public support for stronger drug regulation resulted in the passage of the 1962 Kefauver-Harris amendments to the FD&C Act, which strengthened the drug approval process. A key change in the statutory requirements was that drug manufacturers now were required to prove the effectiveness of a product before it could be approved for marketing.

In 1971, the Public Health Service's Bureau of Radiological Health was transferred to the FDA. Its mission was to protect the public from unnecessary radiation from electronic products in the home and the healing arts. In the same year, the National Center for Toxicological Research was established to examine the biological effects of chemicals in the environment. The next year, the Division of Biologics Standards, which was responsible for the regulation of biological products, was transferred from the National Institutes of Health (NIH) to the FDA to become the Bureau of Biologics. The FDA as we know it today was taking shape.

The Prescription Drug User Fee Act (PDUFA) was passed in 1992 in response to a perceived lag in the approval of new drugs in the United States; it allowed the agency to collect fees to support the process for the review of human drugs. In 1997, the Food and Drug Administration Modernization Act (FDAMA) was signed into law. This law reauthorized PDUFA and codified a number of FDA initiatives intended to speed the availability of new drugs for serious and life-threatening diseases. In 2007, the Food and Drug Administration Amendments Act (FDAAA) was signed into law. In addition to reauthorizing several critical FDA programs, FDAAA greatly increased the responsibilities of the FDA and provided it with additional requirements, authorities, and resources relating to both pre- and postmarket drug safety. In 2010, the Biologics Price Competition and Innovation Act of 2009 was enacted; one of its key provisions amended the Public Health Service Act to create a new regulatory pathway for biosimilar products. A few years later, the Food and Drug Administration Safety and Innovation Act of 2012 (FDASIA) was signed into law. Among other things, FDASIA provided the FDA with the authority to collect user fees for generic drugs and biosimilar products and to enhance the safety of the drug supply chain. FDASIA also gave the FDA a new tool, breakthrough therapy designation, intended to expedite the development and review of innovative new drugs that address certain unmet medical needs.

MISSION AND TERMINOLOGY

The scope of the FDA's mission to protect and enhance the public health is outlined in Table 6.1. The regulation of drug and biological products is based on science, law, and public health impact. The FDA is composed of scientists and experts from many disciplines, including physicians, biologists, chemists, pharmacologists, microbiologists, statisticians, consumer safety officers, and epidemiologists. The FDA is responsible for the review of regulatory submissions (e.g., applications for clinical research, marketing, and labeling), the development and implementation of regulatory policy, research and scientific exchange, product surveillance (e.g., adverse event reporting and product testing), compliance (e.g., inspections and enforcement actions),

I. ETHICAL, REGULATORY, AND LEGAL ISSUES


TABLE 6.1 Food and Drug Administration's Mission

1. To promote the public health by promptly and efficiently reviewing clinical research and taking appropriate action on the marketing of regulated products in a timely manner.

2. With respect to such products, protect the public health by ensuring that foods are safe, wholesome, sanitary, and properly labeled; human and veterinary drugs are safe and effective; there is reasonable assurance of the safety and effectiveness of devices intended for human use; cosmetics are safe and properly labeled; and public health and safety are protected from electronic product radiation.

3. Participate through appropriate processes with representatives of other countries to reduce the burden of regulation, harmonize regulatory requirements, and achieve appropriate reciprocal arrangements.

4. As determined to be appropriate by the Secretary, carry out paragraphs (1) through (3) in consultation with experts in science, medicine, and public health, and in cooperation with consumers, users, manufacturers, importers, packers, distributors, and retailers of regulated products.

From the FDA Modernization Act of 1997 (PL 105-115).

TABLE 6.3 Principal Regulations for Drug and Biological Products: Title 21, Code of Federal Regulations

Part 3            Product jurisdiction
Part 11           Electronic records; electronic signatures
Part 25           Environmental impact considerations
Part 50           Protection of human subjects
Part 54           Financial disclosure by clinical investigators
Part 56           Institutional review boards
Part 58           Good laboratory practice for nonclinical laboratory studies
Parts 201-202     Labeling and prescription drug advertising
Parts 210-211     Current good manufacturing practices
Part 312          Investigational new drug application
Part 314          New drug applications
Part 320          Bioavailability and bioequivalence requirements
Parts 600-680     Biologics
Parts 800-861     Medical devices and in vitro diagnostics
Parts 1270-1271   Human tissue intended for transplantation and human cells, tissues, and cellular- and tissue-based products

and outreach (e.g., education). As a science-based institution, the FDA strives to facilitate the development of new safe and effective medical products. The primary laws that govern drug and biological products are shown in Table 6.2. Some important regulations for drugs, biologics, and medical devices in Title 21, Code of Federal Regulations (CFR), are shown in Table 6.3. These laws and regulations are intended to protect the public health, and one of the FDA's primary functions is to ensure compliance with them. The definitions of some of the terms used in this chapter's discussion of the FDA's regulation of drugs and biological products are provided in Table 6.4.

TABLE 6.2 Statutory Authorities

                                          Drugs    Biologics
Federal Food, Drug, and Cosmetic Act        X          X
Public Health Service Act                              X
Interstate commerce                         X          X
Foreign commerce                                       X
Generic equivalence                         X
Orphan Drug Act                             X          X
Prescription Drug User Fee Act              X          X
Prescription Drug Marketing Act             X          X
FDA Modernization Act of 1997               X          X
FDA Amendments Act of 2007                  X          X
FDA Safety and Innovation Act of 2012       X          X

TABLE 6.4 Definitions and Terms

Law: A statute; an act of Congress that outlines binding conduct or practice in the community.

Regulation: A rule issued by an agency under a law administered by the agency. A regulation interprets a law and has the force of law.

Code of Federal Regulations (CFR): The compilation of all effective government regulations, published annually by the US Government Printing Office. Food and Drug Administration (FDA) regulations are found in Title 21 of the CFR.

Guidance: FDA documents prepared for FDA staff, applicants/sponsors, and the public that describe the agency's interpretation of or policy on a regulatory issue. In general, guidance documents are not legally binding.

Biologic: A virus, therapeutic serum, toxin, antitoxin, vaccine, blood, blood component or allergenic product, or analogous product, or arsphenamine or a derivative of arsphenamine, applicable to the prevention, treatment, or cure of a disease or condition of human beings. This includes immunoglobulins, cytokines, and a variety of other biotechnology-derived products, e.g., cell and nucleic acid products.

Drug: An article intended for use in the diagnosis, cure, mitigation, treatment, or prevention of disease in humans or other animals; an article recognized in the US Pharmacopoeia, the official Homeopathic Pharmacopoeia, or the official National Formulary and their supplements; or an article (other than food) intended to affect the structure or any function of the body of humans or other animals.

Device: An instrument, apparatus, implement, machine, contrivance, implant, or in vitro reagent that is intended for use in the diagnosis of disease or other conditions, or in the cure, mitigation, treatment, or prevention of disease in humans or other animals, or is intended to affect the structure or any function of the body of humans or other animals, and that does not achieve its primary intended purpose through chemical action within or on the body and is not dependent on being metabolized for the achievement of its primary intended purpose.

Investigational New Drug Application (IND): A request for authorization from the FDA to administer an investigational drug or biological product to humans.

New Drug Application (NDA): A request for authorization from the FDA to sell and market a new pharmaceutical.

Biologics License Application (BLA): A request for authorization from the FDA to introduce, or deliver for introduction, a biological product into interstate commerce.

Another important role of the FDA is communication. The FDA strives to provide accurate information to health-care professionals and the public on product quality, effectiveness, and safety, including through its oversight of labeling, promotion/advertising, and compliance with good manufacturing practice. The FDA website (www.fda.gov) is an extremely valuable tool for accessing information. Among the documents available on the website are regulations and guidance documents. Guidance documents represent the agency's current thinking, interpretation, or policy regarding a particular regulatory issue or product. These documents greatly facilitate the public's understanding of the laws, regulations, and policies applicable to the FDA. In general, guidance documents are not binding and are updated as needed to provide accurate and timely information. Another important online resource is Drugs@FDA (http://www.accessdata.fda.gov/scripts/cder/drugsatfda/index.cfm), a searchable catalog of FDA-approved drug and biological products, including approval letters, review documents, and labeling.

The FDA also performs research related to the products it regulates, including research on the establishment of standards and methods, toxicology, product safety, and basic mechanisms of action or pathogenesis. This research advances the FDA's mission and is important for the quality review of submissions, the development of new policy and guidance, and the provision of advice on product development and product safety.

DRUG AND BIOLOGICAL PRODUCT LIFE CYCLE

The life cycle for new drug and biological products is divided into four stages: discovery/nonclinical investigation, clinical trials, marketing application/licensure, and postapproval.

Discovery/Nonclinical Investigation

The earliest stage of product development involves the discovery and initial evaluation of an active moiety. In this period of drug development, a production process sufficient to yield consistent-quality, clinical-grade material is required so that the drug product can be adequately characterized. Tests and assays to characterize the product should be under development in this stage because they will be necessary to link the product to the outcomes of animal studies or human clinical trials. At this time, the sponsor conducts animal safety studies to determine an appropriate starting dose in humans and to establish an initial toxicity profile for the product. These studies will assist in designing the first-in-human clinical trial to help ensure that the human participants are properly monitored for potential adverse events. This is also the stage in which the biological rationale for the use of the product is proposed. If an animal efficacy model exists, studies in that model should also be performed to support the use of the product in humans. The FDA has developed a number of guidance documents on considerations in product development and nonclinical animal studies to help sponsors develop the data necessary to support an Investigational New Drug (IND) submission.

Clinical Trials

The FD&C Act and the Public Health Service Act require that a new drug or biological product be approved before it can enter interstate commerce. Under


its rulemaking authority, the FDA issued regulations, found in 21 CFR Part 312, allowing an exception from the approval requirement for drugs and biologics for which an IND application is in effect. These regulations allow investigational products to be legally shipped in interstate commerce for the conduct of clinical investigations. A number of FDA guidance documents are applicable to the conduct of clinical trials. For example, FDA's E6 Guidance for Industry on Good Clinical Practice is a harmonized guidance document developed through the International Council for Harmonisation (ICH); it provides a unified standard for the United States, the European Union, Japan, Canada, and Switzerland to facilitate the mutual acceptance of clinical data by the regulatory authorities in these jurisdictions. The term good clinical practice (GCP) refers to the design, conduct, recording, evaluation, monitoring, and reporting of clinical trials. The principles of GCP are provided in Table 6.5.

During the clinical development of a product under an IND, additional product process development and testing/validation are performed, and additional nonclinical information is obtained regarding the safety and efficacy of the product. There are generally three phases of premarketing clinical research to examine the safety and efficacy of a drug or biological product in a "learn and confirm" model. Phase 1 trials include "first-in-human" studies that are normally small, dose-escalation trials that may include patients with a particular condition or normal volunteers; their primary goal is to assess the safety of the product using a particular route of administration. Phase 1 trials also examine the pharmacokinetics and metabolism of the investigational drug, which can include drug-drug interaction and food-effect studies. The primary goal of these studies is to provide preliminary evidence of safety and dosing; preliminary evidence of efficacy may be observed in Phase 1 studies, but it is not their primary purpose. Phase 2 "proof of concept" trials consist of one or more moderately sized clinical trials for a particular patient population. Phase 2 trials are typically larger than Phase 1 studies and are designed to evaluate the effectiveness of a product for a particular indication in a patient population with the disease being studied. The primary purpose of Phase 2 trials is to detect efficacy and optimize dosing, although safety information is continually collected and assessed. Phase 3 trials are often much larger trials that are designed to evaluate the benefits and risks of a product in a patient population with a defined clinical indication. The safety and efficacy data from these trials are generated to support marketing approval and to provide the information needed to write the instructions for the use of the product for a particular indication. Some key issues for the design, conduct, and analysis of Phase 3 clinical

TABLE 6.5 Principles of Good Clinical Practice

• Clinical trials should be conducted in accordance with the ethical principles that have their origin in the Declaration of Helsinki and that are consistent with good clinical practice and the applicable regulatory requirement(s).
• Before a trial is initiated, foreseeable risks and inconveniences should be weighed against the anticipated benefit for the individual trial subject and society. A trial should be initiated and continued only if the anticipated benefits justify the risks.
• The rights, safety, and well-being of the trial subjects are the most important considerations and should prevail over the interests of science and society.
• The available nonclinical and clinical information on an investigational product should be adequate to support the proposed clinical trial.
• Clinical trials should be scientifically sound and described in a clear, detailed protocol.
• A trial should be conducted in compliance with the protocol that has received prior institutional review board/independent ethics committee approval/favorable opinion.
• The medical care given to, and medical decisions made on behalf of, subjects should always be the responsibility of a qualified physician or, when appropriate, of a qualified dentist.
• Each individual involved in conducting a trial should be qualified by education, training, and experience to perform his or her respective tasks.
• Freely given informed consent should be obtained from every subject prior to clinical trial participation.
• All clinical trial information should be recorded, handled, and stored in a way that allows its accurate reporting, interpretation, and verification.
• The confidentiality of records that could identify subjects should be protected, respecting the privacy and confidentiality rules in accordance with the applicable regulatory requirement(s).
• Investigational products should be manufactured, handled, and stored in accordance with applicable good manufacturing practice. They should be used in accordance with the approved protocol.
• Systems with procedures that assure the quality of every aspect of the trial should be implemented.

From the ICH Harmonised Tripartite Guideline for Good Clinical Practice, Step 4, 1996. ICH Secretariat, c/o IFPMA, Geneva, Switzerland.

trials include the primary and secondary end points, trial population, randomization, stratification, blinding, sample size, participant adherence, and statistical analysis. It is important to note that the clinical development of a product may not proceed in a linear fashion from Phase 1 to Phase 3 trials, and a product may be in several stages of clinical development simultaneously for one or more indications.

The content and format of the IND application are specified in the FDA regulations at 21 CFR 312.23. The IND application should include the following: a table of contents; an introductory statement including the


rationale for the drug or the research study and the general investigational plan; chemistry, manufacturing, and control (CMC) information; pharmacology and toxicology information; previous human experience with the investigational product and other relevant information; protocols; and the investigator's brochure (IB). Once the original IND is submitted, the FDA has 30 days to review it and notify the submitter, or sponsor, whether the trial may proceed or has been placed on clinical hold. During those 30 days, the sponsor may not initiate the clinical trial. The IND review is aimed primarily at an evaluation of the safety of the product for human clinical trials. The IND is allowed to proceed if the agency has no safety concerns or if the sponsor does not hear from the FDA within 30 days. In contrast, if there are safety concerns, a clinical hold notice is issued to the sponsor stating that the proposed clinical trial(s) may not begin (or that an ongoing clinical trial is suspended) until certain stated deficiencies are resolved. Phase 1 trials may be placed on clinical hold for any of the following five reasons:

1. Human subjects are or would be exposed to an unreasonable and significant risk of illness or injury.
2. The clinical investigators are not qualified to conduct the study.
3. The IB is misleading, erroneous, or materially incomplete.
4. The IND does not contain sufficient information to assess the risks to subjects.
5. The IND is for the study of a life-threatening disease or condition that affects both genders, and men or women with reproductive potential who have the disease or condition being studied are excluded from eligibility (see Chapter 13).

Once an IND is established, new studies may be initiated under that IND by submitting the protocol to the IND, without a new 30-day waiting period. Nevertheless, Phase 2 and 3 trials may be placed on clinical hold for any of the previously discussed reasons.
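The 30-day timing rule described above can be sketched as a short, purely illustrative calculation (the function below is a hypothetical helper for this chapter, not part of any FDA system): an IND may go into effect 30 calendar days after the FDA receives it, unless a clinical hold intervenes.

```python
from datetime import date, timedelta

def earliest_trial_start(ind_received: date, on_clinical_hold: bool = False):
    """Illustrative only: an IND may go into effect 30 calendar days
    after FDA receipt unless the sponsor is notified earlier or a
    clinical hold is imposed. Returns None while a hold is in effect."""
    if on_clinical_hold:
        return None  # the trial may not begin until the hold is lifted
    return ind_received + timedelta(days=30)

print(earliest_trial_start(date(2017, 3, 1)))                         # 2017-03-31
print(earliest_trial_start(date(2017, 3, 1), on_clinical_hold=True))  # None
```

Note that the 30-day clock runs in calendar days, not business days, which is why simple date arithmetic suffices here.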
Phase 2 and 3 trials also may be placed on hold if the plan or protocol for the trial is clearly deficient in design to meet its stated objectives. If an IND is placed on clinical hold, the sponsor is notified by telephone or other means of rapid communication, or in writing. This notification is followed by a letter that specifically states the deficiencies; advice is available from the FDA on appropriate corrective actions. It is then up to the sponsor to correct the deficiencies and notify the FDA of the corrections in a clinical hold response letter. Once the sponsor submits a complete response to the clinical hold, the FDA generally responds in writing within 30 calendar days. There is no automatic release from clinical hold. If the sponsor does not hear from

the FDA within 30 calendar days, the clinical trial may not start. When FDA's review of the clinical hold response is completed, the sponsor is notified that the trial(s) may proceed or that there are continuing deficiencies and the clinical hold is retained.

A clinical investigation of a drug or biological product that is lawfully marketed in the United States may be exempt from the IND requirements set forth in the regulations. A clinical investigation is exempt if all of the following apply (21 CFR 312.2):

1. The investigation is not intended to be reported to the FDA as a well-controlled study in support of a new indication for use, nor intended to be used to support any other significant change in the labeling for the drug.
2. If the drug that is undergoing investigation is lawfully marketed as a prescription drug product, the investigation is not intended to support a significant change in the advertising for the product.
3. The investigation does not involve a route of administration, dosage level, use in a patient population, or other factor that significantly increases the risks (or decreases the acceptability of the risks) associated with the use of the drug product.
4. The investigation is conducted in compliance with the requirements for institutional review set forth in 21 CFR Part 56 and with the requirements for informed consent set forth in 21 CFR Part 50.
5. The investigation is conducted in compliance with the requirements that a sponsor or investigator, or any person acting on behalf of a sponsor or investigator, (a) must not represent in a promotional context that an investigational product is safe or effective for the purposes for which it is under investigation or otherwise promote the drug, (b) must not commercially distribute or test market an investigational new drug, and (c) must not unduly prolong an investigation after finding that the results appear to establish sufficient data to support an application.
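Because the exemption applies only when every one of these criteria is satisfied, the logic is a simple conjunction. The sketch below is purely illustrative; the field names are invented shorthand for the five conditions of 21 CFR 312.2, not any official schema.

```python
def ind_exempt(study: dict) -> bool:
    """Illustrative only: a study of a lawfully marketed drug is
    IND-exempt under 21 CFR 312.2 only if ALL five criteria hold."""
    criteria = (
        "not_supporting_new_indication_or_labeling_change",  # criterion 1
        "not_supporting_advertising_change",                 # criterion 2
        "no_significant_increase_in_risk",                   # criterion 3
        "complies_with_parts_56_and_50",                     # criterion 4 (IRB review, informed consent)
        "no_promotion_or_commercial_distribution",           # criterion 5
    )
    # Missing answers default to False: the exemption must be demonstrated.
    return all(study.get(k, False) for k in criteria)

study = dict.fromkeys((
    "not_supporting_new_indication_or_labeling_change",
    "not_supporting_advertising_change",
    "no_significant_increase_in_risk",
    "complies_with_parts_56_and_50",
    "no_promotion_or_commercial_distribution"), True)
print(ind_exempt(study))   # True: all five criteria met
study["no_significant_increase_in_risk"] = False
print(ind_exempt(study))   # False: failing any single criterion defeats the exemption
```

The design point is simply that the five conditions combine with "and", not "or": a study that satisfies four of the five still requires an IND.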
FDA regulations require the sponsor to file an amendment to the IND if certain changes are made to the product, the nonclinical studies, or the clinical protocol. These include changes in product formulation and changes that affect the safety, scope, or scientific quality of the clinical protocol, including its data and analyses, as well as the addition of a new protocol. The sponsor also must file an annual report that includes, among other things, all changes in and results of the study.

A sponsor may request to meet with the FDA for advice on product development throughout the development life cycle. Often sponsors meet with the agency at the end of the nonclinical/discovery stage to discuss their data and their future plans prior to submitting an IND


application. This meeting is referred to as a pre-IND meeting and may be very important in facilitating a successful IND. It is possible to request an end-of-Phase-1 meeting to discuss the data obtained in a Phase 1 trial and the drug development plan, including the selection of doses and patient populations for Phase 2 trials. Frequently, the sponsor meets with the FDA at the end of Phase 2 trials to discuss the outcomes of the trials as well as the design and analysis plan for the Phase 3 trials. At these end-of-Phase-2 meetings, the sponsor and FDA frequently discuss the key end points, the specific patient population, and the statistical analysis plan that will provide a statistically significant and clinically meaningful result from the trial(s). Following completion of the Phase 3 or "pivotal" trials, the sponsor again may meet with the FDA to discuss a marketing application submission. These pre-BLA (Biologics License Application) or pre-NDA (New Drug Application) meetings focus on the content and format of a marketing application.

Several mechanisms are available to accelerate the drug development process for life-threatening and severely debilitating illnesses, such as fast-track designation, breakthrough therapy designation, priority review (21 CFR 312 Subpart E), and accelerated approval (21 CFR 314.510 and 601.41). The purpose of these programs is to facilitate the development and expedite the review of new drug and biological products that are intended to treat serious and life-threatening conditions and that demonstrate the potential to address unmet medical needs. The FDA issued a comprehensive guidance document in 2014 describing the qualifying criteria and features of each of these four programs (http://www.fda.gov/downloads/drugs/guidancecomplianceregulatoryinformation/guidances/ucm358301.pdf). The fast-track program offers actions to expedite development and review, including a rolling review of a marketing application. Breakthrough therapy designation also offers actions to expedite review, including intensive guidance from the FDA on drug development; an organizational commitment from the FDA to involve senior managers and experienced review and regulatory health project management staff in a proactive, collaborative, cross-disciplinary review; and rolling review of a marketing application. Priority review offers a review clock that is 4 months shorter than that for a standard application. Finally, accelerated approval is an FDA approval based on a surrogate end point that is reasonably likely to predict clinical benefit, or on a clinical end point other than survival or irreversible morbidity that is likewise likely to predict ultimate clinical benefit. Applicants for products approved under this pathway have been required to conduct Phase 4 (postmarketing) trial(s) to confirm and/or verify clinical benefit.
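To make the review-clock difference concrete, the sketch below assumes a 10-month standard review goal (an assumption for illustration; the text above states only that priority review is 4 months shorter) and computes hypothetical goal dates:

```python
from datetime import date

def add_months(d: date, months: int) -> date:
    """Naive month arithmetic, adequate for this illustration
    (assumes the day of month exists in the target month)."""
    m = d.month - 1 + months
    return date(d.year + m // 12, m % 12 + 1, d.day)

submission = date(2017, 1, 15)
standard_goal = add_months(submission, 10)      # assumed 10-month standard clock
priority_goal = add_months(submission, 10 - 4)  # priority review: 4 months shorter
print(standard_goal, priority_goal)  # 2017-11-15 2017-07-15
```

Under these assumptions, priority designation moves the review goal from roughly November to July of the same year, a substantial difference for a product addressing an unmet medical need.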


There are also a number of expanded access programs available for patients with a serious or immediately life-threatening disease under an IND (21 CFR 312 Subpart I). The three categories of expanded access use are for individual patients (including emergency use), for intermediate-sized populations, and for large populations under a treatment IND or protocol. For all expanded access uses, the FDA must determine that the patient has no comparable or satisfactory alternative therapy to treat the disease or condition. The FDA also must determine that the potential patient benefit justifies the potential risks, that those risks are not unreasonable in the context of the disease or condition to be treated, and that providing the drug for the requested use will not interfere with the initiation, conduct, or completion of clinical investigations that could support marketing approval of the expanded access use or otherwise compromise the potential development of that use (http://www.fda.gov/downloads/drugs/guidancecomplianceregulatoryinformation/guidances/ucm351261.pdf).

Responsibilities and Documentation

Sponsors

Several groups, including sponsors, investigators, Institutional Review Boards (IRBs), and the FDA, have responsibilities in clinical research that are described in the regulations and guidance documents. The responsibilities of the sponsor are found in FDA's regulations at Subpart D of 21 CFR Part 312. The sponsor, generally the developer of the product, is the person or entity who submits the IND. The sponsor is responsible for selecting qualified investigators and providing them with the necessary information to conduct the study properly. The sponsor also is responsible for the trial design, trial management, data handling and record keeping, allocation of responsibilities, compensation to subjects and investigators, financing, and notification/submission to regulatory authorities (e.g., protocol submission). In addition, the sponsor is required to ensure that there is proper monitoring of the study and that it is conducted in accordance with the general investigational plan and protocols contained in the IND. The sponsor must ensure that all participating investigators and the FDA are promptly informed of significant new adverse effects or risks with respect to the product. The sponsor also is responsible for the quality assurance and quality control of the trial. Finally, the sponsor is accountable for maintaining and making available, as necessary, the information on the investigational product, including its manufacture, supply and handling, record access, and safety information. A sponsor may transfer responsibility for


any or all of its obligations to a contract research organization; however, the sponsor is ultimately responsible for the quality and integrity of the trial. Investigators The sponsor is responsible for selecting qualified investigators. Investigators have multiple responsibilities, including following the protocol for the study and complying with all applicable regulations. It is their responsibility to protect the rights, safety, and welfare of subjects in their care. As part of the responsibility for protection of human subjects, an investigator must not involve a human being as a subject in research unless the investigator has obtained the legally effective informed consent of the subject or the subject’s legally authorized representative. In doing so, the investigator must assure that there is sufficient opportunity for the subject or the representative to consider whether or not to participate. The explanation of the study must be in language that the subject can be understood and presented in a manner that minimizes the possibility of coercion or undue influence. The consent form must not contain exculpatory language through which the subject or representation is made to waive or appear to waive the subject’s legal rights. The investigators must retain control of the investigational product and maintain records of the disposition of the product, records, and reports (e.g., progress and final reports, safety reports), case histories of the subjects, and termination or suspension of the trial. Investigators are required to report observed serious adverse events to the sponsor and to record nonserious adverse events and report them to the sponsor according to the timetable for reporting specified in the protocol. In addition, they are required to report to the IRB all changes in the research activity and all unanticipated problems involving risk to subjects or others. 
Investigators also must arrange for review of the IND protocols by the IRB and handle other communications with the IRB. Because of concerns about potential bias, they are required to supply sponsors with sufficient accurate financial information to allow the sponsor to report on financial interests to the FDA. If the sponsor and the investigator are the same individual, then that individual must carry out all of the responsibilities of the sponsor and investigator, with appropriate safeguards or contracting arrangements to ensure the integrity of the trial and human subject safety.

Clinical Protocol

The clinical trial protocol and its amendments are critical elements of clinical research. The protocol should include general information, such as the title, protocol number, names of sponsors and investigators, and
background information. The background information should include the name and description of the investigational drug product, nonclinical studies that impact the clinical trial, the population to be studied, known or possible risks and benefits to human subjects, and administrative information. The protocol should state the objectives and purpose of the trial, the trial design, the selection and withdrawal of subjects, the treatment of subjects, the assessment of efficacy/activity (where appropriate) and safety, and the statistical evaluation plan (where appropriate). It also should address the plan for quality control, monitoring, and assurance; data handling; record keeping; and ethical considerations. A more detailed treatment of this subject may be found in Chapter 16. A special protocol assessment may be requested by an IND sponsor to evaluate the adequacy of certain proposed studies associated with drug development (see http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM080571.pdf). Three types of protocols are eligible: animal carcinogenicity, final product stability, and Phase 3 trials whose data will form the primary basis for determination of efficacy. The submission of a clinical trial protocol should include the statistical analysis plan. The FDA has a 45-day review clock to respond to the submission with either a letter of agreement or suggested revisions to the protocol. Documented special protocol agreements are considered binding except when the sponsor does not follow the protocol, modifies the protocol without concurrence by the FDA, or if there is a material change in the science.

Institutional Review Board

The constitution and responsibilities of the IRB are covered by the regulations in Part 56 of Title 21 of the CFR. The IRB is charged with reviewing and approving protocols that are to be carried out in the organization(s) that it serves.
As described in Chapter 4, it is the IRB’s function to ensure that in each protocol the risks to human subjects are minimized and are reasonable in relation to the anticipated benefits, if any, to subjects and the importance of the knowledge that may be expected to result. IRBs must assure that the selection of subjects is equitable and that informed consent is sought and appropriately documented. The regulations require that the IRB have at least five members with varying backgrounds to promote complete and adequate review of research activities at the institution(s). The IRB must have at least one member whose primary concerns are scientific, another whose primary concerns are nonscientific, and at least one member not otherwise affiliated with the institution. Chapter 4 provides a detailed explanation of the structure and function of the IRB.

Food and Drug Administration

The FDA reviews all INDs and their amendments to determine whether they are in compliance with the appropriate laws and regulations. The regulations establish time frames for the performance of certain reviews and lay out the responsibilities of the FDA in communicating with the sponsors. The primary purpose of the review of the original IND submission and early amendments is to help assure that human subjects are not exposed to unreasonable risk. In the later phases of the IND process, involving trials to support efficacy determinations, the FDA reviews also focus on whether the studies are constructed and carried out in a way that will yield valid data that can be considered for marketing approval. The FDA also interacts with sponsors through meetings and conference calls, starting at the pre-IND stage and continuing throughout the entire IND process, to address important issues of product development, clinical trial design and analysis, and premarket submissions.

Investigator Brochure

If the sponsor is not the investigator, there must be an IB. It is the sponsor’s responsibility to maintain and update the IB and give it to the investigators who are conducting the trial. This document generally includes the clinical and nonclinical data on the investigational product that are relevant to the use of the product in human subjects.

Investigational New Drug Safety Reports

Sponsors should submit IND safety reports to the FDA as described in 21 CFR 312.32 and 312.33. The reporting requirements for adverse events include expedited written safety reports as well as annual reports or information amendments.
IND safety reports include any suspected adverse reaction that is both serious and unexpected; any findings from other studies that suggest a significant risk in humans exposed to the drug; or findings from tests in laboratory animals that suggest a significant risk in humans exposed to the drug, including reports of mutagenicity, teratogenicity, or carcinogenicity. A serious adverse event or serious suspected adverse reaction is one that, in the view of either the investigator or sponsor, results in any of the following outcomes: death, a life-threatening adverse event, inpatient hospitalization or prolongation of existing hospitalization, a persistent or significant incapacity or substantial disruption of the ability to conduct normal life functions, or a congenital anomaly/birth defect. A life-threatening adverse event or life-threatening suspected adverse reaction is one that, in the view of either the investigator or sponsor, places the subject at immediate risk of death. The sponsor must notify the FDA and
all participating investigators as soon as possible, but not later than 15 calendar days, after the sponsor determines that the information qualifies for reporting in an IND safety report. The sponsor also shall notify the FDA of any unexpected fatal or life-threatening suspected adverse reaction as soon as possible, but not later than 7 calendar days after the sponsor’s initial receipt of the information.

Marketing Approval/Licensure

Section 351 of the Public Health Service Act requires that a biologics license be in effect for any biological product that is to be introduced into interstate commerce. The FD&C Act requires approval of a marketing application (NDA) for new drugs to be introduced into interstate commerce. The provisions of the IND regulations allow interstate transportation of drugs and biological products for clinical investigations. These investigations are intended to provide data to support a BLA or an NDA.

Pre-New Drug Application/Biologics License Application Submission

Although the IND phase is primarily directed at the collection of nonclinical and clinical data to support the safety of the clinical investigations, much of the CMC information needed for a marketing application also is developed during this time. The formulation to be marketed should be identified and ideally should be used for the pivotal clinical trials. If the to-be-marketed formulation of the product differs from that used in the pivotal clinical trials, the sponsor will need to provide data to “bridge” the formulations. The product must be adequately characterized, and its stability demonstrated. Consistency of manufacture also must be proven. Although the specific approaches to the development of these data vary with the product area, a number of guidance documents are available that provide insight into what information is important and how it might be generated. During the pre-IND and IND stages, it is important that the potential applicant remain in contact with the FDA. It is far easier to address concerns, including both clinical trial and CMC issues, before the clinical protocol is under way. It is in the best interest of both the FDA and the sponsor to work out these details so that when the time comes for a marketing application to be submitted, there are no unexpected problems.
After the sponsor compiles sufficient information, it will begin to plan the submission of the NDA or BLA. The FDA recognizes the value of pre-NDA/BLA meetings and encourages sponsors to schedule one well in advance of any planned
submission of an application. This meeting provides a forum for discussing the content, format, and timing of the proposed submission. While the sponsor is preparing to submit a BLA or NDA, the FDA is preparing to review it. A multidisciplinary review team is formed, and preliminary decisions concerning the handling of the submission are made. One of the first decisions is whether the review of the product should be handled under a standard review schedule or as a priority review. The standard and priority review schedules are based on goals agreed to under the PDUFA. Currently, the standard schedule for a new molecular entity requires a complete review in 10 months, whereas a priority review is to be completed in 6 months from the date of filing of the BLA or NDA. The review schedule decision is based on the use of the product (for severe or life-threatening illnesses) and whether it fills an unmet medical need. At this time, the review team also decides which clinical study sites should be inspected and requests a bioresearch monitoring inspection. This inspection is focused on verification of the data that are submitted to the FDA. The field investigators will help determine whether the studies were carried out according to regulations and whether appropriate informed consent was obtained. They also review record keeping for compliance with the regulatory requirements and determine whether protocols were followed. If deviations are observed, the field investigators will present the firm with a list of observations (FDA Form 483). The FDA Form 483 is considered along with a written report called an Establishment Inspection Report, all evidence or documentation collected on-site, and any responses made by the company. The FDA considers all of this information and then determines what further action, if any, is appropriate to protect public health.
The report of the bioresearch monitoring inspection is a key piece of the review of a BLA or NDA.

Application

The regulations prescribing the content of a BLA are found in 21 CFR 601.2 and those for the NDA in 21 CFR 314.50. The BLA/NDA must contain a signed cover sheet and Form FDA 356h; this form provides information that enables the center to identify the type of submission, the applicant, and the reason for the submission. The bulk of the BLA/NDA submission generally consists of nonclinical and clinical study reports that the applicant believes provide data supporting the safety and efficacy of the product. The applicant also must submit the proposed labeling for the product, which must be supported by the data. The BLA/NDA also must contain adequate CMC information to ensure that the product meets standards of
purity and potency. These data will include information on characterization, stability, the manufacturing process, and the facility in which the manufacturing is carried out. In the BLA/NDA, applicants include a statement that the nonclinical studies used to support the application were conducted in compliance with the regulations on good laboratory practice for nonclinical laboratory studies (Part 58 of Title 21 of the CFR). If the studies were not conducted according to good laboratory practice, the applicant must explain why they were not. The applicant must certify that all clinical studies were conducted in compliance with the informed consent regulations in Part 50 of Title 21 of the CFR, and that each clinical study either was conducted in compliance with the IRB regulations in Part 56 or was not subject to those regulations. In addition, Part 54 of Title 21 of the CFR requires the submission of a financial certification or disclosure statement, or both, for clinical investigators who conducted clinical studies submitted in the application. Every BLA/NDA also must include a statement regarding the effect of the product on the environment. Depending on the specific facts, the sponsor must provide either a claim for categorical exclusion or an environmental assessment. Under current regulations, most drug and biologic marketing applications are categorically excluded from the need to supply an environmental assessment; however, certain categories of products and processes still require such an assessment. Sponsors should become aware of the need for an assessment during the IND process.

Food and Drug Administration Review

The receipt of the BLA/NDA at the FDA starts the “review clock.” The review team consists of the experts necessary to conduct a review of the submission.
Generally, the team contains specialists in clinical and nonclinical data review, product area specialists, specialists in good manufacturing processes, biostatisticians, and a regulatory project manager. Reviewers in other specialty areas are added to the review team as necessary. The initial review of the BLA/NDA focuses on the suitability of the application for filing. If the application is significantly deficient (that is, it lacks information necessary to permit a substantive review), the FDA may refuse to file it. A “refuse to file” action terminates the review of that application. Although an applicant may elect to file over protest, the refuse to file action indicates a severely deficient submission that is unlikely to lead to an approval in the first review cycle. If the BLA/NDA is complete, the FDA files it and the substantive review of the application begins in earnest.

It is not uncommon for questions to arise during the review, in which case the review team may send an “information request” to the applicant asking for further explanation or additional data. Responses are expected within a relatively rapid time frame to facilitate the review. As each discipline finishes its particular review, it prepares a review memo documenting what has been reviewed and any deficiencies that have been found. Inspections are part of the complete review of a BLA/NDA. One type of inspection is the bioresearch monitoring inspection mentioned previously. This inspection helps provide assurance that the review team can rely on the clinical data submitted to support the safety and efficacy of the product. The other inspection is a facility inspection, in which product specialists and specialists in good manufacturing practice visit the manufacturing facilities. This inspection is aimed at assessing whether the product is made under appropriate conditions and whether the process for manufacture has been validated and is being followed. All aspects of the manufacture of the product are investigated during this inspection. The applicant is made aware of any significant observations at the end of the inspection. The inspectors complete an inspection report that becomes part of the review of the application. CBER/CDER sometimes present issues raised in the review of the application to an external advisory committee made up of experts in the disease as well as a patient and consumer representative. The use of an advisory committee allows the review team to bring specific questions or concerns to a broader forum of experts. For specific questions, the FDA may include additional experts as part of the advisory committee meeting to provide advice in a particular area of concern. Not all BLAs/NDAs are presented at an advisory committee.
A BLA/NDA may be presented if the product is a new molecular entity or if the review team has identified particular issues on which they need expert input. A critical part of the review process is the evaluation of the proposed labeling for the product. It is important that statements made in the labeling be supported by data. The ultimate goal of the review of the proposed labeling is to ensure that it clearly identifies the product and provides adequate information to allow the safe and appropriate use of the product. Patient labeling, when included, must be both clear and accurate, so that the patient will understand how to use the product properly. The review team will work with the applicant to obtain accurate and informative labeling. After the inspection reports are evaluated, the reviews completed, and any advisory committee
advice is considered, the review team makes a recommendation on the BLA/NDA, and the division or office director with delegated signatory authority decides on the appropriate action. If the application is approved, the FDA issues a letter that serves as a license (BLA) or an approval (NDA), allowing the applicant to introduce the product into interstate commerce. If the review has resulted in questions or concerns, the FDA issues a “complete response letter.” This letter explains that the application cannot be approved and identifies all of the deficiencies that must be addressed to put the application in condition for approval. When the applicant responds to this letter, the review clock and the review begin again. The FDA publishes its approval letters, the approved labeling, and reviews on its Drugs@FDA website (http://www.accessdata.fda.gov/scripts/cder/drugsatfda/index.cfm). With rare exceptions, however, any complete response letters and reviews of unapproved applications remain confidential.

Postapproval

Following marketing approval, the FDA is responsible for the review of changes to the NDA or BLA, including manufacturing changes, labeling changes, and new clinical indications, for the lifetime of the product. These changes must be submitted as supplements to the BLA or NDA. Supplements are reviewed and approved (or not) according to the timelines described by the PDUFA. There are two types of postmarket studies from a regulatory perspective: postmarketing commitments (PMCs) and postmarketing requirements (PMRs). PMRs include studies that sponsors are required to conduct under one or more statutes or regulations. These may include safety studies to assess a known serious risk or efficacy studies conducted to affirm clinical benefit for a product that received accelerated approval based on a surrogate end point. In addition, pediatric studies can be required under the Pediatric Research Equity Act. PMCs are studies or clinical trials that a sponsor has agreed to conduct at the time of approval but that are not required by a statute or regulation. Adverse events must be reported according to 21 CFR 600.80 for biological products and 21 CFR 314.80 for drug products. Postmarketing 15-day “alert reports” are submitted for adverse events that are both serious and unexpected as soon as possible, but not later than 15 calendar days after initial receipt of the information by the applicant. These are generally reported through MedWatch for drugs and nonvaccine
biological products (http://www.fda.gov/safety/medwatch/default.htm) or the Vaccine Adverse Events Reporting System for vaccines (https://vaers.hhs.gov/index).

COMPLIANCE

The FDA has the authority to disqualify clinical investigators from conducting clinical testing of new drugs and devices when the agency determines that the investigator has repeatedly or deliberately failed to comply with the requirements intended to protect study subjects and ensure data integrity. The FDA also can disqualify an investigator who has repeatedly or deliberately submitted false information to the agency or study sponsor in a required report. If disqualified, the investigator may no longer receive investigational products and will be ineligible to conduct any clinical investigation that supports an application for a research or marketing permit for products regulated by the FDA. The FDA reviews any marketing application that relies on data from studies performed by the disqualified investigator to determine whether the investigator submitted unreliable data that are essential to the continuation of an investigation or to the approval of a marketing application, or essential to the continued marketing of an FDA-regulated product. Depending on that determination, the FDA may decide that the investigation may not be considered in support of a research or marketing application or may withdraw approval of the product. Under its statutory debarment authority, the FDA also may ban or “debar” individuals and companies convicted of certain felonies or misdemeanors related to drug products. Once individuals have been subjected to debarment, they may no longer work for anyone with an approved or pending drug product application at the FDA, and the FDA will not accept or review NDAs submitted by debarred individuals. Even though this statutory authority was granted in the Generic Drug Enforcement Act of 1992, it applies to both innovator and generic drug manufacturers. Debarred companies may no longer submit, or assist others in submitting, NDAs.

Following the approval of a product, the FDA performs biennial inspections to assess the firm’s compliance with current good manufacturing practice (21 CFR Parts 210, 211, and 600–680). The FDA evaluates any observations listed in an FDA Form 483 and determines whether further regulatory action is needed. If the deficiencies are severe, the FDA can take appropriate regulatory actions, including steps to revoke the license (BLA) or withdraw approval (NDA) if the FDA believes the product is no longer safe or effective. The FDA also can pursue injunctive relief and criminal sanctions for violations when warranted.

SUMMARY

The FDA plays a vital regulatory role in the conduct of clinical drug research and drug product development. The regulations promulgated by the FDA are intended to help ensure human subject protection and data integrity. Even when a specific clinical investigation is exempt from the FDA’s IND requirements, it must be conducted in compliance with the requirements for IRB review (21 CFR Part 56) and informed consent (21 CFR Part 50). Thus, the principles of human subject protection must be maintained. A central role of the FDA is also to assist in the expeditious development of safe, effective, high-quality drugs for the US population. Thus, there are numerous programs in place to incentivize and expedite drug development for indications with a significant unmet medical need.

SUMMARY QUESTIONS

1. An investigator is responsible for:
   a. following the clinical protocol
   b. maintaining records of product disposition and case histories of subjects
   c. reporting adverse events
   d. protecting human subjects
   e. reporting financial interests
   f. all of the above

2. An IND may be placed on Clinical Hold if:
   a. Human subjects would be exposed to unreasonable risk of illness or injury
   b. A study of a life-threatening disease includes women with reproductive potential
   c. The investigator’s brochure is inadequate
   d. a and b
   e. a and c
   f. b and c

3. The FDA:
   a. will not meet with the sponsor during the IND phase
   b. will not inspect the clinical trial sites
   c. has programs to facilitate the development of new drugs and biologics
   d. has no time frame for review of a new IND
   e. publishes guidance documents that must be followed
   f. none of the above

4. The FDA has multiple expedited programs to accelerate development and review of new drugs and biologics. These programs include:
   a. Accelerated approval and priority review
   b. Accelerated approval and expanded access
   c. Breakthrough therapy and orphan product designation
   d. Breakthrough therapy and fast-track designation
   e. b and d
   f. a and c
   g. a and d


5. Principles of Good Clinical Practices include:
   a. The rights, safety, and well-being of trial subjects
   b. Adequate nonclinical and clinical information on the investigational product
   c. Systems with procedures that assure the quality of every aspect of the trial
   d. a and b
   e. a and c
   f. a, b, and c


CHAPTER 7

International Regulation of Drugs and Biological Products

Theresa Mullin
U.S. Food and Drug Administration, Silver Spring, MD, United States

OUTLINE

Introduction
Background
  Early Operations and Achievements of International Conference on Harmonisation
  Recent Evolution and Reforms
  Membership in the New International Council on Harmonisation
  Organization of the New International Council on Harmonisation
  Financing the New International Council on Harmonisation
Overview of the International Council on Harmonisation Technical Harmonization Process
  Nomination and Selection of Topics for Harmonization
  International Council on Harmonisation Five-Step Harmonization Procedure
International Council on Harmonisation Guidelines Most Relevant to Clinical Research
Future Work in Regulatory Harmonization
References

Principles and Practice of Clinical Research, http://dx.doi.org/10.1016/B978-0-12-849905-4.00007-1
Copyright © 2018. Published by Elsevier Inc.

INTRODUCTION

The global drug regulatory environment is characterized by regulatory oversight conducted at the national and regional level by authorities, such as the US Food and Drug Administration (FDA), and by regional health initiatives, such as the European Union (EU) European Commission (EC). The laws that govern each of these national or regional entities generally determine the specific regulatory requirements in each jurisdiction. However, recognizing the increasingly global nature of industry operations, from preclinical development through finished product manufacturing, drug regulatory authorities have undertaken a variety of efforts to harmonize regulatory requirements and cooperate in efforts to oversee drug industry operations. Within the EU, for example, the European Medicines Agency and the EC oversee a centralized process and harmonization of requirements across member states. The World Health Organization (WHO), an agency of the United Nations concerned with international public health, has established medicinal, clinical, and technical standards and promotes regulatory capacity building, training, and work sharing among regulatory authorities, including a biennial meeting of its International Conference of Drug Regulatory Authorities (ICDRA). The Pharmaceutical Inspection Convention and Pharmaceutical Inspection Cooperation Scheme (PIC/S) are two international instruments between countries and pharmaceutical inspection authorities intended to improve cooperation between regulatory authorities in the oversight of Good Manufacturing Practices. In addition to these global efforts, there are a number of regional harmonization initiatives. In the Asia Pacific region, the Asia-Pacific Economic Cooperation (APEC) and the Association of Southeast Asian Nations
(ASEAN) have been established as venues for cooperation and regulatory harmonization. In addition, the ASEAN Economic Community (AEC) conducts regional harmonization of technical standards and regulatory requirements under the Pharmaceutical Product Working Group (PPWG). In South America, the Pan American Health Organization (PAHO) component of the WHO operates in collaboration with the Pan American Network for Drug Regulatory Harmonization (PANDRH). In Africa, the Southern African Development Community (SADC) works to strengthen regulatory capacity among its member countries and promote harmonized standards for pharmaceutical development and treatment, as well as access to essential medicines. The East African Medicines Regulatory Harmonization Program (EAC-MRH) also works to harmonize medicines regulation systems and procedures within the East African Community (EAC) in accordance with national and international policies and standards. In the Middle East, the Gulf Central Committee for Drug Registration (GCC-DR) similarly works to harmonize regional standards for drug marketing approval. Another major harmonization effort, distinguished by its collaboration of technical experts drawn from both regulatory agencies and the pharmaceutical industry, has been the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (hereafter referred to as ICH), which was first established in 1990. ICH has been the primary engine for development of many of the harmonized technical standards that have been adopted and used as a basis for training by many of the global and regional harmonization efforts previously mentioned. This chapter will focus on the operations and contributions of ICH to support international regulation and the development of safe and effective medicines.
It provides background on the establishment and earlier operations of ICH and describes more recent reforms to its structure and governance. It also gives an overview of the ICH technical harmonization process and, finally, offers a more detailed review of the guidelines primarily related to the planning and oversight of clinical research to inform regulatory decision-making.

BACKGROUND

Early Operations and Achievements of International Conference on Harmonisation

The development of new requirements by drug regulatory authorities in the major pharmaceutical markets in the United States, European Union, and
Japan in the 1970s and 1980s, related to demonstration of new drug efficacy in addition to the requirements for safety, resulted in better evidence generated by more comprehensive and often more complex drug development programs. This very positive public health impact also was accompanied by a less desirable increase in the time and cost of drug development. In addition, drug companies often found that differences in specific requirements related to drug safety, efficacy, and quality set by regulators in different regions of the world created extensive duplication of effort that further contributed to costs and delays in getting new drugs to patients. These economic and public health impacts provided key motivators to establish a process for international harmonization of regulatory standards for the marketing or registration of new drugs. The first meeting to explore the potential for international regulatory harmonization, held in Europe in 1990, was attended by representatives of regulatory authorities and the pharmaceutical industry from the United States, European Union, and Japan. This first meeting was intended to plan an international conference on harmonization, but it also discussed the wider implications and terms of reference of an ICH. At the first ICH Steering Committee (SC) meeting, the Terms of Reference were agreed upon, and it was decided that the topics selected for harmonization would be divided into Safety (S), Quality (Q), and Efficacy (E) to reflect the three criteria that form the basis for approving and authorizing new medicinal products. A fourth category of Multidisciplinary (M) guidelines emerged when it was determined that some areas needing harmonization addressed more than one of the original three S, Q, and E categories, and that a number of guidelines were needed to address electronic standards for regulatory submission of new drug dossiers.
The mission of the ICH established in 1990 was, and remains, to promote public health by making recommendations to achieve greater harmonization in the interpretation and application of technical guidelines and requirements for pharmaceutical product registration. The harmonization of these regulatory standards is considered to offer direct benefit to both regulatory authorities and regulated industry. Major benefits commonly cited include the prevention of duplicative clinical trials in humans and more consistent protection of human subjects in clinical trials, accomplished primarily through the E guidelines. Another major benefit is the minimization of animal studies without compromising the assessment of drug safety and effectiveness, accomplished primarily through the S guidelines. ICH harmonization guidelines have been credited with streamlining the regulatory assessment process for new drug applications and producing a combined impact of reducing the development times and resources needed for global drug development.

Key elements contributing to the success of ICH include the involvement of both regulators and industry parties in the detailed technical harmonization work and the application of a science-based approach to harmonization through a consensus-driven process that is clearly outlined and closely managed by ICH's senior governance body. The technical harmonization work is conducted by experts with comparable levels of expertise in a given topic area, drawn from both regulatory agencies and the drug industry. In addition, the final steps of approval and adoption are controlled solely by the regulators, with a corresponding commitment to implement any approved guidelines within their regions.

When the ICH was initially established almost a quarter-century ago, it comprised six member parties who formed the ICH SC. These included three regulators: the FDA, the EC, and the Japanese Ministry of Health, Labour and Welfare (MHLW)/Pharmaceuticals and Medical Devices Agency (PMDA). The six parties also included the corresponding pharmaceutical industry associations from these three regions: the Pharmaceutical Research and Manufacturers of America (PhRMA), the European Federation of Pharmaceutical Industries and Associations (EFPIA), and the Japan Pharmaceutical Manufacturers Association (JPMA). In addition to these ICH SC members, there were several observer parties, including Health Canada, the European Free Trade Association (represented by Swissmedic), the WHO, and the International Federation of Pharmaceutical Manufacturers and Associations (IFPMA).1 Intensive work by these parties in the first decade or so following the establishment of ICH resulted in a number of seminal initial guidelines harmonizing requirements related to preclinical safety studies, clinical trial planning and oversight, and drug quality.
Table 7.1 provides an overview of the major topic areas currently addressed by ICH guidelines.

TABLE 7.1 Sampling of Major Topic Areas Addressed by International Council on Harmonisation Guidelines

SAFETY
• Carcinogenicity studies
• Genotoxicity studies
• Toxicokinetics and pharmacokinetics
• Toxicity testing
• Reproductive toxicology
• Biotechnology products
• Pharmacology studies
• Immunotoxicology studies
• Nonclinical evaluation for anticancer pharmaceuticals
• Photosafety evaluation

EFFICACY
• Clinical safety
• Clinical study reports
• Dose-response studies
• Ethnic factors
• Good clinical practice
• Clinical trials
• Clinical evaluation by therapeutic category
• Clinical evaluation
• Pharmacogenomics
• Multiregional clinical trials

QUALITY
• Stability
• Analytical validation
• Impurities
• Pharmacopoeias
• Quality of biotechnology products
• Specifications
• Good manufacturing practice
• Pharmaceutical development
• Quality risk management
• Pharmaceutical quality system
• Development and manufacture of drug substances

MULTIDISCIPLINARY
• Medical Dictionary for Regulatory Activities terminology
• Electronic standards
• Nonclinical safety studies
• Common Technical Document and Electronic Common Technical Document
• Data elements and standards for drug dictionaries
• Gene therapy
• Genotoxic impurities

One of the contributions of the early ICH work was the development of E6 Good Clinical Practice, a harmonized guideline for good clinical trial practices. The goals of ICH E6 include assurance of human subject protection and assurance of data quality. The guideline is intended to provide a standard so that drug developers know what they need to do both to comply with the regulations and to document that compliance; its scope is limited to clinical research performed with regulatory intent. Developed with the particular aim of guiding the conduct of trials exploring the safety and effectiveness of investigational new drugs, E6 has provided critical guidance for both international regulators and clinical researchers. Investigational new drugs pose the greatest potential risks to study participants because of the limited safety and effectiveness information available at the time of the study. The significant cost of obtaining study data, the prime source of evidence for the assessment of drug safety and effectiveness, requires the sponsor of the research to have great confidence that the study will produce sufficient high-quality evidence acceptable to the regulators. E6 was developed in the mid-1990s as the international guideline to be followed when generating
clinical data intended to be submitted to regulatory authorities. Moreover, E6 aimed to provide enough specificity to minimize potential ambiguity and the resulting inconsistency in interpretation of the guideline across different global regions, where approaches to health-care delivery and regulatory practice might be expected to vary. The specificity was thus intended to minimize the potential for E6 interpretation to be yet another source of variability across investigational sites in multiregional clinical trials.

Over the past 20 years, E6 has played an essential role in enabling the continued growth and success of multiregional clinical trials of investigational new drugs,2 including critical guidance related to the training, responsibilities, and expectations of investigators, sponsors, and institutional review boards. It has thereby supported the earlier submission of new drug applications (with data collected in conformance with the harmonized guidelines adopted by regulators in multiple regions), enabling earlier access to new medicines for patients who need them.

E6 also has provided a valuable guide to clinical trial oversight for research conducted to inform regulatory decision-making at all stages of drug development. This includes clinical trials to investigate the safety and effectiveness of a drug for a new indication for marketing approval; in the United States, these data would be submitted to the FDA in an "efficacy supplement" to the original New Drug Application (NDA) or Biologics License Application (BLA). Other research may include postmarketing studies, required of or agreed to by a sponsor, that are conducted after regulatory approval of a drug for marketing. For example, FDA uses postmarketing study commitments and requirements to gather additional information about a product's safety, efficacy, or optimal use.
Another ICH harmonization accomplishment considered among the most significant is the development of the Common Technical Document (CTD) for regulatory submissions by industry sponsors.1,2 Prior to the development of the CTD, marketing applications in the different ICH regions each involved complex, multiple submissions in differently organized and structured formats. Submissions varied within regions from one application to the next, and even varied by company depending on the team that assembled the application. This meant that reviewers in each region effectively needed to learn the structure of each newly submitted application and hunt for the information that needed to be available for a complete review. This general lack of international standards was burdensome for industry, which had to assemble dossiers to differing regional standards or to no existing standard at all, and equally cumbersome for regulators, who found the dossiers difficult to access and navigate for review.

The development and adoption of the harmonized CTD revolutionized the submission of marketing applications by enabling sponsors to replace multiple divergent formats with a single technical dossier that can be submitted to all ICH regions and to other regulatory authorities that have adopted the standard, thus facilitating simultaneous submission for review and potential approval, and potentially earlier access for patients, in multiple world regions. Development of the original paper-based CTD standard also facilitated the development of a subsequent electronic standard (eCTD) that has further enabled regulators to perform more automated checks on the completeness of applications and their readiness for review. The eCTD also has enabled the development of a suite of electronic review tools that have prompted better quality in submitted applications and greater efficiency in regulatory review.

Another area of major accomplishment for ICH concerns the extension and maintenance of the Medical Dictionary for Regulatory Activities (MedDRA).3 MedDRA is a highly specific standardized medical terminology, originally begun by the UK's Medicines and Healthcare products Regulatory Agency (MHRA) and transferred to ICH for further development and support for broad international use. The MedDRA terminology is used for the marketing registration, documentation, and postmarketing safety monitoring of medical products. The medical products within the scope of MedDRA include pharmaceuticals, biologics, vaccines, and drug-device combination products, and the terminology is now used by regulators, pharmaceutical companies, clinical research organizations, and health-care professionals.
In addition to the original English version and Japanese translation, MedDRA has been translated into, and is maintained in, languages including Chinese, Czech, Dutch, French, German, Hungarian, Italian, Portuguese, and Spanish. Each MedDRA term has an associated eight-digit numerical code, which remains the same irrespective of the language.

While the initial ICH effort was concentrated on developing harmonized standards to address discrepancies among regulatory standards in the three ICH regions, it became clear to the ICH parties that there was both considerable interest in, and value to, having the ICH guidelines considered for adoption beyond the three founding regions. This was true both because non-ICH regions had growing pharmaceutical industry activities and because their regulatory authorities wanted to gain efficiencies where possible by adopting existing ICH harmonized guidelines when deemed applicable and acceptable. This led to ICH's creation of the Global Cooperation Group (GCG)2 with the goal of promoting a better understanding of ICH and ICH guidelines. Through the GCG, other non-ICH drug regulatory authorities (DRAs) and regional health initiatives (RHIs) were invited to attend the now-biannual ICH meetings to listen to ICH technical discussions at all levels, from the expert working groups (EWGs) actively engaged in developing or updating ICH guidelines to the SC meetings that oversaw the experts' guideline work. This forum helped lay the groundwork for more recent reforms of ICH governance and operations.

Recent Evolution and Reforms

As noted earlier, since the establishment of ICH over 20 years ago, the pharmaceutical industry has evolved to become globally based. During this time, other non-ICH national and regional economies grew rapidly and emerged as global players with increasing pharmaceutical industry activities, including clinical development and manufacturing as well as marketing. Given these important changes, the ICH parties recognized the need to modernize the ICH and identified several goals for a "reformed" ICH: (1) establishing one major and preferred venue for global drug regulatory harmonization work that would be accessible to all key drug regulatory stakeholders; (2) creating a venue that would allow all these stakeholders the opportunity for input into the drug harmonization work; and (3) maintaining the efficient and effective management of harmonization operations that had been key to ICH's success throughout its past.

ICH parties recognized the need for a single venue for global harmonization work, noting that greater synergy would be achieved by having drug regulatory authorities from emerging economies join the existing and well-established guideline process offered by ICH rather than engage in bilateral national or multilateral regional harmonization efforts. Considering the criticality of international harmonization efforts, there was also a strong desire to maximize the level of public health protection achieved with the limited resources typically available to drug regulators. There was a related concern that engaging in a variety of disconnected bilateral and regional harmonization efforts would ultimately lead to a fragmentary and suboptimal allocation of regulatory resources and less effective global oversight of regulated entities.
The need to provide all regulatory stakeholders, particularly non-ICH drug regulators, an opportunity for greater harmonization input was based on the recognition that there was, at the time, no formal role for these non-ICH parties. In addition to the regular opportunity to adopt already-existing ICH guidelines, it was considered critically important to offer more formal and regular opportunities for input into, and engagement with, the guideline development process and the selection of new topics for harmonization. Finally, it was felt that new provisions to expand participation would need to be integrated with approaches to effectively manage what would become a larger and more diverse stakeholder engagement in the technical harmonization process.

In addition to these overarching goals, there were other objectives intended to bring ICH into the modern era of organizations involved in the public sphere. Beyond greater stakeholder inclusivity, this meant transitioning from a somewhat informal operation funded largely by industry contributions to a more formal and transparent organization, including establishing ICH as a legal entity, creating a more distributed and equitable approach to financing operations by member parties, and establishing more routine public sharing of information and outreach about ongoing ICH harmonization work and work products. Important milestones related to these goals and objectives have been reached over the past year. These include the establishment of the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) as a nonprofit association under Swiss law in October 2015, with the specification of a new, expanded structure for membership and governance.4

Membership in the New International Council on Harmonisation

Under the new structure, ICH has five categories of membership: Founding Regulatory Members, Founding Industry Members, Standing Regulatory Members, Regulatory Members, and Industry Members. With the establishment of the new association, the parties that had been members of the earlier version of ICH became members of the new organization. Thus the Founding Regulatory Members include the FDA, EC, and MHLW/PMDA, and the Founding Industry Members include PhRMA, EFPIA, and JPMA. The Standing Regulatory Members include Health Canada and Swissmedic, because these two drug regulatory authorities also served as members of the SC before the establishment of the new ICH.4

The new structure of ICH also provides for three categories of observership: Standing Observers, Observers, and Ad Hoc Observers. The WHO and IFPMA, parties who had been observers in the earlier ICH, became Standing Observers. Other parties who had participated in the GCG as DRAs and RHIs, or who had participated in the past as observers or interested industry parties, were invited to immediately join the new ICH as Observers and to consider becoming Members.

To be eligible to become a Regulatory Member, regulators must have participated in at least three of the past four ICH biannual meetings, have appointed experts to at least two ICH EWGs, and have implemented at a minimum the following three ICH guidelines: Q1, Stability Testing of New Drug Substances and Products; Q7, Good Manufacturing Practice Guide for Active Pharmaceutical Ingredients; and E6, Good Clinical Practice. Regulatory Members have the right to attend meetings of the ICH Assembly, vote in the Assembly, elect Members to the ICH Management Committee, and appoint experts to ICH Working Groups.

The membership criteria for industry parties include status as a global pharmaceutical industry association representing members, regulated or affected by ICH guidelines, from several countries on at least three continents; participation in at least three of the past four ICH biannual meetings; and appointment of experts to at least two ICH EWGs. Industry Members also have the right to attend ICH Assembly meetings, vote on Assembly decisions (those not considered strictly the purview of Regulatory Members), elect Members to the ICH Management Committee, and appoint experts to ICH Working Groups.4

Eligibility criteria for Observers are minimal, and the designation of Observers is based on a determination of their contribution or benefit to ICH. To be considered for formal Observer status, a party must be a government authority or an RHI representing government authorities responsible for regulating pharmaceuticals for human use, an international pharmaceutical industry organization, or another international organization with a direct interest in pharmaceuticals.
Observers have the right to attend Assembly meetings but without voting rights.4

Organization of the New International Council on Harmonisation

The main deliberative bodies of the new ICH organization include the Assembly, the Management Committee, and the MedDRA Management Committee; these bodies receive administrative and project management support from the ICH Secretariat. The Assembly is composed of all categories of Members of the ICH Association and is expected to hold at least one ordinary meeting each year. The scope of its governance includes, but is not limited to, decisions related to the ICH Articles of Association and Rules of Procedure, decisions about the Membership status of various parties, decisions related to Management Committee membership, annual member fees and other finances, decisions related to the approval of new topics for ICH guidelines and the adoption, amendment, or withdrawal of topics, and the Association's annual work plan and strategic plan of work. The ICH Assembly has jurisdiction over similar issues related to the MedDRA Management Committee and the approval of cooperation between ICH and other organizations.4

The ICH Management Committee is comprised of representatives of the Founding Regulatory and Founding Industry Members, Standing Members, and Elected Members and is required to meet at least in conjunction with the Assembly meetings. Management Committee responsibilities are primarily administrative and financial and include, but are not limited to, planning logistics; preparing and convening Assembly meetings; preparing the ICH annual and multiyear strategic plans; exercising oversight of the EWG process and operations to ensure the quality, efficiency, and timeliness of guideline development; making recommendations to the Assembly related to new topics, new membership or observership applications, and changes to membership fees; and handling other financial matters.4

The new ICH Association actively participates in MedDRA through its Members, and the MedDRA Management Committee has the role of managing, supporting, and facilitating the maintenance, development, and dissemination of MedDRA. The MedDRA Management Committee is comprised of representatives from the Founding Regulatory, Founding Industry, and Standing Members of ICH and representatives of MHRA, with the WHO serving as an observer. The MedDRA Management Committee is responsible for ensuring MedDRA's integrity, viability, and sustainability as a harmonized standard. This extends to oversight of contracted third parties serving as the maintenance and support services organization, oversight of the annual MedDRA budget, and determination of subscription fees as part of that budget oversight.4

Financing the New International Council on Harmonisation

With the establishment of the new nonprofit association under Swiss law, ICH also has transitioned to a more formal approach to funding operations, based on annual membership fees, to cover the costs of the ICH Secretariat and other regular operations and to finance the biannual meetings at which the Assembly and Management Committee meet and, as importantly, EWGs meet face-to-face to engage in ICH technical guideline work. These in-person interactions typically span several days and greatly facilitate progress in technical harmonization discussions. The harmonization work of ICH is conducted through a highly structured process that has proven to be quite effective and is further described in the section that follows.

OVERVIEW OF THE INTERNATIONAL COUNCIL ON HARMONISATION TECHNICAL HARMONIZATION PROCESS

Nomination and Selection of Topics for Harmonization

The ICH process begins with a proposal to undertake a new topic, or to update an existing ICH guideline, related to drug Safety, Efficacy, or Quality, or to a relevant Multidisciplinary issue. The proposal typically includes a brief statement of the problem caused by a lack of drug regulatory harmonization and describes the main technical and scientific issues to be addressed in the proposed ICH technical harmonization work, along with the expected outcomes of that work. In view of the significant resource commitment required of all ICH parties to undertake regulatory harmonization work, new topic proposals also must provide a strong case for why a particular area of identified disharmony is important for international harmonization. This might include evidence that the proposed topic could conserve regulatory agency or industry resources in the future or could improve the timing of patient access to new drugs. New topic proposals are also expected to address whether the identified technical issues are feasible to harmonize within the limits of current national laws and regulations in all of the ICH regions, whether the level of effort and length of time likely to be required of experts is feasible for at least the minimum set of required ICH Members, and whether the proposed topic might compete for ICH resourcing within or across the topic categories (Q, S, E, M).
Finally, new topic proposals are also expected to discuss when the benefits of the completed guideline would be anticipated to be realized and how the proposed topic relates to, and potentially complements or conflicts with, other existing guidelines.5

A topic proposal may be submitted to the ICH Management Committee by any ICH Member or Observer. The Management Committee reviews and makes an initial assessment of the mission relevance, urgency, and feasibility of all proposed topics; provides this assessment; offers recommendations; and seeks endorsement and topic prioritization from the Assembly during a biannual meeting session.6 If a new topic proposal is endorsed by the Assembly, an informal Working Group is established to develop a Concept Paper that further fleshes out the harmonization work to be undertaken, based on what was outlined in the original new topic proposal.

ICH Members can nominate representatives to informal Working Groups, and all nominees are expected to have expertise relevant to the topic subject matter. Typically, unless otherwise specified by the Assembly, the official membership of an informal Working Group is limited to two representatives per ICH Member, with one expert designated as the Topic Leader for that Member and the other as Deputy Topic Leader. The Topic Leaders and Deputy Topic Leaders are expected to participate in the informal Working Group discussions and to be the points of contact for any consultation carried out among experts between meetings. While all ICH Members may appoint expert representatives, every Founding Regulatory Member is required to nominate at least one expert to each informal Working Group. In addition, a quorum for Group meetings requires the presence of at least one expert from each Founding Regulatory Member and, if nominated, one expert from each Founding Industry Member and Standing Regulatory Member. The informal Working Group develops and refines the Concept Paper and submits it to the Management Committee for endorsement.5

Following endorsement of a guideline Concept Paper, an EWG is established. While all Founding Regulatory Members are required to appoint experts to all EWGs, the Founding Industry Members, Standing Regulatory Members, and other ICH Members are invited and encouraged to appoint technical experts to all EWGs.
Unless otherwise specified by the Assembly, the membership of an EWG is typically limited to two representatives per ICH Member per Working Group (a limitation that applies to both Regulatory Members and Industry Members), with one expert designated as Topic Leader and the other as Deputy Topic Leader. In addition, ICH Observers who would like to participate in an EWG may submit a request to appoint an expert observer to a specified Working Group.6

The EWG is responsible for developing a detailed Work Plan prior to initiation of the guideline work. The Work Plan includes anticipated milestones, a timeline for the completion of activities, a summary of any issues, and a justification for a future face-to-face meeting if one is requested by the EWG. The Work Plan for each EWG is posted on the ICH website.

The work of the EWG is led by the appointed Regulatory Chair and Rapporteur. The Regulatory Members of the ICH Management Committee officially designate the Regulatory Chair from among the Regulatory Members. The Rapporteur for the EWG, however, is designated by the full Assembly and selected from among the Topic Leaders designated by the various ICH Members when the new topic was formally endorsed. In general, the Regulatory Chair and the Rapporteur will be from different regions. The role of the Rapporteur is to serve as scientific cochair to the Regulatory Chair, and in that role he or she is expected to facilitate and manage the scientific and technical activities of the EWG. This includes reconciling scientific differences of opinion to produce an ICH document whose scientific and technical content is drafted in accordance with Assembly expectations and decisions. The Rapporteur typically works in close collaboration with the Regulatory Chair, whose role and major responsibility are to ensure that the initially proposed Work Plan time frames are met and that the work of the EWG remains within the scope of its Assembly-approved mandate.5,6

International Council on Harmonisation Five-Step Harmonization Procedure

The Formal ICH Procedure is a five-step process used to develop the ICH harmonized guidelines for implementation within each Member's region. Table 7.2 provides an overview of the five-step procedure, which is used for new guidelines and is initiated following the endorsement of a Concept Paper by the Assembly.

In Step 1 of the Formal ICH Procedure, the EWG members work together to prepare a consensus draft of the technical document based on the objectives set out in the Concept Paper. The Rapporteur typically prepares an initial draft of the technical document in consultation with the experts appointed to the EWG. The initial draft and successive revisions are discussed within the EWG and circulated with comments among the members of the group. Each ICH Member with experts appointed to the EWG is responsible for providing any comments within the allotted time frame (typically proposed by the Rapporteur and agreed to by the EWG members).5 The EWG conducts some of this work in the week-long biannual face-to-face meetings of the ICH but typically performs much of it between the biannual meetings, working via email and regular teleconference calls. When the EWG reaches consensus, the consensus text approved by the ICH Members' experts in the EWG is "signed off" by those experts, making it the Step 1 Technical Document.5

Once the EWG signs off on the technical document, the Step 1 Technical Document with expert signatures is submitted to the ICH Assembly with a request for endorsement under Step 2a of the ICH process. In Step 2a, the Management Committee Regulatory Members and Industry Members provide a recommendation to the Assembly on the decision to endorse the final Technical Document, based on the EWG's report that there is sufficient scientific consensus on the technical issues and its recommendation to proceed to the next stage of regulatory consultation. The consensus text is endorsed by the Assembly as the Step 2a Final Technical Document either during a face-to-face meeting or through an electronic approval procedure organized by the ICH Secretariat.5

Recognizing that the ICH Regulatory Members, unlike Industry Members, are uniquely responsible for the ultimate adoption, implementation, and potentially enforcement of new ICH guidelines as regulatory policy within their respective regions, Step 2b is a "Regulators only" step in which the ICH Regulatory Members review the Step 2a Final Technical Document and take any actions, including revisions, that they deem necessary to develop the Draft Guideline. The consensus text of the Draft Guideline is then endorsed by the Regulatory Members of the ICH Assembly as the Step 2b Draft Guideline, which allows the process to progress to Step 3, Regional Regulatory Consultation.5

TABLE 7.2 Overview of the ICH Five-Step Procedure for Harmonized Regulatory Guidelines

Before Step 1: Concept paper development; Work plan development
Step 1: Consensus on draft of technical document
Step 2a: All-party endorsement of final technical document
Step 2b: Regulatory endorsement of draft guideline
Step 3: Regional regulatory consultation
Step 4: Regulatory adoption
Step 5: Regional implementation

Step 3 of the Formal ICH Procedure begins with the public consultation process conducted by each of the ICH Regulatory Members in their respective regions. Step 3 continues with the collection and analysis of the public comments received across all regions participating in this process. After all regulatory consultation results are obtained, the EWG that organized the earlier Step 1 and Step 2 discussions for consensus building reconvenes. Although the reconvened group includes both Industry and Regulatory expert representatives, the leadership of the group may need to shift: if the Rapporteur through Step 2b had been designated from an Industry Member, then a new Rapporteur is appointed from a Regulatory Member, typically from the same region as the previous industry Rapporteur. Step 3 concludes with the completion and acceptance of any revisions to the Step 2b Draft Guideline made in response to public comments. The draft document generated as a result of the Step 3 phase, called the Step 3 Experts Draft Guideline, is signed by the EWG experts of the ICH Regulatory Members and then submitted to the Assembly with a request for adoption.5

Adoption of the new Guideline occurs in Step 4. Adoption is based on a recommendation by the ICH Management Committee and the consensus of the ICH Assembly Regulatory Members affirming that the new Guideline is recommended for adoption by the Regulatory Members of the ICH regions.6

Following adoption, the harmonized Guideline moves to Step 5, the final step of the process, and is implemented by each of the Regulatory Members in their respective regions. The harmonized Guideline is implemented according to the same national and regional procedures that apply to other regional regulatory guidelines and requirements. In the United States, for example, ICH guidelines are treated as regulatory guidance to industry and made publicly available through a Federal Register Notice of Availability.

INTERNATIONAL COUNCIL ON HARMONISATION GUIDELINES MOST RELEVANT TO CLINICAL RESEARCH

The Formal ICH Procedure has been used to develop guidelines addressing Efficacy (E) topics, and these are likely to have the greatest relevance to the planning and conduct of clinical research. Many of the currently available E guidelines were originally drafted in the first decade of ICH operations, under the earlier governance structure described in the Background section of this chapter. The E guideline numbering mainly reflects the chronological sequence of their development and is not intended to convey a particular priority or other dependency among the guidelines. In addition, more than one E guideline will often be relevant to the planning of a clinical study, particularly in the case of traditional interventional clinical trials performed to generate evidence to support drug regulatory review. Table 7.3 illustrates how multiple E guidelines may provide information and guidance relevant to a given subtopic. In this example, a set of critical-to-quality factors for clinical studies, based on factors identified by the Clinical Trials Transformation Initiative,7 has been used to develop the illustration. This section provides an overview of the content of this set of E guidelines.

E1: The Extent of Population Exposure to Assess Clinical Safety for Drugs Intended for Long-Term Treatment of Non-Life-Threatening Conditions. The goal of this guideline is to present an accepted set of principles for the safety evaluation of drugs intended for long-term treatment of non-life-threatening diseases, recognizing that safety evaluation during clinical drug development is expected to characterize and quantify the safety profile of a drug over a period of time consistent with the intended long-term use of the drug.3

E2A–E2F: Safety Data Management comprises six guidelines addressing different aspects of safety data management during clinical development and postapproval. The E2A guideline addresses clinical safety data management in terms of definitions and standards for expedited reporting. The E2B(R2) guideline focuses on data elements for transmission of individual case safety reports. The E2C(R2) guideline is concerned with clinical safety data management related to periodic safety update reports for marketed drugs (i.e., periodic benefit-risk evaluation reports), and E2D is focused on definitions and standards for expedited reporting of postapproval safety data.
The E2E guideline addresses pharmacovigilance planning, and the E2F guideline is concerned with development safety update reports.3

E3: Structure and Content of Clinical Study Reports describes a single "core" clinical study report that can serve as an integrated complete report acceptable to all regulatory authorities in the ICH regions for any therapeutic, prophylactic, or diagnostic agent, including clinical and statistical presentations and analyses. Topics covered in this guideline include the study synopsis, ethics, the investigators and administrative structure, the investigational plan, study patients, efficacy evaluation, safety evaluation, and overall study conclusions, as well as other sections, with guidance throughout on the formats for presentation of the study data.

The E4: Dose-Response Information to Support Drug Registration guideline provides background on the purpose and use of dose-response information in clinical drug development and how these data should be obtained as an integral part of drug development. The guideline provides an overview of study designs for assessing dose-response, and advice on identification of a starting dose, titration steps, and other issues in dose-ranging or concentration-response studies.3

E5(R1): Ethnic Factors in the Acceptability of Foreign Clinical Data provides a framework for evaluating the impact of ethnic factors on drug efficacy, safety, and dose regimen, and guidance on regulatory and development strategies that allow the evaluation of ethnic factors while minimizing duplication of studies across regions. This includes the use of bridging studies to extrapolate from the studied populations to a new global region to support acceptance of the data as a basis for drug registration in the new region.3

The E6(R1): Good Clinical Practice Consolidated Guideline is primarily focused on assuring human subject protection and data quality in clinical trials, including the training needed for investigators and others, and the processes that should be followed both in study conduct and in documentation. Focused primarily on clinical research performed with regulatory intent, it provides a standard guide so that clinical researchers, including drug developers and clinical research staff, know what they need to do both to comply with the regulations and to document compliance.3

E7: Studies in Support of Special Populations: Geriatrics. This guideline is primarily concerned with the development of new molecular entities for the treatment of diseases associated with aging, or new formulations or combinations of established drugs to treat conditions common among the elderly. The guideline addresses the extension of the age range of elderly patients that it would be desirable to include in studies, and the inclusion of sufficient numbers of elderly patients in the Phase 3 database.

I. ETHICAL, REGULATORY, AND LEGAL ISSUES

7. INTERNATIONAL REGULATION OF DRUGS

TABLE 7.3 International Council on Harmonisation Efficacy Guidelines. [Matrix mapping critical-to-quality factors for clinical studies to the ICH Efficacy guidelines that address them. Columns: E1, E2A–E2F, E3, E4, E5, E6, E7, E8, E9, E10, E11, E12, E14, E15, E16, E17, E18. Row groups and factors: Protocol design (eligibility criteria (inclusion/exclusion); randomization; masking; types of controls; data quantity; endpoints; procedures supporting study endpoints and data integrity; investigational product handling and administration); Feasibility (study and site feasibility; accrual); Patient safety (IRB consent; informed consent; withdrawal criteria and trial participant retention; signal detection and safety reporting; data monitoring committee stopping rules); Study conduct (training, including investigator training; responsibilities among sponsor, investigator, and IRB; data recording and reporting; data monitoring and management; statistical analysis); Study reporting (dissemination of study results); Third-party engagement (delegation of sponsor responsibilities; collaborations). Check marks indicate the guidelines relevant to each factor; in the E2A–E2F column, parenthesized letters identify the specific E2 guidelines. IRB, institutional review boards.]
It also addresses the need for attention to pharmacokinetic differences between nonelderly and elderly patients, and the circumstances in which drug-drug interactions should be studied.3

The E8: General Considerations for Clinical Trials guideline offers a general overview of clinical trial topics, including other ICH Guidelines concerning clinical trials. It contains, for example, a table classifying clinical studies according to objective and an annex cross-referencing other relevant ICH guidelines. Although the table includes examples of large simple trials, comparative effectiveness studies, and pharmacoeconomic studies, the guidance is primarily focused on studies intended to support drug regulatory submissions.3

The E9: Statistical Principles for Clinical Trials guideline, developed to harmonize the principles of statistical methodology applied to clinical trials supporting marketing applications in all ICH regions, addresses key issues including considerations for overall clinical development, such as trial context and scope, various trial design considerations, trial conduct considerations, data analysis issues, the evaluation of safety and tolerability, and reporting. In addition, E9 is currently being revised to include considerations related to choosing appropriate estimands and defining sensitivity analyses in clinical trials.3

E10: Choice of Control Group and Related Issues in Clinical Trials. This guideline describes the general principles involved in choosing a control group for clinical trials intended to demonstrate efficacy and discusses related issues concerning trial design and conduct. Without explicitly addressing the regulatory requirements in the various ICH regions, the guideline describes the purpose of a control group, the different types of controls, what trials using different designs can demonstrate, and critical design and interpretation issues to be considered.3

The E11: Clinical Investigation of Medicinal Products in the Pediatric Population guideline provides an overview of critical issues in pediatric drug development. It addresses topics including considerations when initiating a pediatric drug program, the timing for initiating pediatric studies during drug development, the types of pharmacokinetic, pharmacokinetic/pharmacodynamic, efficacy, and safety studies to conduct, age categorizations for pediatric patients, and ethical considerations in pediatric clinical studies.
E11 is currently being revised to update information on these topics and to provide more discussion in several selected areas, including formulation challenges in pediatric drug development and appropriate extrapolation of data from adult populations to pediatric populations and from one pediatric subgroup to another.3

The E12: Principles for Clinical Evaluation of New Antihypertensive Drugs guideline describes core principles for the evaluation of antihypertensive drugs accepted in all ICH regions, which can be used in conjunction with any region-specific guidelines that address other region-specific regulatory requirements.

The E14: The Clinical Evaluation of QT/QTc Interval Prolongation and Proarrhythmic Potential for Non-Antiarrhythmic Drugs guideline provides recommendations concerning the design, conduct, analysis, and interpretation of clinical studies to assess the potential of a drug to delay cardiac repolarization. Measured as prolongation of the QT interval on the surface electrocardiogram, a delay in cardiac repolarization can indicate a potential increase in the risk of cardiac arrhythmias and is thus an important consideration in assessing drug safety.3

E15: Definitions for Genomic Biomarkers, Pharmacogenomics, Pharmacogenetics, Genomic Data and Sample Coding Categories. To facilitate the integration of the disciplines of pharmacogenomics and pharmacogenetics into global drug development and regulatory review, the E15 guideline provides harmonized definitions for

key terms including genomic biomarkers, pharmacogenomics, pharmacogenetics, genomic data, and sample coding categories.3

E16: Genomic Biomarkers Related to Drug Response: Context, Structure, and Format of Qualification Submissions. Recognizing that the use of biomarkers in drug development has the potential to help guide dose selection and thus enhance the benefit-risk profile of a new drug, the E16 guideline describes recommendations related to the context, structure, and format of regulatory submissions for qualification (assessment that a biomarker can be relied on) of genomic biomarkers as defined in E15.3

In addition to the aforementioned E guidelines, a Step 2 consensus draft version of E17: General Principles for Planning and Design of Multiregional Clinical Trials (MRCTs) is also available and is currently undergoing Step 3 regional regulatory consultation. The purpose of this guideline is to describe general principles for the planning and design of MRCTs, with the goal of increasing the acceptability of MRCTs in global regulatory submissions. The guideline addresses strategic issues as well as clinical trial design and protocol-related issues. The latter include, for example, preconsideration of regional variability and its potential impact on efficacy and safety, subject selection, dose selection, and estimation of overall sample size and allocation to regions, as well as other issues.3

A Step 2 consensus draft version of the E18: Guideline on Genomic Sampling and Management of Genomic Data is also currently available. This draft guideline provides harmonized principles for genomic sampling and the management of genomic data in clinical studies, to facilitate the implementation of such studies by enabling a common understanding of critical parameters for the unbiased collection, storage, and use of genomic samples and data.
The guideline is also intended to increase awareness and provide guidance regarding subject data privacy, data protection, informed consent, and transparency, addressing the use of genomic samples and data regardless of the timing of analysis, considering both prespecified and nonprespecified use.3

FUTURE WORK IN REGULATORY HARMONIZATION

It is a dynamic and exciting time for the international harmonization of drug regulatory standards. With the newly reformed ICH serving as the key central venue for regulatory harmonization work, the future direction is likely to be shaped by the needs of a diverse and growing global body of drug regulators, industry organizations, and other stakeholders, and by emerging areas of priority and consensus. One might expect a diverse yet balanced portfolio of harmonization work that encompasses both new guideline development and major renovation of earlier foundational guidelines originally developed decades ago. The new topics may address areas of currently unmet need. The renovations and revisions may be undertaken to incorporate new regulatory science, new methodology, modern perspectives on patient engagement, or other important recent advances in existing topic areas. This work is likely to span the set of Q, S, E, and M topics. It is also likely that future guideline work will address key issues for generic drug registration as well as innovator drug regulatory submissions. The resulting benefits of better-quality development programs, more rigorous and complete regulatory submissions, and common scientific standards applied worldwide will increasingly be felt by patients around the globe. Expanding global adoption of common regulatory guidelines should improve patient access to safer medicines, enable more efficient and less burdensome clinical trials, and expand the availability of high-quality medicines on the market.

References

1. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. The value and benefits of ICH to industry. 2000. http://www.ich.org/fileadmin/Public_Web_Site/News_room/C_Publications/The_Value___Benefits_of_ICH_to_Industry__January_2000.pdf.
2. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. The value and benefits of ICH to drug regulatory authorities: advancing harmonisation for better health. 2010. http://www.ich.org/fileadmin/Public_Web_Site/News_room/C_Publications/ICH_20_anniversary_Value_Benefits_of_ICH_for_Regulators.pdf.
3. International Council on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. Please note that more information about MedDRA; the latest versions of the Articles of Association, Rules of Procedure, and SOPs of the ICH Working Groups; and the full text of all of the ICH guidelines can be found on the ICH website: www.ich.org.
4. International Council on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. Articles of association. October 2015.
5. International Council on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. Standard operating procedures of the ICH working groups. September 2016.
6. International Council on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. Rules of procedure of the assembly. June 2016.
7. The Clinical Trials Transformation Initiative (CTTI). CTTI quality by design project: critical to quality (CTQ) factors principles document. 2015. https://www.ctti-clinicaltrials.org/files/principles_document_finaldraft_19may15_1.pdf.


CHAPTER 8

Clinical Research in International Settings: Opportunities, Challenges, and Recommendations

Christopher O. Olopade, Michelle Tagle, Olufunmilayo I. Olopade
The University of Chicago, Chicago, IL, United States

OUTLINE

Introduction
Challenges
  Inadequate Human Resources
  Deficient Research Infrastructures
  Subpar Health-Care Systems
  Information Gaps
  Political Instability, Civil Disorders, and Natural Disasters
  Economic and Seasonal Migration
  Physical Barriers
  Study Participant Characteristics
  Ethical Issues
Recommendations
  Understand the Local Setting
  Train, Mentor, and Closely Supervise
  Develop and Enhance Local Institutional Review Board Capacity
  Develop Office for Sponsored Research/Office of Clinical Research
  Prepare Data Safety and Monitoring Plan for Adverse Events
  Provide Ancillary Care
  Use Technology for Effective Communication
  Have Long-Term Plans
  Integrate With Existing Infrastructure
Conclusion
Summary Questions
References

Principles and Practice of Clinical Research. http://dx.doi.org/10.1016/B978-0-12-849905-4.00008-3. Copyright © 2018. Published by Elsevier Inc.

INTRODUCTION

We live in a world of increasing interconnectedness and interdependence, where people, goods, beliefs, ideas, and values continually transcend national boundaries. While globalization may pose threats that demand our concerted efforts, it also promises opportunities for unprecedented growth and change. In the public health sphere, transnational approaches are warranted by global concerns, including widening health disparities, poor health-care delivery systems, and widespread infections such as HIV, H1N1, Ebola, and, more recently, Zika. In response, initiatives such as the United Nations Millennium Development Goals and Sustainable Development Goals have dramatically increased awareness of the state of global health and detailed a path toward health equity. In light of the global burden of disease and its disproportionate impact on resource-limited countries, there is tremendous opportunity to conduct standard-setting clinical research that could inform policy at international sites.

In taking a historical perspective on global health, we have learned that the health issues of low- and middle-income countries (LMICs) extend beyond infectious diseases, following appreciable successes in global economic development, the near eradication of polio, and improved control of HIV and malaria.10,68 The field of global health, which in the past had been almost synonymous with infectious disease, has since evolved to include chronic noncommunicable diseases (NCDs). Despite increases in life expectancy and improvements in living standards in LMICs, health disparities continue to widen, and the prevalence of NCDs, such as hypertension, diabetes, and cancer, continues to rise.39 Since 2000, the number of NCD deaths has grown in every region of the world, and NCDs are now responsible for more deaths than all other causes combined. The World Health Organization (WHO) predicts that NCD deaths will grow from 38 million in 2012 to 52 million by 2030.75 To tackle the enormous burden of communicable diseases and NCDs, substantial global health research investment is needed to address the health problems affecting 90% of the world's population.41 We must develop evidence-based interventions, including better characterization of disease patterns and their associated socioepidemiological factors. More efforts are needed to understand the pathogenesis of disease and to discover novel therapies targeted at neglected diseases.35 There is also an urgent need to improve health-care access, including access to current therapies against communicable diseases (e.g., HIV, tuberculosis) and emerging infectious diseases that pose major threats to lives in both resource-limited and resource-rich countries.30 The discovery and accessibility of these interventions will depend on the conduct of clinical research around the world, especially in LMICs, where research activity is low and where evidence-based solutions may produce the largest impact on high early mortality rates.32 Conducting clinical research at international sites, especially in LMICs, is fraught with numerous challenges and therefore requires innovative approaches that place a high premium on understanding the local context and a fervent commitment to following research regulations and ethical guidelines. In the following sections, we briefly discuss some of the most common challenges experienced by the global research enterprise and offer recommendations that strengthen the prospects of novel solutions to our global health problems.

CHALLENGES

Inadequate Human Resources

One of the most significant challenges to conducting research in LMICs centers on human resources: not only are personnel inadequately trained, but there is also a lack of a critical mass of investigators. The few who are highly competent are consequently in high demand and often recruited to work in high-income countries (HICs), where career advancement opportunities are greater.13 Among the various factors that have contributed to the systematic decline of higher education and the clinical research workforce shortage in most resource-poor countries, the major ones include inadequate leadership and investment in academic infrastructures, a general lack of awareness of scientific benefits by administrators and the public, poor faculty development, and the "brain drain" of academic staff. This human capital flight has affected African countries, leaving an acute shortage of qualified academic staff within the higher education system to train the next generation of academic leaders. Every year since the 1990s, Africa has lost 20,000 professionals to the West, a phenomenon that the United Nations has identified as one of the greatest challenges to the continent's development.27 The outflow of health workers has also significantly impacted countries in the South East Asia Region, including Bangladesh, Bhutan, Myanmar, and Timor-Leste, leaving them below the global benchmark for health workforce population ratio.65 Further exacerbating the issues surrounding inadequate personnel, HIC-sponsored studies often create power imbalances in which much of the research is primarily funded, conducted, and published by HIC investigators, leaving LMIC colleagues with little recognition of their work or benefit from the publications.13 Underscoring the discrepancy in human resources between countries, the World Development Indicators 2016, a publication by the World Bank Group, reported that the per capita number of physicians (per 1000) was 4.9 in both Belgium and Spain, whereas it was 0.20 in Kenya and 0.10 in Mauritania.
This correlates with health expenditures per capita of $4813, $2644, $70, and $47, respectively; this huge disparity in health-care spending is, in turn, reflected in data on life expectancy at birth: 81, 83, 61, and 63 years, respectively.73 Also, whereas spending has increased in HICs, it remains low in LMICs, especially in sub-Saharan Africa (SSA). This limited health-related spending is further worsened by corruption and poor management, which divert even more funds away from the health sector.61

Deficient Research Infrastructures

While scientific research has been considered a pillar of world development, there is a disparity in the global distribution of the resources critical to building and sustaining scientific research capacity. Prime examples of this phenomenon can be found in the constrained cancer research programs in Africa2 and the significant disparities in genomics research capacity in SSA, with South Africa exhibiting the greatest genomics research output owing to greater investment in its genomics sector.1 In a majority of LMICs, economic constraints render research a luxury.24 Laboratories are underequipped because of insufficient funds, and the institutions that do have purchasing power soon learn of the challenges associated with the installation, servicing, and maintenance of equipment.51 Supplies ranging from costly equipment replacement parts to inexpensive paper towels are a challenge to procure and often require steep overseas shipping fees. Damaged or malfunctioning equipment litters these resource-poor facilities because maintenance services are unavailable, rendering the equipment useless and wasting time and funds.

Subpar Health-Care Systems

Compared with LMICs, health-care delivery to the general populace is relatively seamless in most HICs, but this has been made possible by learned experience, capital investment, and opportunities to refine the processes over time. For example, the National Health Service (NHS) in the United Kingdom and the Servizio Sanitario Nazionale in Italy provide universal health coverage through public hospitals funded through taxation. The systems in the Netherlands and Switzerland are based on compulsory insurance with built-in risk equalization to prevent premiums being set on the basis of health status. Similarly, health care in France is offered by both private and public hospitals, with a social security system that refunds most of the costs. The foundation of these systems is the revenue brought in through higher taxes on high-income earners, making quality health care physically and financially accessible to virtually the entire population. In contrast, most LMICs have meagerly funded and poorly structured health-care systems.39,40 Different levels of government often fund the majority of health-care services, with private and charitable organizations playing varying roles. Accessibility can be further impaired for women and children as a result of religious, cultural, and economic factors.34 In addition, the coordination of service delivery through primary or secondary tiers is often poor, and the referral system is not well developed, as most people do not have specific primary care providers.57 Together, these factors impair the population's physical and financial access to basic health care. Moreover, these fragmented health systems result in duplicated effort.7,18 A majority of the functional programs are funded by foreign donors, who often run nonoverlapping, disease-specific, vertical programs without sharing resources and experiences. Both human and financial resources are wasted because clinical services and studies are not consolidated; instead, separate but identical programs are implemented.7,18,53

Information Gaps

Within the scientific community, peer-reviewed literature is the official avenue through which communication takes place, and yet researchers in LMICs have long experienced problems in accessing peer-reviewed content because of the high cost of journal subscriptions. While major biomedical publishers have invested in initiatives that offer such resources at little or no cost, other barriers remain, including limited computer access, scant Internet connections, unreliable electricity, and limited knowledge of how to use online resources.8 Furthermore, basic epidemiological data are not available to researchers, making it difficult to know, let alone prioritize, health needs in LMICs.55,77 Birth, death, and disease-specific registries are not maintained, as government policies regarding these data, even where they exist, are not enforced because of inadequate facilities and a shortage of trained staff. The lack of such vital data limits robust clinical research and critically impairs the capacity to project the kinds of interventions needed at specific locations.21,66

Political Instability, Civil Disorders, and Natural Disasters

Many LMICs exist in a constant state of flux, experiencing continuous changes in the strata and forms of leadership, which in turn affect field research.56 At the local level, changes in culture-based governance or political structure have a significant impact on the conduct of clinical research beyond that of granting permission and providing support. As most of these communities are tight-knit, the opinions of both spiritual and temporal authorities markedly influence the general population. Changes in such leadership or political mechanisms will invariably affect studies, especially those requiring long-term follow-up. Frequent changes in governance also impact morale in terms of the basic job security of local research staff. Incessant turnover in management and its priorities can impair study continuity and undermine staff job satisfaction. Many of these societies are also ravaged by internal conflict, from civil uprisings to full-blown warfare. From Kashmir and Afghanistan in Asia, to the Democratic Republic of Congo in Africa, to Colombia in South America and Chechnya in Europe, people live in varying degrees of danger. Such settings create numerous obstacles to producing quality research, as the safety of both study participants and research staff may be compromised. There are concomitant changes in the needs and priorities of such communities, as the health needs will be largely first aid for physical injuries and the prevention of highly communicable diseases such as cholera and typhoid fever.63 In cases of natural disasters, such as the Indian Ocean tsunami in 2004 and Haiti's earthquake in 2010, most of the surviving population is displaced either temporarily or permanently.69 Both conflicts and natural disasters bring the potential for loss to follow-up (due to death, displacement, etc.) and the likely disruption of transportation and other infrastructure components (e.g., hospitals) that are essential for the conduct of research. Unique ethical issues also emerge when conducting research in countries marked by conflict and displacement, such as Syria and Turkey. These include the potential harm imposed on participants by asking questions that may reactivate trauma-induced distress. In light of the ever-shifting nature of refugee areas and the associated safety issues, it is critical to design studies that are relatively flexible in their approach.17

Economic and Seasonal Migration

In many rapidly developing economies, such as China, there is an unprecedented mass migration from rural to urban areas for economic reasons.46 This migration negatively impacts studies that require long-term follow-up and creates confounding factors that complicate data analysis. Also, in agriculture-based economies, people move from one region to another whenever poor climate conditions affect land cultivation.62 In such cases, adults migrate temporarily in search of menial jobs. Similar to this group are pastoral nomads in SSA, Mongolia, and other developing countries. There are limited demographic and medical data on such populations, and cultural barriers further complicate the research process, as the willingness of such populations to talk to health-care workers is limited.62

Physical Barriers

The physical distance between international project sites can also pose a challenge, as it often precludes consistent, regular dialog among scientists. Modern technology (e.g., phone, email, video conference) is only partially successful in creating a platform for conceptual exploration of the study, and such modes of communication may not always be as effective as in-person meetings. Travel expenses, including time lost, also increase overall study costs. Differences in time zones among study sites may further hamper regular communication, leading to fragmented discussions via email.

Study Participant Characteristics

A majority of potential research participants in resource-poor countries share unique characteristics that impact their health-seeking behavior and participation in research. Such factors include gender, level of education, income, and religious and traditional beliefs.50 Even the overall perception of disease itself determines the health-seeking behavior of individuals.14 Such beliefs on the causes, treatments, and prognosis of illness vary greatly across international borders and even within communities and families. Education, or lack thereof, significantly affects one’s occupation, level of income, quality of housing, and access to medical care. It also impacts one’s health status and desire to seek treatment and, most importantly, determines one’s capacity to be a part of the decision-making process involved in health-care plans.3 Many Western medical terms have no equivalent translation, making the pathology and prognosis of a disease difficult to interpret or explain.54 In countries with prevalent interracial tensions, trust in both the health-care system and the race of the health-care provider can determine the choice of treatment (orthodox vs. alternative) and health-care facility.59

Ethical Issues

Ethics in the conduct of clinical research in LMICs often generates controversy due to the peculiar socioeconomic factors found in such regions. The conceptual framework for clinical study design was fitted to the sociodemographic characteristics of the predominant population in HICs, and several factors (e.g., income, cultural beliefs, level of education) may preclude the direct transfer of clinical research frameworks from minorities in HICs to populations in LMICs. Several guidelines have been developed by national and international organizations, such as the Council for International Organizations of Medical Sciences, to guide the conduct of clinical research involving human subjects.16,48,49,59,67,74,76 Although covered in other chapters of this book, we will briefly highlight some salient aspects that are relevant to LMICs. Established guidelines, such as the Declaration of Helsinki developed by the World Medical Association, may be interpreted differently across international settings. This then leads to varying interpretations of concepts such as “informed consent,” “nonbeneficial”

I. ETHICAL, REGULATORY, AND LEGAL ISSUES


studies, “vulnerable” populations, “equipoise,” and when the use of controls is appropriate,45 as well as of ethical issues such as injustice, coercion, and exploitation.29 For example, the capacity of participants to give “informed consent” is questionable because most modern terms have no exact translations in local languages. Furthermore, in the context of clinical research in low-resource settings, the absence of strong ethical and regulatory oversight leads to meager efforts to ensure participant safety or data integrity.31 In India, for example, it has been observed that ethical guidelines were violated, data were falsified, and impoverished illiterate citizens were exploited. There were increasing reports of deaths of clinical trial participants, resulting in the need for more stringent mechanisms in clinical trial regulation.11 Furthermore, since many in LMICs would not otherwise have access to treatment, provision of study drugs may raise ethical issues of vulnerability, inducement, and coercion, especially when coupled with payments to encourage participation.5 The use of treatments that are less effective than the “best proven intervention” is often debated in the context of research in LMICs.45 Making drugs “reasonably available” to the community after the conclusion of a study also can add to a study’s projected costs, and some have argued that this may discourage the conduct of clinical trials.5,6,60,76 In cases of potentially fatal diseases such as HIV/AIDS, the use of placebo control has also been actively challenged on both ethical and moral grounds.72 Many believe that once an efficacious treatment has been identified, there is no justification for the use of placebo controls, and therefore subsequent interventional studies should be done in comparison with the established standard regardless of local accessibility to the trial drug.
Further complicating the research scene are the ethical requirements that may vary among institutional review boards (IRBs) within and between different countries,44 leading to debates about whether requirements should more closely resemble those found in HICs. Moreover, in some situations, nationalism and self-determination will influence acceptance of foreign research proposals by local IRBs, transfer of samples, and determination of intellectual property rights. Also, while many international guidelines stipulate the universality of standard of care, there are legitimate questions about its practicality, especially in resource-poor settings.60,70,76 For example, in the case of prevention of mother-to-child HIV transmission, there are arguments about whether the accepted standard of care should resemble that offered in HICs or whether the local reality should be put in perspective.5,6,60,70 Often at the center of the standard of care controversy is the Declaration of Helsinki because its language implies use of a

universal standard. The Declaration, which has been amended nine times with the most recent revision in 2013, now mandates the testing of new interventions against the “best proven intervention” with only two exceptions: (1) when there is no existing “proven intervention” or (2) when “sound methodological reasons” justify deviating from the “best proven intervention.”42 A more recent event, the 2014 West Africa Ebola outbreak, also has shed light on ethical issues surrounding the conduct of clinical research during an epidemic. While some argue that “unproven” interventions are acceptable and perhaps even compulsory under such extreme circumstances,71 others raise questions about the capacity of individuals afflicted with a fatal disease to make informed decisions about the use of unapproved drugs.23,25 Additionally, the outbreak had implications for clinical trial policies, as trials often take years to be approved by regulators and to be conducted according to the gold standard (i.e., the randomized controlled trial). Outbreaks are therefore often over before trials can begin. In the case of Ebola, a collaboration supported by WHO allowed researchers to bypass the “red tape,” leading to trial designs that helped provide useful data on how to control the outbreak.9 As seen, the conduct of clinical research studies in resource-poor countries is bound to raise several ethical and moral issues, especially where the current guidelines are not fully explicit with respect to prevailing socioeconomic factors in these regions or to extreme situations, such as an outbreak or epidemic.

RECOMMENDATIONS

Understand the Local Setting

The challenges faced by investigators when conducting clinical research in international settings often vary between and within regions. In most cases, lessons learned from one region cannot always be applied to another region without significant modification. Thus, investigators need to educate themselves on region-specific information, such as cultural beliefs and values; geography and weather; and economic, political, and social climates and infrastructures. Such an understanding can inform different aspects of the research (e.g., study design, study population, setting, data collection methods, time frame) and ultimately help ensure a successful study. For example, population responses may vary by cultural context. In places like Costa Rica, where health workers are revered, people are willing to participate in studies when approached. In contrast, rural residents in Kenya, Nigeria, and other parts of SSA will not

8. CLINICAL RESEARCH IN INTERNATIONAL SETTINGS

respond favorably to visitors (i.e., researchers) and are biased against public servants. Because a change in attitude may only happen when clinical studies are congruent with communities’ perceived needs,15 it is important for researchers to engage with communities in a way that allows for needs to be identified, concerns and ideas to be explored, and societal benefits to be highlighted.12,37 Additionally, ethical issues should be addressed comprehensively while being sensitive to local needs and cultures.38,43 There are other issues that should also be carefully considered before embarking on studies in developing countries. Of primary importance are the measures taken to protect the privacy of research participants, especially in studies involving stigmatized diseases (e.g., HIV/AIDS). Behaviors associated with these diseases, such as homosexuality, can carry risk of physical attack and, in some cases, death. Also, repeated home visits by research staff may bring to the attention of the whole community the health problems of study subjects, again leading to stigmatization.28 Moreover, it is advisable to ensure that the consent process is easily understood, that consent documents are written at a sixth-grade reading level for increased understanding,20 and that consent documents are as informational as possible, going beyond direct translation into the local language. Tests should be conducted to assess participants’ understanding of the consent at the commencement of the study and also regularly throughout the study duration.20 Where surveys are involved, quality control methods should include using tape recorders, repeating questions, repeating visits, monitoring by supervisory staff, and rapidly reviewing completed questionnaires to identify errors and inconsistencies in subject responses. To avoid future conflicts, ownership of data and authorship also should be discussed as early as possible, especially with local collaborators.
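One way to operationalize the sixth-grade reading-level recommendation above is to score draft consent documents with a readability formula before translation and field testing. The sketch below uses the well-known Flesch-Kincaid grade-level formula with a deliberately naive vowel-group syllable counter; the function names and the syllable heuristic are illustrative assumptions rather than any method prescribed by the guidelines cited in this chapter.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count runs of consecutive vowels (y included);
    # every word is assumed to have at least one syllable.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fk_grade(text: str) -> float:
    # Flesch-Kincaid grade level:
    # 0.39 * (words/sentence) + 11.8 * (syllables/word) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)
```

A draft scoring above the target grade would be flagged for simplification before piloting. Note that such English-based formulas do not transfer directly to other languages, so the locally administered comprehension tests described above remain essential.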
The adoption of a community-based or community-engaged participatory research model is a more sustainable solution to challenges posed by traditional research methods, in which investigators retain all the authority and seek interaction with communities only when recruiting subjects and collecting data. This entails negotiations at all stages of the research, where both parties highlight concerns, discuss issues, and collectively reach consensus. It has the potential to benefit both the researchers and the community because it facilitates easier enrollment of participants and data collection, as community members see themselves as equal partners in the process.58 In fact, documenting the terms of the research process in the form of a memorandum of understanding or similar document may be helpful. Further, this model may involve open discussion of research results with local stakeholders, which can lead to improvements in

the public health and well-being of the community. It is important to note, however, that the dissemination of results entails risks and challenges that must be anticipated and properly managed as well.

Train, Mentor, and Closely Supervise

Initial training geared toward researchers in developing countries needs to cover the basic principles of clinical research (e.g., study design and methods, analysis, ethical oversight) and may be better implemented at centers in LMICs rather than those in HICs. Efforts also should be made to develop independent local investigators and scientists through mentorship on grant writing and protocol development.33,47 At the initial stage, close supervision of the local study staff is required to ensure the ethical and judicious conduct of the study. Since in most instances many will lack adequate clinical research skills and experience, technologies such as video conferencing and VOIP (voice over internet protocol) can be utilized to provide frequent feedback on the progress of the study and also to troubleshoot emerging issues. Mentors should not expect work habits similar to their own, as planning, workloads, and sensitivity to deadlines and reporting vary across cultures. Also, several approaches can be employed to address the “brain drain” of health care and research workers.64 These include providing incentives, strong mentorship, and grant, protocol, and article writing support. At the federal level, there are a number of opportunities to engage in research capacity building efforts. The Medical Education Partnership Initiative (MEPI), for example, is a $130-million, 5-year award from the US NIH to 13 medical schools in Africa. Its aims are to bolster in-country medical education infrastructures and to increase retention of medical school faculty and clinical professors.52 In addition to MEPI and other similar initiatives, the NIH also offers research training and career development opportunities at the individual level, granting investigators a chance to enhance their knowledge and skill set according to their research background and interests.

Develop and Enhance Local Institutional Review Board Capacity

The rapid increase in the number of international clinical trials based in developing countries, where research regulation is relatively weak, calls for the capacity building of local IRBs. Well-constituted and operational IRBs can accelerate research productivity at academic centers while ensuring human research subject protections, as evidenced by an established IRB at the University of Ibadan in Nigeria that saw a 150% increase in number


of reviewed proposals and a 62% decrease in time to approval.19 However, despite the recognized importance of well-constituted ethics boards in regulating research, funding support to meet the mandate can often lag behind in many developing countries and must therefore also be addressed as a part of enhancing local IRB capacity.

Develop Office for Sponsored Research/Office of Clinical Research

As the establishment of functional IRBs grows, there is a need for creating an office for sponsored research that can help investigators propose and manage grant-funded research while also serving as the key contact for external research sponsors. This office would be responsible for offering preaward and postaward services, including project proposal reviews, award negotiations, progress reports, amendments, and closeouts. Offices of clinical research ideally would ensure the proper conduct and management of clinical research in compliance with relevant policies and regulations at the local, national, and international levels. They would be responsible for training all faculty, staff, and students involved in research on current guidelines and changes in regulatory issues. Activities should include establishing institutional research policies; providing resources and tools on the proper conduct of clinical research and clinical trials; and ensuring that all investigators, especially principal investigators, undergo required research training. Additional areas where the research enterprise may be enhanced include the handling of yearly declarations of conflict of interest, the development of management plans for situations where conflicts of interest exist, and regulatory support for intellectual property rights.

Prepare Data Safety and Monitoring Plan for Adverse Events

There is a need to put in place contingency plans to address adverse events before they occur. A standard operating procedure to deal with such eventualities should be devised, and training should be offered at the initiation of the research study. Another essential aspect of preparing for clinical research that is often lacking is the development of a data safety and monitoring plan (DSMP) (see Chapter 10). Ideally, this should be developed prior to initiating any clinical trial so as to define activities that need monitoring, such as obtaining informed consent, ensuring high-quality data collection and processing, and reviewing plans for adverse event handling, protocol deviations, and violations. In clinical trials where death or morbidity are potential end points, the DSMP should involve creating a safety


monitoring committee to make timely decisions about the need to stop a study prematurely or to modify informed consent forms if necessary to deal with emerging risks.

Provide Ancillary Care

A commentary to the Council for International Organizations of Medical Sciences’ Guideline 21 advises the provision of health care to study participants beyond what is necessary for the clinical research.16 Also, it is important to note that the Declaration of Helsinki now mandates both compensation and treatment for any research-related injury.45 Such provisions will improve the commitment of research participants and the community to the project while providing benefits to their overall health status.

Use Technology for Effective Communication

Effective communication among research sites is crucial and can considerably impact the success of a study. Field updates on the project status should be frequent so that potential problems are identified and quickly resolved. The adoption of the Internet as a medium for real-time communication is important and provides significant support for projects. Other technologies, such as solar energy, also have the potential to improve the execution and efficiency of clinical research projects through the provision of stable power supplies, thereby lowering long-term costs.4,22,26 It is essential to invest in developing a computer and data management system with backup and maintenance plans because of frequent power fluctuations and surges, especially for studies conducted in settings where energy poverty is a problem.36 As such, it would be beneficial to leverage technological advancements to approach challenges posed by conducting research in low-resource settings.

Have Long-Term Plans

Most studies in developing countries are short in duration and narrow in goals and scope. Lack of continuity usually leads to wasted resources and may negatively impact the community’s willingness to take part in future studies. Even for studies that are initially designed to be short in duration (e.g., interventional studies), efforts should be made to maintain some form of contact with the local staff and community to help ensure continuity.

Integrate With Existing Infrastructure

Although health infrastructures in many LMICs are in deplorable condition, with inefficient


health systems, it is advised that efforts be made to integrate new studies into the locally available system. Such integration will prevent research workers from completely abandoning their primary duty of offering care to the general population. With this approach, investigators also will be able to identify and build upon the strengths of local infrastructures, thereby helping to strengthen the local health system and ensure sustainable clinical research practices.

CONCLUSION

Improvements in global access to and quality of health care depend on the discovery of socioepidemiological risk factors for diseases, innovative evidence-based therapies, and their equitable distribution. Unfortunately, LMICs are critically deficient in their capacity to independently execute such efforts. They are confronted by innumerable challenges, including inadequate human resources, subpar research infrastructures and health systems, and the economic and psychosocial circumstances of potential participants. These challenges are not insurmountable and should be no excuse for investigators and pharmaceutical companies to refrain from conducting studies in the very populations that are most likely to benefit from patient-oriented studies. It is essential to address such challenges to help ensure the successful conduct of clinical research in resource-limited settings. Among a myriad of recommendations, a primary consideration is the setting in which the study will be conducted. Understanding the local setting (e.g., cultural beliefs and values; economic, social, and political climate and infrastructures) will help inform different aspects of the research, including the study design and effective participant recruitment and retention methods. It is also necessary to develop partnerships with communities as a means to engage leaders and residents in shared efforts to implement the study and disseminate findings, thereby helping to create mutual trust, respect, and a sense of co-ownership of research projects. Furthermore, human resource development should entail comprehensive training programs with continual guidance and mentorship, and technological advancements (e.g., Web-based databases and communication platforms) should be utilized to enhance communication among research teams. Efforts also should be made to prevent fragmented or duplicated efforts by adapting to and leveraging the strengths of existing infrastructures.
Also, the sharing of best practices between research teams working in similar settings may further improve efficiency and reveal initially unanticipated adverse events that need to be tackled preemptively.

In an increasingly interdependent world, global health issues warrant transnational approaches that entail the conduct of clinical research in every region of the world, especially in resource-limited countries. These endeavors are rife with challenges, and it is essential to anticipate and address them appropriately to ensure that studies adhere to research regulations and ethical guidelines. By engaging in such collective efforts, the international research enterprise can work toward implementing evidence-based interventions that alleviate global health disparities and enhance the quality of life for all nations.

SUMMARY QUESTIONS

1. Which of the following are recognized challenges to conducting quality clinical research in low- and middle-income country settings?
   a. Lack of resources and infrastructure for research
   b. Poor data collection methods
   c. Questionable ethical standards and poor subject protection
   d. All of the above
2. Use of placebos in clinical trials is justifiable under which of the following conditions?
   a. When there are approved and effective treatments for the condition
   b. If there is no disagreement about whether standard treatment is better than placebo
   c. When the additional risk posed by the use of placebo is minor and withholding the current standard therapy would not lead to serious or permanent harm
   d. If the study is being conducted in an international setting where standard therapy is unavailable
3. Which of the following is true regarding informed consent for studies in international settings?
   a. Societal benefit trumps individual risks in clinical research
   b. Risk to benefit ratio should always be in favor of the community
   c. The epidemic potential of a disease may shift risk benefit ratio in favor of the community
   d. Cultural values and norms can influence the informed consent process
4. Which of the following is/are important for the successful conduct of clinical trials in international settings?
   a. Knowledge of cultural beliefs and seasonal patterns
   b. Planning for alternative power supply
   c. Developing effective communication and data storage systems


   d. Community engagement and partnership
   e. All of the above
5. Which of the following is/are the most important benefit(s) of international clinical trials?
   a. Contribution to improvement in global diplomacy
   b. Reduction of global health disparities
   c. Opportunity to study new drugs in a population with less likelihood of drug-drug interactions
   d. None of the above
   e. b and c

References

1. Adedokun BO, Olopade CO, Olopade OI. Building local capacity for genomics research in Africa: recommendations from analysis of publications in Sub-Saharan Africa from 2004 to 2013. Glob Health Action 2016;9:31026. http://dx.doi.org/10.3402/gha.v9.31026.
2. Adewole I, Martin DN, Williams MJ, Adebamowo C, Bhatia K, Berling C, et al. Building capacity for sustainable research programmes for cancer in Africa. Nat Rev Clin Oncol 2014;11(5):251-9. http://dx.doi.org/10.1038/nrclinonc.2014.37.
3. Agency for Healthcare Research and Quality (AHRQ), & The Office of Behavioral and Social Sciences Research, NIH. Population health: behavioral and social science insights. 2015. Retrieved from: http://www.ahrq.gov/professionals/education/curriculum-tools/population-health/index.html.
4. Aviles W, Ortega O, Kuan G, Coloma J, Harris E. Integration of information technologies in clinical studies in Nicaragua. PLoS Med 2007;4(10):1578-83. http://dx.doi.org/10.1371/journal.pmed.0040291.
5. Barry M. Ethical considerations of human investigation in developing countries: the AIDS dilemma. N Engl J Med 1988;319(16):1083-6. http://dx.doi.org/10.1056/NEJM198810203191609.
6. Benatar SR, Singer PA. A new look at international research ethics. BMJ 2000;321(7264):824-6.
7. Biesma RG, Brugha R, Harmer A, Walsh A, Spicer N, Walt G. The effects of global health initiatives on country health systems: a review of the evidence from HIV/AIDS control. Health Policy Plan 2009;24(4):239-52. http://dx.doi.org/10.1093/heapol/czp025.
8. Burton A. Sharing science: enabling global access to the scientific literature. Environ Health Perspect 2011;119(12):A520-3. http://dx.doi.org/10.1289/ehp.119-a520.
9. Butler D, Callaway E, Check Hayden E. How Ebola-vaccine success could reshape clinical-trial policy. Nature 2015;524(7563):13-4. http://dx.doi.org/10.1038/524013a.
10. Ceesay SJ, Casals-Pascual C, Erskine J, Anya SE, Duah NO, Fulford AJ, Conway DJ. Changes in malaria indices between 1999 and 2007 in the Gambia: a retrospective analysis. Lancet 2008;372(9649):1545-54. http://dx.doi.org/10.1016/S0140-6736(08)61654-2.
11. Chawan VS, Gawand KV, Phatak AM. Impact of new regulations on clinical trials in India. Int J Clin Trials 2015;2(3):3.
12. Chitambo BR, Smith JE, Ehlers VJ. Strategies for community participation in developing countries. Curationis 2002;25(3):76-83.
13. Chu KM, Jayaraman S, Kyamanywa P, Ntakiyiruta G. Building research capacity in Africa: equity and global health collaborations. PLoS Med 2014;11(3):e1001612. http://dx.doi.org/10.1371/journal.pmed.1001612.
14. Conrad P, Barker KK. The social construction of illness: key insights and policy implications. J Health Soc Behav 2010;51(Suppl.):S67-79. http://dx.doi.org/10.1177/0022146510383495.
15. Cornwall A, Jewkes R. What is participatory research? Soc Sci Med 1995;41(12):1667-76.


16. Council for International Organizations of Medical Sciences [CIOMS]. International ethical guidelines for biomedical research involving human subjects. 2002. Retrieved from Geneva, Switzerland: http://www.cioms.ch/publications/guidelines/guidelines_nov_2002_blurb.htm.
17. El-Khani A, Ulph F, Redmond AD, Calam R. Ethical issues in research into conflict and displacement. Lancet 2013;382(9894):764-5. http://dx.doi.org/10.1016/S0140-6736(13)61824-3.
18. England R. The dangers of disease specific programmes for developing countries. BMJ 2007;335.
19. Falusi AG, Olopade OI, Olopade CO. Establishment of a standing ethics/institutional review board in a Nigerian university: a blueprint for developing countries. J Empir Res Hum Res Ethics 2007;2(1):21-30. http://dx.doi.org/10.1525/jer.2007.2.1.21.
20. Flory J, Emanuel E. Interventions to improve research participants’ understanding in informed consent for research: a systematic review. JAMA 2004;292(13):1593-601. http://dx.doi.org/10.1001/jama.292.13.1593.
21. Gonzalez-Pier E, Gutierrez-Delgado C, Stevens G, Barraza-Llorens M, Porras-Condey R, Carvalho N, Salomon JA. Priority setting for health interventions in Mexico’s system of social protection in health. Lancet 2006;368(9547):1608-18. http://dx.doi.org/10.1016/S0140-6736(06)69567-6.
22. Gotch F, Gilmour J. Science, medicine and research in the developing world: a perspective. Nat Immunol 2007;8(12):1273-6. http://dx.doi.org/10.1038/ni1531.
23. Hantel A, Olopade CO. Drug and vaccine access in the Ebola epidemic: advising caution in compassionate use. Ann Intern Med 2015;162(2):141-2. http://dx.doi.org/10.7326/M14-2002.
24. Harris E. Building scientific capacity in developing countries. EMBO Rep 2004;5(1):7-11. http://dx.doi.org/10.1038/sj.embor.7400058.
25. Hayden EC, Reardon S. Should experimental drugs be used in the Ebola outbreak? Nature 2014. http://dx.doi.org/10.1038/nature.2014.15698. http://www.nature.com/news/should-experimental-drugs-be-used-in-the-ebola-outbreak-1.15698.
26. Heuck CC, Deom A. Health care in the developing world: need for appropriate laboratory technology. Clin Chem 1991;37(4):490-6.
27. Hofman K, Kramer B. Human resources for research: building bridges through the diaspora. Glob Health Action 2015;8:29559. http://dx.doi.org/10.3402/gha.v8.29559.
28. Hotez P, Ottesen E, Fenwick A, Molyneux D. The neglected tropical diseases: the ancient afflictions of stigma and poverty and the prospects for their control and elimination. Adv Exp Med Biol 2006;582:23-33. http://dx.doi.org/10.1007/0-387-33026-7_3.
29. Johnatty RN. Clinical trials in developing countries: discussions at the ‘9th international symposium on long term clinical trials’, London, UK, 19-20 June 2000. Curr Control Trials Cardiovasc Med 2000;1(1):55-8. http://dx.doi.org/10.1186/cvm-1-1-055.
30. Jones KE, Patel NG, Levy MA, Storeygard A, Balk D, Gittleman JL, Daszak P. Global trends in emerging infectious diseases. Nature 2008;451(7181):990-3. http://www.nature.com/nature/journal/v451/n7181/suppinfo/nature06536_S1.html.
31. Kochhar S. Challenges and impact of conducting vaccine trials in Asia and Africa: new technologies in emerging markets, October 16th-18th 2012; world vaccine congress, Lyon. Hum Vaccin Immunother 2013;9(4):924-7. http://dx.doi.org/10.4161/hv.23405.
32. Lang T, Siribaddana S. Clinical trials have gone global: is this a good thing? PLoS Med 2012;9(6):e1001228. http://dx.doi.org/10.1371/journal.pmed.1001228.
33. Lansang MA, Dennis R. Building capacity in health research in the developing world. Bull World Health Organ 2004;82(10):764-70.
34. Lawn JE, Lee AC, Kinney M, Sibley L, Carlo WA, Paul VK, et al. Two million intrapartum-related stillbirths and neonatal deaths: where, why, and what can be done? Int J Gynaecol Obstet 2009;107(Suppl. 1):S5-S18, S19. http://dx.doi.org/10.1016/j.ijgo.2009.07.016.


35. Laxminarayan R, Mills AJ, Breman JG, Measham AR, Alleyne G, Claeson M, Jamison DT. Advancement of global health: key messages from the disease control priorities project. Lancet 2006;367(9517):1193-208. http://dx.doi.org/10.1016/S0140-6736(06)68440-7.
36. Lee PD. The role of appropriate medical technology procurement and user maintenance instructions in developing countries. J Clin Eng 1995;20(5):407-13.
37. Leung MW, Yen IH, Minkler M. Community based participatory research: a promising approach for increasing epidemiology’s relevance in the 21st century. Int J Epidemiol 2004;33(3):499-506. http://dx.doi.org/10.1093/ije/dyh010.
38. London L. Ethical oversight of public health research: can rules and IRBs make a difference in developing countries? Am J Public Health 2002;92(7):1079-84.
39. Lopez AD, Mathers CD, Ezzati M, Jamison DT, Murray CJ. Global and regional burden of disease and risk factors, 2001: systematic analysis of population health data. Lancet 2006;367(9524):1747-57. http://dx.doi.org/10.1016/S0140-6736(06)68770-9.
40. Lu C, Schneider MT, Gubbins P, Leach-Kemon K, Jamison D, Murray CJ. Public financing of health in developing countries: a cross-national systematic analysis. Lancet 2010;375(9723):1375-87. http://dx.doi.org/10.1016/S0140-6736(10)60233-4.
41. Luchetti M. Global health and the 10/90 gap. Br J Med Pract 2014;7(4).
42. Marouf FE, Esplin BS. Setting a minimum standard of care in clinical trials: human rights and bioethics as complementary frameworks. Health Hum Rights 2015;17(1):E31-42.
43. McIntosh S, Sierra E, Dozier A, Diaz S, Quinones Z, Primack A, Chadwick G, Ossip-Klein DJ. Ethical review issues in collaborative research between US and low-middle income country partners: a case example. Bioethics 2008;22(8):414-22. http://dx.doi.org/10.1111/j.1467-8519.2008.00662.x.
44. McWilliams R, Hoover-Fong J, Hamosh A, Beck S, Beaty T, Cutting G. Problematic variation in local institutional review of a multicenter genetic epidemiology study. JAMA 2003;290(3):360-6. http://dx.doi.org/10.1001/jama.290.3.360.
45. Millum J, Wendler D, Emanuel EJ. The 50th anniversary of the declaration of Helsinki: progress but many remaining challenges. JAMA 2013;310(20):2143-4. http://dx.doi.org/10.1001/jama.2013.281632.
46. Murphy R. How migrant labor is changing rural China. Cambridge, UK: Cambridge University Press; 2002.
47. Narasimhan V, Brown H, Pablos-Mendez A, Adams O, Dussault G, Elzinga G, et al. Responding to the global human resources crisis. Lancet 2004;363(9419):1469-72. http://dx.doi.org/10.1016/S0140-6736(04)16108-4.
48. National Commission for the Protection of Human Subjects of Biomedical, Behavioral Research. The Belmont report: ethical principles and guidelines for the protection of human subjects of research. 1979. Retrieved from: http://www.hhs.gov/ohrp/humansubjects/guidance/belmont.html.
49. National Institutes of Health. Guidelines for the conduct of research involving human subjects at the National Institutes of Health. 2001. Retrieved from: http://ohsr.od.nih.gov/guidelines/GrayBooklet82404.pdf.
50. O’Donnell O. Access to health care in developing countries: breaking down demand side barriers. Cad Saude Publica 2007;23(12):2820-34.
51. Oman CB, Gamaniel KS, Addy ME. Analytical chemistry and developing nations. Properly functioning scientific equipment in developing countries. Anal Chem 2006;78(15):5273-6.
52. Omaswa FG. The contribution of the medical education partnership initiative to Africa’s renewal. Acad Med 2014;89(Suppl. 8):S16-8. http://dx.doi.org/10.1097/ACM.0000000000000341.
53. Pfeiffer J. International NGOs and primary health care in Mozambique: the need for a new model of collaboration. Soc Sci Med 2003;56(4):725-38.
54. Putsch III RW, Joyce M. Dealing with patients from other cultures. In: Walker HK, Hall WD, Hurst JW, editors. Clinical methods: the history, physical, and laboratory examinations. 3rd ed. Boston; 1990.
55. Ramanakumar AV. Need for epidemiological evidence from the developing world to know the cancer-related risk factors. J Cancer Res Ther 2007;3(1):29-33.
56. Rifkin SB. Lessons from community participation in health programmes: a review of the post Alma-Ata experience. Int Health 2009;1(1):31-6. http://dx.doi.org/10.1016/j.inhe.2009.02.001.
57. Rohde J, Cousens S, Chopra M, Tangcharoensathien V, Black R, Bhutta ZA, Lawn JE. 30 years after Alma-Ata: has primary health care worked in countries? Lancet 2008;372(9642):950-61. http://dx.doi.org/10.1016/S0140-6736(08)61405-1.
58. Ross LF, Loup A, Nelson RM, Botkin JR, Kost R, Smith Jr GR, Gehlert S. Human subjects protections in community-engaged research: a research ethics framework. J Empir Res Hum Res Ethics 2010;5(1):5-17. http://dx.doi.org/10.1525/jer.2010.5.1.5.
59. Saha S, Komaromy M, Koepsell TD, Bindman AB. Patient-physician racial concordance and the perceived quality and use of health care. Arch Intern Med 1999;159(9):997-1004.
60. Shapiro HT, Meslin EM. Ethical issues in the design and conduct of clinical trials in developing countries. N Engl J Med 2001;345(2):139-42. http://dx.doi.org/10.1056/NEJM200107123450212.
61. Sharma SP. Politics and corruption mar health care in Nepal. Lancet 2010;375(9731):2063-4.
62. Sheik-Mohamed A, Velema JP. Where health care has no access: the nomadic populations of sub-Saharan Africa. Trop Med Int Health 1999;4(10):695-707.
63. Sidel VW, Levy BS. The health impact of war. Int J Inj Contr Saf Promot 2008;15(4):189-95. http://dx.doi.org/10.1080/17457300802404935.
64. Stilwell B, Diallo K, Zurn P, Vujicic M, Adams O, Dal Poz M.

53. Pfeiffer J. International NGOs and primary health care in Mozambique: the need for a new model of collaboration. Soc Sci Med 2003;56(4):725e38. 54. Putsch III RW, Joyce M. Dealing with patients from other cultures. In: Walker HK, Hall WD, Hurst JW, editors. Clinical methods: the history, physical, and laboratory examinations. 3rd ed. 1990. Boston. 55. Ramanakumar AV. Need for epidemiological evidence from the developing world to know the cancer-related risk factors. J Cancer Res Ther 2007;3(1):29e33. 56. Rifkin SB. Lessons from community participation in health programmes: a review of the post Alma-Ata experience. Int Health 2009;1(1):31e6. http://dx.doi.org/10.1016/j.inhe.2009.02.001. 57. Rohde J, Cousens S, Chopra M, Tangcharoensathien V, Black R, Bhutta ZA, Lawn JE. 30 years after Alma-Ata: has primary health care worked in countries? Lancet 2008;372(9642):950e61. http:// dx.doi.org/10.1016/S0140-6736(08)61405-1. 58. Ross LF, Loup A, Nelson RM, Botkin JR, Kost R, Smith Jr GR, Gehlert S. Human subjects protections in community-engaged research: a research ethics framework. J Empir Res Hum Res Ethics 2010;5(1):5e17. http://dx.doi.org/10.1525/jer.2010.5.1.5. 59. Saha S, Komaromy M, Koepsell TD, Bindman AB. Patientphysician racial concordance and the perceived quality and use of health care. Arch Intern Med 1999;159(9):997e1004. 60. Shapiro HT, Meslin EM. Ethical issues in the design and conduct of clinical trials in developing countries. N Engl J Med 2001;345(2): 139e42. http://dx.doi.org/10.1056/NEJM200107123450212. 61. Sharma SP. Politics and corruption mar health care in Nepal. Lancet 2010;375(9731):2063e4. 62. Sheik-Mohamed A, Velema JP. Where health care has no access: the nomadic populations of sub-Saharan Africa. Trop Med Int Health 1999;4(10):695e707. 63. Sidel VW, Levy BS. The health impact of war. Int J Inj Contr Saf Promot 2008;15(4):189e95. http://dx.doi.org/10.1080/17457300802404935. 64. Stilwell B, Diallo K, Zurn P, Vujicic M, Adams O, Dal Poz M. 
Migration of health-care workers from developing countries: strategic approaches to its management. Bull World Health Organ 2004; 82(8):595e600. S0042-96862004000800009. 65. Tangcharoensathien V, Travis P. Accelerate implementation of the WHO global code of practice on international recruitment of health personnel: experiences from the South East Asia region: comment on “relevance and effectiveness of the WHO global code practice on the international recruitment of health personnel e ethical and systems perspectives”. Int J Health Policy Manag 2016;5(1): 43e6. http://dx.doi.org/10.15171/ijhpm.2015.161. 66. Timaeus I, Harpham T, Price M, Gilson L. Health surveys in developing countries: the objectives and design of an international programme. Soc Sci Med 1988;27(4):359e68. 67. U.S. Department of Health, Human Services. Nuremberg code: directives for human experimentation. 2013. Retrieved from: http://ori. hhs.gov/chapter-3-The-Protection-of-Human-Subjectsnuremberg-code-directives-human-experimentation. 68. United Nations Programme on HIV/AIDS [UNAIDS]. AIDS epidemic update 2009. 2009. Retrieved from: http://www.unaids. org/en/dataanalysis/epidemiology/2009aidsepidemicupdate/. 69. Waring SC, Brown BJ. The threat of communicable diseases following natural disasters: a public health response. Disaster Manag Response 2005;3(2):41e7. http://dx.doi.org/10.1016/ j.dmr.2005.02.003. 70. Wendler D, Emanuel EJ, Lie RK. The standard of care debate: can research in developing countries be both ethical and responsive to those countries’ health needs? Am J Public Health 2004;94(6): 923e8. 71. White BD, Gelinas LC, Shelton WN. In particular circumstances attempting unproven interventions is permissible and even

I. ETHICAL, REGULATORY, AND LEGAL ISSUES

REFERENCES

obligatory. Am J Bioeth 2015;15(4):53e5. http://dx.doi.org/ 10.1080/15265161.2015.1009566. 72. Wilfert CM, Ammann A, Bayer R, Curran JW, del Rio C, Faden RR, Sessions K. Science, ethics, and the future of research into maternal infant transmission of HIV-1. Perinatal HIV intervention research in developing countries workshop participants. Lancet 1999; 353(9155):832e5. 73. World Bank Group. World development indicators 2016. 2016. 74. World Health Organization. Handbook for good clinical research practice (GCP): guidance for implementation. 2002. Retrieved from: http://apps.who.int/prequal/info_general/documents/gcp/ gcp1.pdf.

109

75. World Health Organization. Global status report on noncommunicable diseases 2014. 2014. Retrieved from: http://apps.who.int/iris/ bitstream/10665/148114/1/9789241564854_eng.pdf. 76. World Medical Association [WMA]. World Medical Association Declaration of Helsinki: Ethical principles for medical research involving human subjects. Retrieved from: http://www.wma.net/en/ 30publications/10policies/b3/. 77. Yach D, Hawkes C, Gould CL, Hofman KJ. The global burden of chronic diseases: overcoming impediments to prevention and control. JAMA 2004;291(21):2616e22. http://dx.doi.org/10.1001/ jama.291.21.2616.

I. ETHICAL, REGULATORY, AND LEGAL ISSUES

CHAPTER 9

The Role and Importance of Clinical Trial Registries and Results Databases

Deborah A. Zarin, Rebecca J. Williams, Tony Tse, Nicholas C. Ide
National Institutes of Health, Bethesda, MD, United States

OUTLINE

Introduction
Background
    Definitions
    Rationale for Clinical Trial Registration and Results Reporting
    History of ClinicalTrials.gov
Current Policies
    Policies Affecting Clinical Trials in the United States
    International Landscape
Registering Clinical Trials at ClinicalTrials.gov
    Data Standards and the Minimal Data Set
    Points to Consider
        Interventional Versus Observational Studies
        What Is a Single Clinical Trial?
        Importance of the Protocol
        Keeping Information Up-to-Date
Reporting Results to ClinicalTrials.gov
    Data Standards and the Minimal Data Set
    Points to Consider
        Data Preparation
        Review Criteria
        Relation of Results Reporting to Publication
    Key Scientific Principles and Best Practices for Reporting
        Issues in Reporting Outcome Measures
        Issues Related to Analysis Population
Using ClinicalTrials.gov Data
    Intended Audience
    Search Tips for ClinicalTrials.gov
    Points to Consider When Using ClinicalTrials.gov to Study the Overall Clinical Research Enterprise
Looking Forward
Conclusion
Summary/Discussion Questions
References

Principles and Practice of Clinical Research. http://dx.doi.org/10.1016/B978-0-12-849905-4.00009-5. Copyright © 2018. Published by Elsevier Inc.

INTRODUCTION

The clinical research enterprise generates scientific data through the conduct of experiments in human volunteers. As described in other chapters, a key objective of the clinical research enterprise is obtaining generalizable knowledge to advance the medical evidence base and to inform clinical decision-making. Public access to information about individual research studies and their results is necessary to achieve this objective, ethically, legally, and scientifically. Several high-profile cases, which are symptomatic of a deeper, more pervasive problem in the traditional model of clinical research information dissemination, indicate that lack of systematic access to information about ongoing and completed clinical studies can lead to a skewed view of the available evidence regarding the safety or effectiveness of a medical intervention for a particular use (Table 9.1).1

This chapter focuses on trial registries and results databases that are designed to make summary clinical research information publicly accessible and available. Although such databases serve multiple goals and audiences, one key goal is to mitigate the effects of bias from incomplete disclosure of clinical trials and their results by promoting full disclosure throughout the trial life cycle. The chapter also reviews recent trends and upcoming issues in promoting increased transparency to support public health and the scientific process. ClinicalTrials.gov (https://clinicaltrials.gov/), established and maintained by the National Library of Medicine (NLM) at the National Institutes of Health (NIH), is used throughout this chapter as a case study. It is the world's largest publicly available clinical trials registry and results database.

TABLE 9.1 Selected Examples of Distortion of the Evidence Base Caused by Incomplete Disclosure of Clinical Trial Information56–61

Selective publication of studies
    Description: Publication limited to studies with favorable results
    Examples: Antidepressants,56 Paxil (paroxetine) studies in children14,57

Selective reporting of outcomes
    Description: Publication limited to the most favorable prespecified outcomes; other less-favorable prespecified outcomes not acknowledged or reported
    Examples: Cyclooxygenase-2 (COX-2) inhibitors58,59

Modification of prespecified outcome measures
    Description: Publication of outcome measures that differ from those prespecified in the protocol
    Examples: Vytorin (ezetimibe + simvastatin),60 Neurontin (gabapentin)61

BACKGROUND

Definitions

Clinical trial registration refers to the process of submitting and updating summary information about a clinical study protocol to a structured Web-based registry that is accessible to the public, such as ClinicalTrials.gov. A study record typically contains summary information about the study, such as recruitment status, eligibility criteria, and contact information. Results reporting refers to the process of submitting and updating summary information about the results of a clinical study to a structured, publicly accessible, Web-based results database. The two processes parallel the life cycle of clinical trials (Fig. 9.1).
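A study record of this kind is essentially a structured bundle of fields. As a rough sketch (the field names below are illustrative and are not the official ClinicalTrials.gov data element names), such a record might be modeled as:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class StudyRecord:
    """Hypothetical sketch of a registry entry; field names are
    illustrative, not the official ClinicalTrials.gov elements."""
    nct_id: str                       # registry-assigned identifier
    brief_title: str
    recruitment_status: str           # e.g., "Recruiting", "Completed"
    eligibility_criteria: List[str] = field(default_factory=list)
    contact: str = ""
    results_posted: bool = False      # True once summary results are reported

record = StudyRecord(
    nct_id="NCT00000000",
    brief_title="Example interventional study",
    recruitment_status="Recruiting",
    eligibility_criteria=["Age 18+", "Confirmed diagnosis"],
    contact="study-team@example.org",
)
```

Registration populates fields like these; results reporting later updates the same record, mirroring the two processes defined above.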

Rationale for Clinical Trial Registration and Results Reporting

Different stakeholder groups have proposed a variety of reasons for registering clinical trials, and these reasons have been expanded over time to address new challenges (Table 9.2). Simes2 is widely recognized as the first to call for the comprehensive, prospective registration of clinical trials to address concerns that favorable results are published more often than unfavorable ones (i.e., publication bias). Simes observed that "a traditional review of the published literature could result in overly optimistic conclusions concerning new therapies" (p. 1538). With access to a prospective trial registry that includes summary information about all clinical trials, it would be possible to identify all relevant

FIGURE 9.1 Steps in registration and results reporting parallel the research life cycle. Before the study (IRB review and approval of the protocol; study initiation): (1) initial registration. During the study (study conduct and protocol amendments): (2) updates to the registry as necessary (recruitment status, enrollment, start and completion dates, key protocol changes). After the study (study completion and data analysis): (3) initial results reporting; (4) updates to the results database and/or registry as necessary.
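The sequence in Figure 9.1 can be restated as a small mapping from trial stage to expected disclosure actions (stage names and action wording paraphrase the figure; this is an illustration, not an official schema):

```python
# Disclosure actions by trial stage, paraphrasing Figure 9.1.
DISCLOSURE_STEPS = {
    "before": ["initial registration"],
    "during": ["update registry: recruitment status, enrollment, "
               "start and completion dates, key protocol changes"],
    "after": ["initial results reporting",
              "update results database and/or registry"],
}

def actions_for(stage: str) -> list:
    """Return the disclosure actions expected at a given trial stage."""
    return DISCLOSURE_STEPS.get(stage.lower(), [])
```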


TABLE 9.2 Ethical and Scientific Rationale for Clinical Trial Registration and Results Reporting

Human subject protections
∙ Allow potential participants to find studies
∙ Assist ethical review boards and others to determine appropriateness of studies being reviewed (e.g., harms, benefits, redundancy)
∙ Promote fulfillment of ethical responsibility to human volunteers: research contributes to medical knowledge

Research integrity
∙ Facilitates tracking of protocol changes
∙ Enhances transparency of the research enterprise

Evidence-based medicine
∙ Facilitates tracking of studies and outcome measures
∙ Allows more complete identification of relevant studies

Allocation of resources
∙ Promotes more efficient allocation of resources (e.g., investigators, institutional review boards, volunteers)

trials that were conducted (or are in progress) and to detect what proportion of initiated trials have provided published results.

Others have cited registration and results reporting as important tools for helping to fulfill ethical principles underlying research in humans. For instance, medical research involving risk to humans generally is conducted on the basis of the promise that it will contribute to generalizable knowledge.3 When studies and their results are not publicly disclosed, the biomedical knowledge base cannot be advanced and the promise to participants remains unfulfilled. Public disclosure through trial registration and results reporting promotes this ethical requirement.4,5 For example, the Declaration of Helsinki: Ethical Principles for Medical Research Involving Human Subjects (2013) requires prospective registration for "every research study involving human subjects" (Article 35), as well as dissemination of "negative and inconclusive as well as positive results" to the public (Article 36).6
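Simes's point is quantitative: with a complete prospective registry as the denominator, the fraction of initiated trials that ever report results can be measured directly. A toy sketch (the records below are invented for illustration):

```python
# Sketch: with a complete prospective registry as the denominator,
# the fraction of initiated trials with reported results can be
# measured. These records are invented for illustration.
registry = [
    {"nct_id": "NCT-A", "results_posted": True},
    {"nct_id": "NCT-B", "results_posted": False},
    {"nct_id": "NCT-C", "results_posted": False},
    {"nct_id": "NCT-D", "results_posted": True},
]

def reporting_rate(records):
    """Proportion of registered trials with publicly reported results."""
    if not records:
        return 0.0
    return sum(r["results_posted"] for r in records) / len(records)

print(reporting_rate(registry))  # 0.5 for this toy registry
```

Without the registry, a literature review would see only the two reported trials and could not know the denominator was four.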

History of ClinicalTrials.gov

In 1988, the Health Omnibus Programs Extension (HOPE) Act7 required the NIH and the US Food and Drug Administration (FDA), with the Centers for Disease Control and Prevention (CDC), to provide public information about publicly and privately funded clinical trials of investigational drugs for human immunodeficiency virus (HIV)-related diseases. The AIDSTRIALS database, which contained summary protocol information for most NIH-funded and many industry-sponsored acquired immunodeficiency syndrome (AIDS) trials, became available on the Web starting in 1996.8 The HOPE Act "envisioned medical providers and scientific researchers as the intended audience" (p. 129). In 1995, the National Cancer Institute (NCI) launched the CancerNet Website, which provided online access to a cancer trials registry as part of its mandate under the National Cancer Act of 1971 to disseminate health information.9

A federally supported registry, ClinicalTrials.gov, was launched in February 2000 to implement Section 113 of the FDA Modernization Act (FDAMA) (Fig. 9.2).10 This law called on the NIH to "establish, maintain, and operate a data bank of information on clinical trials for drugs for serious or life-threatening diseases and conditions." Specifically, it required this registry of clinical trials to include, among other things, "a description of the purpose of each experimental drug, ...eligibility criteria for participation in the clinical trials, a description of the location of trial sites, and a point of contact for those wanting to enroll in the trial." Because FDAMA was intended to provide information "[to] individuals with serious or life-threatening diseases and conditions, to other members of the public, to health care providers, and to researchers," it required that information in the registry "shall be in a form that can be readily understood by members of the public."

Over time, other trial registration policies were implemented to enhance access to information about clinical trials. In 2004, the International Committee of Medical Journal Editors (ICMJE),11,12 an influential group of medical journal editors, issued a policy requiring registration of clinical trials before the enrollment of the first participant to document publicly the prespecified study design and to track any changes. The State of Maine13 enacted legislation and promulgated rules in 2005 (subsequently repealed in mid-2011) to ensure that information about clinical trials of FDA-approved drugs and biologics marketed in Maine was available to its citizens. State attorneys general also have incorporated mandatory clinical trial results reporting requirements into legal settlements with drug companies (e.g., the GlaxoSmithKline Clinical Study Registry [http://www.gskclinicalstudyregister.com/], following a lawsuit initiated by the New York State Office of the Attorney General in 2004 alleging that the company concealed results


FIGURE 9.2 Cumulative number of registered studies from February 2000 to September 2014 and key events. Used with permission from Zarin DA, Tse T, Sheehan J. The proposed rule for U.S. clinical trial registration and results submission. N Engl J Med January 8, 2015;372(2):174–180. Copyright © 2016 Massachusetts Medical Society.

derived from studies of the antidepressant Paxil14). In addition to its own statutory mandates, ClinicalTrials.gov accommodates such policies and encourages voluntary registration of trials that do not fit under any of these policies, under the assumption that a single comprehensive database containing standardized information about clinical trials serves the public good and supports a wide variety of user needs.

In 2007, the FDA Amendments Act (FDAAA) was enacted to expand the ClinicalTrials.gov registry and add a results database.15 Section 801, which amends the Public Health Service Act, requires a "responsible party" (defined as the study sponsor or a designated principal investigator who controls the study data) to submit information to the ClinicalTrials.gov registry for a clinical trial of FDA-regulated drugs, biologics, and devices that meets the definition of "applicable clinical trial" (generally, phases 2 through 4 clinical trials of drugs, biologics, or devices with at least one site in the United States). In addition, Section 801 requires the submission of summary results information for applicable clinical trials of drugs, biologics, and devices that have been approved, licensed, or cleared by FDA. Other provisions of the law specify what "clinical trial information" is required to be submitted, when it is to be submitted and posted publicly, and the timeline for

updating that information. FDAAA states that a goal of expanding ClinicalTrials.gov is "to enhance patient enrollment and provide a mechanism to track subsequent progress of clinical trials." Enforcement provisions for noncompliance with this federal law include civil monetary penalties and withholding of federal grant funding.

In 2014, the Department of Health and Human Services (DHHS) issued a notice of proposed rulemaking (NPRM) for public comment describing the draft requirements and procedures for the registration and results reporting of clinical trials on ClinicalTrials.gov, in accordance with FDAAA.16 Notably, the NPRM proposed requiring (1) results reporting for clinical trials of unapproved products (i.e., drugs, biological products, or devices that have not been approved, licensed, or cleared by FDA for any use) and (2) at registration, more structured reporting of all primary and secondary outcome measures specified in the study protocol. It also invited comments on requiring the submission of several types of adverse event information, such as the time frame during which adverse event data were collected, and providing an all-cause mortality table listing all participant deaths from any cause.17 In parallel, NIH proposed a separate policy indicating that "all NIH-funded awardees and investigators conducting clinical trials, funded in whole or in part by NIH,


TABLE 9.3 Comparison of Summary Requirements for the Proposed Rule, Draft NIH Policy, and the ICMJE Policy(a)

Type of study
    FDAAA NPRM (Proposed Rule)(b): Interventional studies considered "controlled"(c)
    Draft NIH Policy: Interventional studies considered "clinical trials"(d)
    ICMJE Policy: Any interventional study

Intervention type
    FDAAA NPRM: Drugs, biologics, and devices
    Draft NIH Policy: Any type of intervention, including surgical, behavioral, or other interventions
    ICMJE Policy: Any type of intervention, including surgical, behavioral, or other interventions

Trial phase
    FDAAA NPRM: Any phase, except phase 1 drug trials and small feasibility device trials
    Draft NIH Policy: Any phase
    ICMJE Policy: Any phase

Funding source
    FDAAA NPRM: Any, whether private or public
    Draft NIH Policy: NIH-funded, in whole or in part
    ICMJE Policy: Any, whether private or public

Scope of reporting
    FDAAA NPRM: Registration and summary results
    Draft NIH Policy: Registration and summary results
    ICMJE Policy: Registration only

Responsibility for ensuring reporting
    FDAAA NPRM: Responsible party (sponsor or designated principal investigator)
    Draft NIH Policy: NIH awardee
    ICMJE Policy: Author of manuscript

Enforcement mechanisms
    FDAAA NPRM: Up to $10,000 per day in civil monetary penalties; possible withholding of NIH and other federal grant funds
    Draft NIH Policy: Possible suspension or termination of NIH funding; consideration of noncompliance in future funding decisions
    ICMJE Policy: Editor's refusal to publish manuscript

(a) FDAAA denotes Food and Drug Administration Amendments Act; ICMJE, International Committee of Medical Journal Editors; NIH, National Institutes of Health; and NPRM, notice of proposed rule-making.
(b) An interventional study is defined in the NPRM as "a clinical study or a clinical investigation [in which] participants are assigned prospectively to an intervention or interventions according to a protocol to evaluate the effect of the intervention(s) on biomedical or other health related outcomes."
(c) These include all multigroup interventional studies but may exclude some single-group studies that do not involve a nonconcurrent control.
(d) These include all interventional studies, except those that do not meet the revised NIH definition of "clinical trial" (http://grants.nih.gov/grants/guide/notice-files/NOT-OD-15-015.html).

Used with permission from Zarin DA, Tse T, Sheehan J. The proposed rule for U.S. clinical trial registration and results submission. N Engl J Med January 8, 2015;372(2):174–180. Copyright © 2016 Massachusetts Medical Society.

regardless of study phase, type of intervention, or whether they are subject to the FDAAA ... are expected to ensure that their NIH-funded clinical trials are registered and summary results, including adverse event information, are submitted to ClinicalTrials.gov" (Table 9.3).18 The over 1000 public comments received for both the NPRM19 and the draft NIH policy were carefully reviewed and considered in drafting a final rule and the NIH policy, which are scheduled to be issued in Fall 2016. (NOTE: At the time of this writing in August 2016, HHS had submitted the draft final rule to the White House Office of Management and Budget (OMB) for regulatory review.) The effects of these two policies on transparency of the clinical research enterprise are anticipated to be far-reaching: an estimated 70% of registered trials sponsored by large US academic medical centers and other nonprofit organizations will be subject to at least one of these requirements.20
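The FDAAA "applicable clinical trial" screen summarized earlier (generally, phases 2 through 4 trials of FDA-regulated drugs, biologics, or devices with at least one US site) can be sketched as a predicate. This is a simplification for illustration only; the statutory definition includes further conditions, and device trials are not strictly phase-based.

```python
REGULATED_PRODUCT_TYPES = {"drug", "biologic", "device"}

def is_applicable_clinical_trial(product_type: str, phase: int,
                                 has_us_site: bool) -> bool:
    """Rough sketch of the FDAAA 'applicable clinical trial' screen
    described in the text: generally, phases 2 through 4 trials of
    FDA-regulated products with at least one US site. A simplification
    for illustration, not the statutory test."""
    return (
        product_type in REGULATED_PRODUCT_TYPES
        and 2 <= phase <= 4
        and has_us_site
    )
```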

CURRENT POLICIES

Policies Affecting Clinical Trials in the United States

Two major policies that currently affect the reporting of clinical trials in the United States are the ICMJE registration policy and US federal law (FDAAA). Additionally, several US organizations have implemented policies requiring registration and results reporting to ClinicalTrials.gov, including the following:

• Centers for Medicare and Medicaid Services (CMS): A general requirement of the CMS Coverage with Evidence Development program includes study registration and the public availability of study results within 12 months of the study's primary completion date21;
• Department of Veterans Affairs (VA): Requires registration and results reporting of VA Office of Research & Development-funded trials22; and
• Patient-Centered Outcomes Research Institute (PCORI): Requires registration and results reporting for PCORI-funded comparative effectiveness research clinical trials and observational studies.23

States' attorneys general and the HHS Office of the Inspector General have incorporated ClinicalTrials.gov reporting requirements into various legal agreements with pharmaceutical manufacturers.24,25

International Landscape

Since 2004, the World Health Organization (WHO) has promoted trial registration internationally,26 including developing a standard set of minimal information required for registration.27 In 2015, WHO issued a Statement on Public Disclosure of Clinical Trials Results which, among other things, recommended that "the key outcomes [of clinical trial results] are to be made publicly available within 12 months of study completion by posting to the results section of the primary clinical trial registry."28,29 WHO also established and operates the International Clinical Trials Registry Platform Search Portal.30 The ICMJE accepts prospective registration of clinical trials in the 15 primary registries (as of 8/5/16) of the WHO portal and in ClinicalTrials.gov. However, the substantial growth of trial registries around the world is associated with unresolved issues. For example, registration of a clinical study more than once (or duplicate registration) has increasingly become a challenge across the world's registries: it is often difficult to tell whether two registry entries represent the same or different studies.31

In 2004, in the European Union (EU), the European Medicines Agency (EMA) launched a legislatively mandated database for clinical trials of drugs and biologics subject to its jurisdiction: the European Union Drug Regulating Authorities Clinical Trials database (EudraCT).32 Initially, EudraCT was accessible only to regulatory and legal entities in the EU. Provisions in subsequent regulations required that protocol- and results-related information for certain clinical trials submitted to EudraCT be made available publicly, regardless of the status of marketing approval.33,34 The EMA created the EU Clinical Trials Register (https://www.clinicaltrialsregister.eu/ctr-search/), from which summary protocol information (starting in 2011) and summary results information (starting in 2014) are publicly available.
Notably, the EudraCT summary results data requirements are "substantially aligned" with those of the ClinicalTrials.gov results database.35 In 2014, the EU adopted the Clinical Trials Regulation, which includes requirements such as additional disclosure of clinical study reports submitted as part of marketing authorizations for investigational medicinal products.36

REGISTERING CLINICAL TRIALS AT CLINICALTRIALS.GOV

In general, registration is the process of submitting summary protocol information for a clinical trial for public posting. ClinicalTrials.gov permits the registration of biomedical or health-related research studies in humans that meet the following two requirements37:

1. Conformance with any applicable human subject or ethics review regulations (or equivalent) (e.g., institutional review board [IRB] approval) and
2. Conformance with any applicable regulations of the national (or regional) health authority (or equivalent)

Because ClinicalTrials.gov serves as a long-term public registry, posted records remain available to the public even after a trial is over.

Registration data are submitted through a Web-based Protocol Registration and Results System (PRS) in one of two modes: interactive data entry or Extensible Markup Language (XML) file upload. Once the entered or uploaded data have been reviewed and approved by the data provider, records are released to ClinicalTrials.gov for processing. The content of each record is reviewed for apparent validity, meaningful entries, logic and internal consistency, and formatting. If any major problems are detected, the record is sent back with comments. If no major problems are identified, the record is generally posted at ClinicalTrials.gov within two to five business days.

The first time that a study is posted, a ClinicalTrials.gov identifier (also known as the "NCT number") is assigned to that study. This NCT number is used from that point on to uniquely identify the particular clinical trial in various contexts (e.g., manuscript submission to a journal,38 certification to FDA or NIH, PubMed citation index). Once posted, records are accessible to the public and maintained by the data provider through the PRS. Updates to data elements (e.g., recruitment status) are made through the PRS, and updated records are reposted after quality review. Updates should be made as soon as practicable to keep the information current and accurate (some data elements require more rapid updates based on legal requirements). A history of all changes (and the dates on which they were made) can be viewed on the publicly available archive site.
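Of the two PRS submission modes, XML upload lends itself to scripted record preparation. The sketch below builds a minimal record with Python's standard library; the element names are invented for illustration and do not reproduce the actual PRS upload schema.

```python
import xml.etree.ElementTree as ET

# Build a minimal, hypothetical protocol record for upload.
# Element names are invented; the real PRS schema is defined
# by ClinicalTrials.gov and differs from this sketch.
study = ET.Element("clinical_study")
ET.SubElement(study, "unique_protocol_id").text = "ORG-2016-001"
ET.SubElement(study, "brief_title").text = "Example Trial of Intervention X"
ET.SubElement(study, "overall_status").text = "Recruiting"
eligibility = ET.SubElement(study, "eligibility")
ET.SubElement(eligibility, "criteria").text = "Adults 18 years or older"

xml_text = ET.tostring(study, encoding="unicode")
print(xml_text)
```

A real submission would be validated against the PRS schema before upload; the quality review described in the text happens on the ClinicalTrials.gov side after the record is released.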

Data Standards and the Minimal Data Set

The ClinicalTrials.gov protocol registration data elements can be divided into four general categories:

1. Descriptive information (e.g., study type, phase, study design, outcomes)
2. Recruitment information (e.g., eligibility criteria, overall recruitment status)
3. Location and contact information (e.g., sponsor name, facility name, and contact)
4. Administrative data (e.g., organization's unique protocol identifier, secondary identifiers)

These data elements are intended to provide the basic information needed to describe a study. Although not all are required by ClinicalTrials.gov or other policies, all data providers are encouraged to complete all data


elements. Note that the ClinicalTrials.gov data elements include the ICMJE/WHO minimum 20-item Trial Registration Data Set (Version 1.2.1).27
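Abbreviating the four categories above, a draft record can be screened against a checklist before release, in the spirit of the completeness encouraged here (the element names are paraphrased from the text, not the exact registry field names):

```python
# Abbreviated checklist of the four general categories of
# registration data elements described in the text; names
# are paraphrased, not official registry field names.
DATA_ELEMENT_CATEGORIES = {
    "descriptive": ["study_type", "phase", "study_design", "outcomes"],
    "recruitment": ["eligibility_criteria", "overall_recruitment_status"],
    "location_contact": ["sponsor_name", "facility_name", "facility_contact"],
    "administrative": ["unique_protocol_id", "secondary_ids"],
}

def missing_elements(record: dict) -> list:
    """List data elements that are absent or empty in a draft record."""
    return [
        element
        for elements in DATA_ELEMENT_CATEGORIES.values()
        for element in elements
        if not record.get(element)
    ]

draft = {"study_type": "Interventional", "phase": "2", "unique_protocol_id": "ORG-1"}
print(missing_elements(draft))
```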

Points to Consider

Interventional Versus Observational Studies

ClinicalTrials.gov uses the following definitions to distinguish between interventional and observational studies39:

• Interventional: A clinical study in which participants are assigned to receive one or more interventions (or no intervention) so that researchers can evaluate the effects of the interventions on biomedical or health-related outcomes. The assignments are determined by a study protocol. Participants may receive diagnostic, therapeutic, or other types of interventions.
• Observational: A clinical study in which participants identified as belonging to study groups are assessed for biomedical or health outcomes. Participants may receive diagnostic, therapeutic, or other types of interventions as part of their routine care, but the investigator does not assign participants to specific interventions (as in an interventional study).

The key factor in differentiating between these study types is whether the individual received a specific intervention based on assignment by an investigator according to a research protocol. A common misconception is that studies of diagnostic interventions are not interventional. For example, some studies that investigators consider to be observational or epidemiologic use experimental diagnostic tests as part of the design. Use of these tests makes the study interventional because such participants are being exposed to an intervention as a consequence of the research protocol; they would not have received that intervention had they not been in the study. This would also be true if the study involves the use of an approved diagnostic test (e.g., a positron emission tomography scan) with greater frequency or in a different manner than would have occurred had the individual not been included in the study.

What Is a Single Clinical Trial?

Each clinical trial should be registered once at ClinicalTrials.gov.
In general, a clinical trial has a defined group of human subjects who are assigned to interventions, and the collected data are grouped together for analysis, based on a protocol. However, it sometimes is difficult to determine what represents a single clinical trial. Many trials occur at multiple sites but follow a single core protocol (although a site may modify the protocol, e.g., based on local IRB review), and the data from each site are intended to be combined and analyzed in aggregate. For purposes of registration and results reporting at ClinicalTrials.gov, such trials represent a single clinical trial and should be registered only once. On occasion, multiple individuals associated with a trial at different study sites inadvertently register it separately (i.e., duplicate registration). These separate registrations often contain slightly different content for key data elements, creating ambiguity about whether the two (or more) registry records represent the same study. This undermines one important function of the database: to provide a unique denominator of all trials conducted on a given condition. Therefore, systems are in place at ClinicalTrials.gov to help avoid these “unintended” duplicates. Study sponsors or trial personnel can avoid such problems by identifying one person who has responsibility for submitting information to ClinicalTrials.gov. This designation should occur at the very beginning and ideally be noted in the protocol and within any clinical trial agreements.

Another challenge in determining what is a single clinical trial involves follow-on study designs, in which participants are tracked for an extended period after completion of the “initial” study so that additional adverse event and secondary outcome data can be collected, typically using an open-label, nonrandomized design. Because these data collections often are defined within the original protocol and analysis plan, and include the same participants as the original protocol, such studies generally are considered a single clinical trial. In contrast, a follow-on study that requires reconsent and/or includes subjects who were not part of the original study is generally considered a separate trial.
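As a rough illustration of how “unintended” duplicate registrations might be flagged, here is a hedged sketch. The records, field names, and threshold are hypothetical; this is not the actual ClinicalTrials.gov deduplication system:

```python
from difflib import SequenceMatcher

def possible_duplicates(rec_a: dict, rec_b: dict,
                        title_threshold: float = 0.8) -> bool:
    """Flag records that may be duplicate registrations of one multisite
    trial: same sponsor and condition, highly similar titles, even when
    key data elements differ slightly (illustrative heuristic only)."""
    same_sponsor = rec_a["sponsor"].lower() == rec_b["sponsor"].lower()
    same_condition = rec_a["condition"].lower() == rec_b["condition"].lower()
    title_sim = SequenceMatcher(
        None, rec_a["title"].lower(), rec_b["title"].lower()).ratio()
    return same_sponsor and same_condition and title_sim >= title_threshold

# Two sites of one core-protocol trial, registered separately with
# slightly different content (hypothetical records):
site_1 = {"sponsor": "Acme Pharma", "condition": "Hypertension",
          "title": "Randomized trial of Drug X versus placebo in adults"}
site_2 = {"sponsor": "Acme Pharma", "condition": "Hypertension",
          "title": "A randomized trial of Drug X vs placebo in adults"}
print(possible_duplicates(site_1, site_2))  # True: flag for follow-up
```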
Importance of the Protocol

As in much scientific research, the development of a research plan or protocol that includes prespecified hypotheses and methods, including explicitly defined variables of interest, is critical. The validity of any statistical analyses or conclusions rests on adherence to those prespecified methods. Registration must describe the study protocol with sufficient specificity to allow viewers to make a meaningful determination of whether a report of the results (either in a results database or in a publication) is consistent with the original (or amended) protocol.40

Keeping Information Up-to-Date

Data must be kept up-to-date to best serve the needs of researchers, potential participants, and the general public. Some data elements (e.g., recruitment status, anticipated start and completion dates) will predictably change over time; other data elements may change if the protocol has been amended (e.g., modification of a primary outcome measure). In the case of ClinicalTrials.gov, all changes are tracked in a publicly available archive site (https://clinicaltrials.gov/archive/), although the default public view always contains the most recent version of each data element.

REPORTING RESULTS TO CLINICALTRIALS.GOV

Data providers may create the results section of a ClinicalTrials.gov record after data collection for that study is complete for at least one primary outcome measure. In general, a protocol registration (i.e., a record with a ClinicalTrials.gov identifier) must already exist. As with the protocol registration process, results information must be submitted through the Web-based PRS, either interactively or by XML-file upload, using required and optional data elements.41 The results database as implemented was designed to satisfy key legal requirements, including the need for certain search features. Its design also was informed by current standards and best practices in results reporting (e.g., for journal publication42 or regulatory submission43), discussions with numerous experts, and comments from stakeholders.

Data Standards and the Minimal Data Set

The Results section of the record consists of administrative information and four scientific modules: Participant Flow, Baseline Characteristics, Outcome Measures and Statistical Analyses, and Adverse Event Information. The data tables submitted in each module must be sufficiently informative by themselves, because they are displayed at ClinicalTrials.gov without detailed supporting narrative text (Table 9.4).

TABLE 9.4 Summary of Scientific Modules in the ClinicalTrials.gov Results Database

Participant flow: Progress of research participants through each stage of a trial, including the number of trial participants who dropped out (identical in purpose to a CONSORT flow diagram,42 but represented as tables).

Baseline characteristics: Demographic and baseline data for the entire trial population and for each arm or comparison group.

Outcome measures and statistical analyses: Data for each outcome measure by arm (i.e., initial assignment of groups to interventions) or comparison group (i.e., groups receiving interventions regardless of initial assignment); accommodates categorical, continuous, and time-to-event data types and a variety of statistical analyses.

Adverse events: Listing of (1) all serious adverse events and (2) other adverse events exceeding a specified frequency threshold within an arm or group. Both tables include anticipated and unanticipated adverse events by arm and are grouped by organ system.

Adapted from Tse T, Williams RJ, Zarin DA. Update on registration of clinical trials in ClinicalTrials.gov. Chest July 2009;136(1):304–5.

Points to Consider

Data Preparation

Summarizing results information is typically a complex cognitive task that requires familiarity with the study and the data, and experience with presenting summary results in tabular format. Just as in the preparation of results for journal publication or the submission of data to regulatory authorities, the principal investigator and/or the study biostatistician will likely need to be involved in preparing a submission to the ClinicalTrials.gov results database. The information entered must be accurate, precise, and informative to an educated reader of the medical literature who is not already familiar with the specific study. After data entry but before submission, it may be helpful to have a colleague who was not part of the study team review the submission for clarity and comprehension.

Review Criteria

Records are reviewed before public posting for apparent validity, meaningful entries, logic, internal consistency, and formatting (Table 9.5). Following automated validation, which alerts data providers to missing or internally inconsistent information, all submissions are manually reviewed. As of 2016, about a third of all records submitted by results data providers could be posted without revision. Some invalid data can be detected by ClinicalTrials.gov staff, but other data cannot be verified because ClinicalTrials.gov does not have an independent source of study data (e.g., “832 years” is clearly an invalid entry for mean age, whereas “83.2 years” may or may not be the true mean age). Thus, meeting the ClinicalTrials.gov review criteria does not guarantee that a submission is valid or fully compliant with various policy or legal requirements.

TABLE 9.5 ClinicalTrials.gov Quality Review Criteria

Lack of apparent validity: Data are not plausible on the basis of the information provided. Example: outcome measure data report a mean value of 263 h of sleep per day; a measure of mean hours per day can take values only in the range of 0 to 24, so a value of 263 is not valid.

Meaningless entry: Information is too vague to permit interpretation of the data. Example: an outcome measure description states “clinical evaluation of adverse events, laboratory parameters, and imaging,” with data reported as 100 and 96 participants in each group; it is unclear what these counts refer to, and the description is not sufficient for an understanding of the specific outcome.

Data mismatch: Data are not consistent with the descriptive information. Example: an outcome measure is described as “time to disease progression,” but data are reported as 42 and 21 participants in each group; a time-to-event measure requires a unit of time (e.g., days or months).

Internal inconsistency: Information in one section of the record conflicts with, or appears inconsistent with, information in another section. Example: the study type is “observational,” but the study title includes the word “randomized”; randomized studies are interventional, not observational.

Trial design unclear: The structure of the tables and the relevant group names and descriptions do not permit a reader to understand the overall trial design, or do not accurately reflect it. Example: participant flow and baseline characteristics are entered as a two-group study with a total of 400 participants, but outcomes are entered for three comparison groups with 600 participants; if there is a third group, it should be reflected in the participant flow and baseline characteristics.

Used with permission from Zarin DA, Tse T, Williams RJ, Califf RM, Ide NC. The ClinicalTrials.gov results database–update and key issues. New Engl J Med March 3, 2011;364(9):852–60. Copyright © 2016 Massachusetts Medical Society.

Relation of Results Reporting to Publication

It is important to note that the goals of reviewing and reporting summary protocol and results information at ClinicalTrials.gov are not identical to those of editorial and peer review for journal publication. Although both seek to ensure accurate and informative data reporting, ClinicalTrials.gov does not reject studies based on the perceived quality or significance of the research, and it allows disclosure of all prespecified primary and secondary outcomes as well as other prespecified and post hoc outcomes. In contrast, journal peer review focuses on selecting quality research likely to be of particular interest to the journal's readers, and authors and editors may limit the focus of an article to critical aspects of the research. Journal review also aims to ensure that the narrative is consistent with the data and provides appropriate context and conclusions; because ClinicalTrials.gov does not permit significant narrative portions, these functions do not apply. In November 2010, an analysis of a sample of 150 entries posted in the ClinicalTrials.gov results database revealed that only 78 (52%) were associated with a PubMed citation.44 Thus, despite the different goals, the results database complements journal publication by providing information in a structured format that might not otherwise be available.
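To illustrate the flavor of the automated validation step, here is a minimal sketch of two checks drawn from the examples in Table 9.5. The rules and field names are illustrative only; the real ClinicalTrials.gov validation is far more extensive and is followed by manual review:

```python
def review_flags(entry: dict) -> list:
    """Return quality-review flags for a submitted outcome measure entry
    (illustrative rules modeled on Table 9.5 examples)."""
    flags = []
    # Lack of apparent validity: value outside its plausible range
    # (e.g., mean hours of sleep per day must lie in [0, 24]).
    low, high = entry.get("plausible_range", (float("-inf"), float("inf")))
    if not (low <= entry["value"] <= high):
        flags.append("lack of apparent validity")
    # Data mismatch: a time-to-event measure requires a unit of time.
    if (entry.get("measure_type") == "time-to-event"
            and entry.get("unit") not in {"days", "weeks", "months", "years"}):
        flags.append("data mismatch: time-to-event measure lacks a time unit")
    return flags

# "Mean of 263 h of sleep per day" fails the range check:
sleep = {"value": 263, "plausible_range": (0, 24)}
print(review_flags(sleep))
# "Time to disease progression" reported as participant counts:
progression = {"value": 42, "measure_type": "time-to-event",
               "unit": "participants"}
print(review_flags(progression))
```

Note that, as the text explains, such checks can catch only implausible entries; a plausible but wrong value (e.g., “83.2 years”) passes, because there is no independent source of study data to verify against.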

Key Scientific Principles and Best Practices for Reporting

Issues in Reporting Outcome Measures

Prespecification of primary and secondary outcome measures in the protocol and registration record is critical to the integrity of the conduct of the study as well as to results reporting. ClinicalTrials.gov defines a primary outcome measure as “a planned measurement described in the protocol that is used to determine the effect of interventions on participants in a clinical trial.”39 The Consolidated Standards of Reporting Trials (CONSORT) statement, an international standard for reporting the results of randomized clinical trials, requires that prespecified primary and secondary outcome measures be “completely defined … including how and when they were assessed” (p. 2).42 In the ClinicalTrials.gov results database, a fully specified outcome measure involves several components: four levels of specificity and a time frame (Fig. 9.3).

FIGURE 9.3 An example of the four levels of specification in reporting outcome measures. Used with permission from Zarin DA, Tse T, Williams RJ, Califf RM, Ide NC. The ClinicalTrials.gov results database–update and key issues. New Engl J Med March 3, 2011;364(9):852–60. Copyright © 2016 Massachusetts Medical Society.

Further, CONSORT defines the primary outcome measure as “the pre-specified outcome considered to be of greatest importance to relevant stakeholders … and is usually the one used in the sample size calculation.” Analysis of 2178 clinical trial records with posted results revealed that the median number of primary outcome measures reported per trial was 1, although many records include multiple primary outcome measures (up to 71).44 This raises questions not only about the way in which clinical trialists use this term but also about the degree of selective reporting of outcome measures that may occur in the publication process, because publications do not generally report such large numbers of primary outcome measures. ClinicalTrials.gov defines a secondary outcome measure as “a planned Outcome Measure in the protocol that is not as important as the Primary Outcome Measure but is still of interest in evaluating the effect of an intervention.”39 In the same analysis (n = 2178), the median number of secondary outcome measures reported per trial was 3, with up to 122 per trial.44 Because registration requires the reporting of all primary and secondary outcome measures, it has become necessary to consider the boundary between “secondary” and “other prespecified” outcome measures, which some call tertiary outcome measures. As suggested in the proposed rule,17 for purposes of registration and results reporting at ClinicalTrials.gov, outcome measures for which a specific analytic plan is prespecified are typically considered “secondary,” and other, more exploratory outcome measures are considered “other prespecified.”

Issues Related to Analysis Population

In general, the outcome measure can be thought of as the numerator and the analysis population as the denominator when trial results are reported. For example, if the outcome measure of interest is the number of participants with a myocardial infarction (MI), outcomes may be compared across arms as the number of participants with MI divided by the number of participants in the arm (or the number of participants analyzed). Structured trial reporting has led to greater awareness of the degree to which different analysis populations may be used across analyses within a given trial. ClinicalTrials.gov does not mandate any particular method of analyzing results (e.g., intention to treat vs. per protocol); the goal is simply to ensure transparency about what was done. Results should enable other observers to make their own judgments about the validity of the reported analysis.
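The numerator/denominator framing can be made concrete with hypothetical counts, showing why the choice of analysis population changes the reported estimate:

```python
def event_proportion(events: int, analyzed: int) -> float:
    """Outcome measure (numerator) over analysis population (denominator)."""
    return events / analyzed

# Hypothetical arm: 200 participants randomized, 180 analyzed per
# protocol, 18 with MI. The same numerator yields different estimates
# depending on the denominator, which is why transparency about the
# analysis population matters.
mi_events = 18
itt = event_proportion(mi_events, 200)           # intention-to-treat
per_protocol = event_proportion(mi_events, 180)  # per-protocol
print(f"ITT: {itt:.3f}  per-protocol: {per_protocol:.3f}")
```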

USING CLINICALTRIALS.GOV DATA

Intended Audience

The general public is ultimately the beneficiary of the ClinicalTrials.gov system, but different parts of ClinicalTrials.gov are likely to be of more or less interest to different audiences. For example, although summary protocol and eligibility information may be of use to members of the general public (and their advisors), results information in its current form may be of greatest interest to researchers and systematic reviewers, who understand the strengths and limitations of summary results from individual trials. Researchers may find the data posted in ClinicalTrials.gov useful when planning their own research projects. For example, a researcher may want to know what other trials using the same intervention or patient population have been completed (whether or not the results have been published) or are ongoing. Similarly, IRB members (and investigators) may want to ensure that they are aware of all prior relevant research, so that their assessment of risks and benefits for a proposed clinical study is as complete as possible. Researchers, systematic reviewers, and policy-makers also use data from the ClinicalTrials.gov database to explore various aspects of the overall clinical research enterprise. For example, the registry and results database have been used to assess issues such as time to publication among completed registered trials,45 the percentage of registered trials subject to federal human research oversight regulation,46 reasons for premature termination of trials,47 and discrepancies between ClinicalTrials.gov results database entries48,49 and other sources, including publications and documents publicly available on the Drugs@FDA website.

Search Tips for ClinicalTrials.gov

ClinicalTrials.gov provides a basic free-text search of all registered studies. Advanced search options allow a user to conduct a more granular search that takes advantage of the database structure. All searches use the Essie concept-based search engine.50 For example, the search engine incorporates a bank of synonyms, so a search for “paxil” also will find trials that use any known synonym (e.g., paroxetine, brl-29060, fg-7051, Seroxat); similarly, a search for “heart attack” will find trials that use the term “myocardial infarction.” Overall, the ClinicalTrials.gov website provides a flexible interface that accommodates searches of varying complexity for users with varying needs and levels of skill and experience. ClinicalTrials.gov also allows downloading of the data from retrieved studies, or of the full data set, for analysis by data scientists. In addition, the Clinical Trials Transformation Initiative (CTTI) provides an “analysis-ready” data set derived from ClinicalTrials.gov data that is intended to make it easier for researchers to assess the data.51
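A toy sketch of the synonym expansion described above. The synonym bank, records, and NCT IDs here are hypothetical and vastly simplified relative to the Essie engine:

```python
# Toy synonym bank (illustrative; Essie's bank is far larger).
SYNONYMS = {
    "heart attack": {"heart attack", "myocardial infarction"},
    "paxil": {"paxil", "paroxetine", "seroxat"},
}

def expand(term: str) -> set:
    """Expand a query term to the set of its known synonyms."""
    return SYNONYMS.get(term.lower(), {term.lower()})

def search(records: list, term: str) -> list:
    """Return NCT IDs of records whose condition or intervention matches
    the query term or any known synonym (case-insensitive)."""
    wanted = expand(term)
    return [r["nct_id"] for r in records
            if {r["condition"].lower(), r["intervention"].lower()} & wanted]

trials = [  # hypothetical registry records
    {"nct_id": "NCT00000001", "condition": "myocardial infarction",
     "intervention": "aspirin"},
    {"nct_id": "NCT00000002", "condition": "depression",
     "intervention": "paroxetine"},
]
print(search(trials, "heart attack"))  # ['NCT00000001']
print(search(trials, "Paxil"))         # ['NCT00000002']
```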

Points to Consider When Using ClinicalTrials.gov to Study the Overall Clinical Research Enterprise

Although ClinicalTrials.gov is by far the largest source of information about ongoing and completed clinical trials, it does not include all studies. The data set reflects prevailing policies, both in the United States and internationally, and because these policies have changed over time, any sampling biases have likely changed as well. In general, the data set is likely to be most complete for drug and device trials sponsored by multinational or US companies. NIH-funded studies of drugs and devices are also likely to be well represented. However, care should be taken before making inferences about the overall worldwide population of clinical trials.

LOOKING FORWARD

The Institute of Medicine,52 the ICMJE,53 and others have called for greater sharing of individual participant data (IPD) for secondary analysis. The trial-transparency movement argues that providing greater third-party access to IPD, along with the protocol and other associated documents, would provide insight into the decisions underlying the original analysis, promote accountability, and allow exploration of novel research questions.54 We believe that this new frontier in trial transparency is best understood as part of an overall three-level trial reporting system (TRS) framework (Fig. 9.4).55 In this construct, registration and results reporting serve as the base of the TRS, and the sharing of IPD and related documents sits atop these foundational layers. Without the overview of the clinical research landscape that the base provides, availability of IPD for some trials but not others would result in another type of selective publication or reporting bias. Table 9.6 illustrates, in the context of a case study, how registration, results reporting, and IPD sharing complement one another in increasing transparency. Thus, while systematic sharing of IPD is certainly the newest frontier in trial transparency, this one component should not divert attention or resources from the ongoing need for more accurate, complete, and consistent registration and results reporting.


FIGURE 9.4 Schematic depicting the functions of the three key components of the trial reporting system. Reprinted under a CC-0 public domain dedication from Zarin DA, Tse T. Sharing individual participant data (IPD) within the context of the trial reporting system (TRS). PLoS Med January 2016; 13(1):e1001946.

TABLE 9.6 Key Issues With Trials of Antidepressant Use in Children for Depression and the Role of the Trial Reporting System (TRS)

Key issue: Lack of prospective public information about all trials of Paxil and other selective serotonin reuptake inhibitors (SSRIs) in depressed children.
Relevant TRS component: Prospective registration. Registration would have provided a public list of all ongoing and completed trials of Paxil/SSRIs in depressed children.

Key issue: Alleged suppression of “negative” results from certain Paxil trials in depressed children.62
Relevant TRS components: Prospective registration, which would have allowed the detection of trials without disclosed results; and summary results reporting, since results database entries would have provided access to a “minimum reporting set” including all prespecified outcome measures and all serious adverse events.

Key issue: Detection of selective reporting bias of efficacy and safety findings in the published results of Study 329, unacknowledged changes in outcome measures, and other issues.63
Relevant TRS components: Prospective registration, since archival registration information would have allowed detection of unacknowledged changes in prespecified outcome measures and detection of nonprespecified outcome measures reported as statistically significant; and summary results reporting, since structured reporting devoid of interpretation or conclusions would have made summary data publicly available while avoiding the possibility of spinning the results.

Key issue: Invalid and unacknowledged categorization of certain adverse events, resulting in the underreporting of suicidality.64
Relevant TRS component: Sharing highly granular individual participant data (IPD) and documents (e.g., case report forms). Access to high-granularity IPD enabled the elucidation of data analytic decisions that had not been publicly disclosed; reanalysis was possible with different methods of categorizing adverse events.

Reprinted under a CC-0 public domain dedication from Zarin DA, Tse T. Sharing individual participant data (IPD) within the context of the trial reporting system (TRS). PLoS Med January 2016;13(1):e1001946.


CONCLUSION

Because of a series of policy and legal actions, clinical trial registration has become standard practice internationally. Public reporting of summary results represents a newer and still evolving area. As clinical trial disclosure requirements continue to evolve, novel uses of data from the registry and results database will become clearer as more information is posted publicly. Ultimately, however, the quality and accuracy, and thus the utility, of the registry and results database depend on the diligence and integrity of trial sponsors and investigators.

SUMMARY/DISCUSSION QUESTIONS

1. Which of the following is a rationale for registering and reporting results of clinical trials?
a. Mitigating the effects of selective reporting and publication bias
b. Fulfilling ethical principles underlying human research
c. Facilitating assessment of research integrity, such as tracking protocol changes
d. All of the above

2. The ClinicalTrials.gov Results Database consists of
a. A repository of deidentified patient-level data for certain registered clinical trials
b. Narrative abstracts from publications reporting clinical trial results
c. Summary data displayed in a tabular format for certain registered trials
d. Full-text articles submitted to ClinicalTrials.gov

3. Which of the following entities has not required the reporting of trial results to an online database?
a. Medical journal editors
b. US federal government
c. State attorneys general
d. The EU

4. ClinicalTrials.gov submissions are not
a. Reviewed by both automated validation checks and human experts
b. Provided through either interactive data entry or file upload
c. Systematically verified against external, objective data sources
d. Assigned a unique identifier that may be used to track that particular trial

5. When ClinicalTrials.gov data are used to perform aggregate analyses of a large sample of clinical trials, which caveat should be heeded by researchers?
a. The database is not comprehensive: it does not include all clinical trials.
b. The database is static: records do not need to be updated after registration.
c. The database is not comprehensive: it includes only recruiting clinical trials.
d. The database is not comprehensive: it includes only drug and device clinical trials.

References 1. Zarin DA, Tse T. Medicine. Moving toward transparency of clinical trials. Science March 7, 2008;319(5868):1340e2. 2. Simes RJ. Publication bias: the case for an international registry of clinical trials. J Clin Oncol October 1986;4(10):1529e41. 3. Emanuel EJ, Wendler D, Grady C. What makes clinical research ethical? JAMA May 24e31, 2000;283(20):2701e11. 4. Levin LA, Palmer JG. Institutional review boards should require clinical trial registration. Arch Intern Med August 13e27, 2007; 167(15):1576e80. 5. Mann H. Research ethics committees and public dissemination of clinical trial results. Lancet August 3, 2002;360(9330):406e8. 6. World Medical Association. WMA declaration of Helsinki e ethical principles for medical research involving human subjects. 64th WMA general assembly, Fortaleza, Brazil. 2013. Available at: http:// www.wma.net/en/30publications/10policies/b3/. 7. Health omnibus programs extension act of 1988. Public law 100e607. 8. Katz DG, Dutcher GA, Toigo TA, Bates R, Temple F, Cadden CG. The AIDS Clinical Trials Information Service (ACTIS): a decade of providing clinical trials information. Public Health Rep MareApr 2002;117(2):123e30. 9. Grama LM, Beckwith M, Bittinger W, et al. The role of user input in shaping online information from the National Cancer Institute. J Med Internet Res July 1, 2005;7(3):e25. 10. Food and drug administration modernization act of 1997. Public law 105e115. 11. De Angelis C, Drazen JM, Frizelle FA, et al. Clinical trial registration: a statement from the international committee of medical journal editors. Ann Intern Med September 21, 2004; 141(6):477e8. 12. Laine C, De Angelis C, Delamothe T, et al. Clinical trial registration: looking back and moving ahead. Ann Intern Med August 21, 2007; 147(4):275e7. 13. Maine State Law, 22 MRSA c605, x2700-A An act regarding advertising by drug manufacturers and disclosure of clinical trials. 2005. Repealed in 2011 by H.P. 530 e L.D. 719. 14. Rennie D. 
Trial registration: a great idea switches from ignored to irresistible. JAMA September 15, 2004;292(11):1359e62. 15. Food and drug administration amendments act of 2007. Public law 110e185. 16. Notice of proposed rulemaking: clinical trials registration and results submission. Fed Regist November 21, 2014;79:69566e680. https://federalregister.gov/a/2014-26197. 17. Zarin DA, Tse T, Sheehan J. The proposed rule for U.S. clinical trial registration and results submission. New Engl J Med January 8, 2015;372(2):174e80. 18. National Institutes of Health. Request for public comments on the draft nih policy on dissemination of nih-funded clinical trial information. Bethesda, MD: NIH, Department of Health and Human Services; November 19, 2014. Available at: http://grants.nih.gov/grants/ guide/notice-files/NOT-OD-15-019.html. 19. Clinical trials registration and results submission. Docket ID: NIH-2011-0003. Available at: https://www.regulations.gov/#! docketDetail;D¼NIH-2011-0003. 20. Zarin DA, Tse T, Ross JS. Trial-results reporting and academic medical centers. New Engl J Med June 11, 2015;372(24):2371e2.

I. ETHICAL, REGULATORY, AND LEGAL ISSUES

124

9. TRIAL REGISTRIES AND RESULTS DATABASES

21. Centers for Medicare, Medicaid Services. Guidance for the public, industry, and CMS staff: coverage with evidence development. November 20, 2014. Available at: https://www.cms.gov/ medicare-coverage-database/details/medicare-coveragedocument-details.aspx?MCDId¼27. 22. Department of Veterans Affairs. ORD sponsored clinical trials: registration and submission of summary results. 2015. Available at: http:// www.research.va.gov/resources/ORD_Admin/clinical_trials/. 23. Patient-Centered Outcomes Research Institute. PCORI’s process for peer review of primary research and public release of research findings. February 24, 2015. Available at: http://www.pcori.org/sites/ default/files/PCORI-Peer-Review-and-Release-of-FindingsProcess.pdf. 24. Oregon Department of Justice. Attorney general’s medicaid fraud unit settles medicaid rebate cases with Merck. February 7, 2008. Available at: http://www.doj.state.or.us/releases/pages/2008/rel020808.aspx. 25. Office of Inspector General, U.S. Department of Health, Human Services. Johnson & Johnson Corporate integrity agreement. October 31, 2013. Available at: http://oig.hhs.gov/fraud/cia/agreements/ Johnson_Johnson_10312013.pdf. 26. Ghersi D, Pang T. From Mexico to Mali: four years in the history of clinical trial registration. J Evidence-Based Med February 2009;2(1): 1e7. 27. World Health Organization. Trial registration data set (version 1.2.1). 2011. Available at: http://www.who.int/ictrp/network/trds/en/ index.html. 28. World Health Organization. WHO statement on public disclosure of clinical trial results. April 14, 2015. Available at: http://www.who. int/ictrp/results/reporting/en/. 29. Moorthy VS, Karam G, Vannice KS, Kieny MP. Rationale for WHO’s new position calling for prompt reporting and public disclosure of interventional clinical trial results. PLoS Med April 2015;12(4):e1001819. 30. World Health Organization. International clinical trials registry platform (ICTRP): about the ICTRP search portal. 2011. 
Available at: http://www.who.int/ictrp/search/en/. 31. van Valkenhoef G, Loane RF, Zarin DA. Previously unidentified duplicate registrations of clinical trials: an exploratory analysis of registry data worldwide. Syst Reviews 2016;5(1):116. 32. European Commission. Directive 2001/20/EC of the European Parliament and of the Council of 4 April 2001 on the approximation of the laws, regulations and administrative provisions of the member states relating to the implementation of good clinical practice in the conduct of clinical trials on medicinal products for human use. Off J Eur Communities April 4, 2001;L121:34e44. 33. European Commission. Communication from the commission regarding the guideline on the data fields contained in the clinical trials database provided for in article 11 of directive 2001/20/EC to be included in the database on medicinal products provided for in article 57 of regulation (EC) No 726/2004. Off J Eur Union July 3, 2008;C168:3e4. 34. European Commission. Communication from the commission d guidance on the information concerning paediatric clinical trials to be entered into the EU Database on Clinical Trials (EudraCT) and on the information to be made public by the European Medicines Agency (EMEA), in accordance with article 41 of regulation (EC) No 1901/2006. Off J Eur Union February 4, 2009;C 28:1e4. 35. European Medicines Agency. European medicines agency launches a new version of EudraCT: summary results of clinical trials soon to be available to the public. October 11, 2013. Available at: http://www. ema.europa.eu/ema/index.jsp?curl¼pages%2Fnews_and_events %2Fnews%2F2013%2F10%2Fnews_detail_001918.jsp. 36. European Commission. Regulation (EU) No 536/2014 of the European Parliament and of the council of 16 April 2014 on clinical trials on medicinal products for human use, and repealing directive 2001/20/EC. Off J Eur Communities May 27, 2014;L158:1e76.

37. Tse T, Williams RJ, Zarin DA. Update on registration of clinical trials in ClinicalTrials.gov. Chest July 2009;136(1):304–5.
38. Zarin DA, Tse T. Unambiguous identification of obesity trials. New Engl J Med February 7, 2013;368(6):580–1.
39. National Library of Medicine. ClinicalTrials.gov glossary of common site terms. 2012. Available at: https://clinicaltrials.gov/ct2/about-studies/glossary.
40. Zarin DA, Tse T. Trust but verify: trial registration and determining fidelity to the protocol. Ann Intern Med July 2, 2013;159(1):65–7.
41. Tse T, Williams RJ, Zarin DA. Reporting "basic results" in ClinicalTrials.gov. Chest July 2009;136(1):295–303.
42. Schulz KF, Altman DG, Moher D, for the CONSORT Group. CONSORT 2010 statement: updated guidelines for reporting parallel group randomized trials. Ann Intern Med June 1, 2010;152(11):726–32.
43. ICH Harmonised Tripartite Guideline E3: Structure and Content of Study Reports. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. November 1995.
44. Zarin DA, Tse T, Williams RJ, Califf RM, Ide NC. The ClinicalTrials.gov results database: update and key issues. New Engl J Med March 3, 2011;364(9):852–60.
45. Ross JS, Mocanu M, Lampropulos JF, Tse T, Krumholz HM. Time to publication among completed clinical trials. JAMA Intern Med May 13, 2013;173(9):825–8.
46. Zarin DA, Tse T, Menikoff J. Federal human research oversight of clinical trials in the United States. JAMA March 5, 2014;311(9):960–1.
47. Williams RJ, Tse T, DiPiazza K, Zarin DA. Terminated trials in the ClinicalTrials.gov results database: evaluation of availability of primary outcome data and reasons for termination. PLoS One 2015;10(5):e0127242.
48. Becker JE, Krumholz HM, Ben-Josef G, Ross JS. Reporting of results in ClinicalTrials.gov and high-impact journals. JAMA March 12, 2014;311(10):1063–5.
49. Hartung DM, Zarin DA, Guise JM, McDonagh M, Paynter R, Helfand M. Reporting discrepancies between the ClinicalTrials.gov results database and peer-reviewed publications. Ann Intern Med April 1, 2014;160(7):477–83.
50. Ide NC, Loane RF, Demner-Fushman D. Essie: a concept-based search engine for structured biomedical text. J Am Med Inform Assoc May–Jun 2007;14(3):253–63.
51. Clinical Trials Transformation Initiative. State of clinical trials: AACT database. 2016. Available at: http://www.ctti-clinicaltrials.org/what-we-do/analysis-dissemination/state-clinical-trials/aact-database.
52. Institute of Medicine. Sharing clinical trial data: maximizing benefits, minimizing risks. Washington, DC: The National Academies Press; January 14, 2015.
53. Taichman DB, Backus J, Baethge C, et al. Sharing clinical trial data: a proposal from the International Committee of Medical Journal Editors. Ann Intern Med April 5, 2016;164(7):505–6.
54. Zarin DA. Participant-level data and the new frontier in trial transparency. New Engl J Med August 1, 2013;369(5):468–9.
55. Zarin DA, Tse T. Sharing individual participant data (IPD) within the context of the trial reporting system (TRS). PLoS Med January 2016;13(1):e1001946.
56. Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R. Selective publication of antidepressant trials and its influence on apparent efficacy. New Engl J Med January 17, 2008;358(3):252–60.
57. Lancet. Is GSK guilty of fraud? Lancet June 12, 2004;363(9425):1919.
58. Juni P, Rutjes AW, Dieppe PA. Are selective COX 2 inhibitors superior to traditional non steroidal anti-inflammatory drugs? BMJ June 1, 2002;324(7349):1287–8.
59. Krumholz HM, Ross JS, Presler AH, Egilman DS. What have we learnt from Vioxx? BMJ January 20, 2007;334(7585):120–3.

I. ETHICAL, REGULATORY, AND LEGAL ISSUES


60. Mitka M. Controversies surround heart drug study: questions about Vytorin and trial sponsors' conduct. JAMA February 27, 2008;299(8):885–7.
61. Vedula SS, Bero L, Scherer RW, Dickersin K. Outcome reporting in industry-sponsored trials of gabapentin for off-label use. New Engl J Med November 12, 2009;361(20):1963–71.
62. GSK assurance of discontinuance. August 26, 2004. Available at: http://www.ag.ny.gov/sites/default/files/press-releases/archived/aug26a_04_attach2.pdf.


63. Jureidini JN, McHenry LB, Mansfield PR. Clinical trials and drug promotion: selective reporting of study 329. Int J Risk Saf Med 2008;20:76–81.
64. Leslie LK, Newman TB, Chesney PJ, Perrin JM. The Food and Drug Administration's deliberations on antidepressant use in pediatric patients. Pediatrics July 2005;116(1):195–204.


CHAPTER 10

Data and Safety Monitoring

Paul G. Wakim1, Pamela A. Shaw2
1National Institutes of Health, Bethesda, MD, United States; 2University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States

OUTLINE

Why Monitor?

Who Monitors?
  Data and Safety Monitoring Board
  History of Data and Safety Monitoring Boards
  When Is a Data and Safety Monitoring Board Needed?

What to Monitor?
  Monitoring Participant Safety
  Monitoring Trial Conduct
    Participant Flow
    Participants' Baseline Characteristics
    Randomization Outcome
    Regulatory Compliance
    Trial Performance
    Data Quality
  Interim Analyses
    Sample Size Recalculation
    Interim Analyses for Efficacy, Futility, and/or Harm

When and How Often to Monitor?

Special Topics
  General Structure of Data and Safety Monitoring Board Meetings
  Masking of the Data and Safety Monitoring Board

Summary

Summary Questions

Acknowledgments

References

Principles and Practice of Clinical Research http://dx.doi.org/10.1016/B978-0-12-849905-4.00010-1
Copyright © 2018. Published by Elsevier Inc.

WHY MONITOR?

The primary reason for monitoring a clinical trial is to ensure that it does not compromise the safety of its participants. Increased risk of participation could be a result of the experimental treatment(s), the control treatment, or any other trial-related procedure. Investigators have a legal and ethical responsibility to ensure that the participants in their trial are not subjected to unnecessary physical harm and/or suboptimal care, that is, providing a treatment to some or all participants when there is evidence of better treatments. This evidence could come either internally, from accumulated study data, or externally, from clinical trials conducted by other groups. Unnecessary risk also can result from continuing the trial when there is little chance of gaining additional scientific information. In situations like these, stopping the trial may be the only course of action.

The second most important reason for trial monitoring is to ensure data integrity. At regular time points during the trial, investigators and sponsors should ask themselves the following questions: When the trial is completed, will the quality of the data collected be high enough to produce meaningful results? Will the data analysis be able to answer the primary research question with reasonable accuracy and minimal bias? From a data validity perspective, is continuing the trial worth the additional effort and expense? Threats to data integrity could result from very slow enrollment, a poor randomization process, inaccurate data entry, an inadequate masking mechanism, or extensive missing data due to missed visits or loss to follow-up.

Another important reason to monitor a trial is to answer the primary research question validly with the minimum amount of resources. For example, if it is found, while the trial is ongoing, that it can produce valid results with a smaller sample size, recruiting the larger sample is unethical to the yet-to-be-randomized participants (exposing them to unnecessary risk) and wastes the sponsor's time and money. It is also unethical to future patients who would have benefited earlier from possible positive trial results. An added benefit of vigilant and objective monitoring is that it increases the validity and credibility of the results in the eyes of the scientific community and regulatory agencies.

In this chapter, we discuss key elements of data and safety monitoring (DSM), the frequency of the activity, and the groups responsible for monitoring the clinical trial, including the roles and responsibilities of the trial's Data and Safety Monitoring Board (DSMB).

WHO MONITORS?

Data and Safety Monitoring Board

Because investigators are, by definition, very closely involved in their own clinical trial, and therefore may not be completely objective with regard to its progress and performance, a group of independent experts is called upon to monitor the trial's conduct. To effectively monitor the trial and determine whether it is ethical to continue, the external monitoring board frequently relies on examining unmasked (unblinded) data on safety and other study outcomes. Unmasked data must be reviewed with the utmost confidentiality and should not be shared with study investigators or the sponsor while the trial is ongoing: the research team's knowledge of results accumulated so far may influence their decisions and behavior with participants, potentially biasing future outcomes and final results.

This group of outside experts goes by several names. The most common are DSMB, Data and Safety Monitoring Committee (DSMC), and Data Monitoring Committee (DMC). Less common names are External Safety Monitoring Committee (ESMC) and Treatment Effects Monitoring Committee (TEMC).

DSMBs are typically composed of an independent group of subject-matter experts in the disease setting and type of interventions being studied. For example, a DSMB for a study of a novel imaging tool to guide treatment in breast cancer would include a breast cancer specialist and a radiologist. An ethicist and a biostatistician are other key members of the board. In many settings, it has become standard also to include a patient advocate. It is also helpful to have members with extensive DSMB experience, who can offer guidance on conventions and conduct of the meeting and who can help guide the board through what may be difficult or complex decisions about the trial.

The roles and responsibilities of the DSMB are outlined in the DSMB charter, which lists board members and their affiliations, and identifies key roles, such as the DSMB chair, ethicist, and biostatistician (see, for example, Ref. 1). The DSMB chair runs the meetings and ensures that the guidelines of the charter are followed. DSMB chairs are the main liaison between the board, trial sponsor, and trial investigators. They are responsible for communicating the DSMB recommendations to the trial sponsor at the end of each meeting. The charter also specifies the voting members of the DSMB. Voting members are independent of the trial leadership and sponsor because they may need to make difficult decisions about the trial, such as recommending closing a study arm, closing enrollment at a poor-performing site, or even stopping the trial altogether.

The credibility of trial results could be put into question if there is a perceived conflict of interest for one or more DSMB members. For example, if a trial is stopped early for efficacy, a journal editor or regulatory body reviewing the main results could call this decision into question if it perceives that there were DSMB members, or persons closely related to a member, who stood to gain financially from such an action. For this reason, DSMB members need to be free of any significant conflict of interest, such as financial holdings or a professional relationship with a company that manufactures a product that could be impacted in some way by the results of the study. Regularly during the study, DSMB members should be asked to provide financial disclosures and report other potential conflicts of interest to trial leadership. Keeping DSMB members independent and free of conflicts also helps maintain the confidentiality of the trial, as relationships and/or regular contact with the sponsor or trial team may create an environment where confidential information about interim trial results could be accidentally revealed, or where DSMB members could be subject to pressure to disclose confidential information discussed at DSMB meetings.

History of Data and Safety Monitoring Boards

The concept of an independent board overseeing the conduct of a randomized clinical trial surfaced in the 1950s. The US National Institutes of Health (NIH) was probably the first organization to put the concept into practice, in the early 1960s, in large trials on improving survival from acute myocardial infarction.2,3 As early as 1962, one of the first NIH DSMBs (called a Policy Board at the time) was established to act in a senior advisory capacity to the participating investigators of the Coronary Drug Project (CDP) of the National Heart Institute (currently the National Heart, Lung, and Blood Institute).4 A few years later, the same Policy Board recommended that CDP personnel no longer have access to study endpoints. It also recommended the formation of a Safety Monitoring Committee to review confidential safety data.4

In 1967, an advisory board to the National Heart Institute developed recommendations on the organization, review, and administration of cooperative studies (large multisite clinical trials). These recommendations were published 20 years later in the influential Greenberg Report.5 One of the recommendations was a cooperative study organizational structure with a "Policy Board or Advisory Committee of senior scientists" who would "review the overall plan, make recommendations on any possible changes (including changes in protocol and operating procedures), adjudicate controversies that may develop, and advise the National Heart Institute on such matters as the addition of new participants or the dropping of nonproductive units." Members of this Advisory Board should be "experts in the field of the study but not data-contributing participants in it".5 Since then, it became more common for sponsors of large randomized clinical trials to form independent committees to monitor the safety, data integrity, and ethics of their trials.
Other US federal agencies then started to establish independent boards for their trials: the National Eye Institute in the early 1970s; the Department of Veterans Affairs in the mid-1970s; and the National Cancer Institute and National Institute of Allergy and Infectious Diseases in the early 1980s.2 In the early 1990s, the pharmaceutical industry started to use DSMBs, particularly in trials on cardiovascular diseases.2 However, despite the growing use of DSMBs, little was published on the operational aspects of trial monitoring and the functioning of monitoring committees. So in 1992, the NIH organized a workshop on practical issues in data monitoring of clinical trials, and in 1993, a whole issue of the journal Statistics in Medicine was dedicated to the workshop proceedings.6 The articles in that issue presented experiences from a wide variety of disciplines, industries, countries, and disease areas. They addressed ethical, logistical, and operational considerations in data monitoring and interim analyses. As of today, many books, journal articles, and guidance documents on DSMBs have been published. In addition to the references cited in the body of this chapter, we list a few additional ones in the Summary section at the end of this chapter.

When Is a Data and Safety Monitoring Board Needed?

Clinical trialists, regulatory agencies, and sponsoring organizations would all agree that (1) every clinical trial needs oversight and monitoring and (2) not every clinical trial needs an independent DSMB. But then, how do investigators determine whether they should have a DSMB for their trial? Current guidelines are vague and general, perhaps rightly so, since it is difficult to classify clinical trials generically into those that do and those that do not need a DSMB without knowing the details of each trial. A World Health Organization (WHO) report states that a DSMB is considered "relevant" in studies that focus on mortality and/or severe morbidity; involve high-risk interventions; test novel interventions with potentially serious adverse outcomes; are of long duration; allow interim analyses that could justify early termination; are conducted in emergency situations; or involve vulnerable populations.7 The report then states that not all studies that fall in these categories necessarily require a DSMB and that there may be trials that do not fall in any of these categories that may still need a DSMB. Similar guidelines were developed by the European Medicines Agency (EMA), which give examples of when a DSMB should be set up and when it should not.8 For example, the guidelines state that "clinical studies in non-critical indications where patients are treated for a relatively short time and the drugs under investigation are well characterized and known for not harming patients" might not need a DSMB. The EMA adds that with such trials, a DSMB may even be counterproductive because the additional preparations for the DSMB may delay the closure of the trial.8 US federal government agencies also have developed their own guidelines (e.g., Refs. 3,9–11). Others also have published recommendations on the need for a DSMB.
For example, the Society for Clinical Trials Working Group on Data Monitoring developed guidelines for DSM of early-phase (Phase I and II) trials that do not require an outside, independent monitoring board12; and Ellenberg et al.2 list four general criteria for determining the need and value of a DSMB. The gist of these guidelines is fairly similar, namely, that a DSMB is needed when the trial (1) involves interventions with relatively high or unknown risks (e.g., gene therapy in advanced-stage cancer patients); (2) concerns a disease/condition that has serious health implications (e.g., coronary heart disease); (3) is conducted on a fragile or vulnerable population (e.g., pregnant women, the elderly in nursing homes, and children); (4) is large enough (i.e., of long duration and/or costly) that it can be stopped early; or (5) is controversial (e.g., showing that an existing standard of care that has been used for many years is not better than placebo, or perhaps even inferior to placebo when considering its side effects). Since late-phase (Phase III and IV) and multicenter clinical trials generally fall under at least the fourth category, they typically need independent monitoring. Trials that do not need DSMBs are early-phase (Phase I and II), single-site, open-label, low-risk trials. For trials that fall between these two ends of the spectrum, investigators should consult with the regulatory agencies and sponsoring organization to discuss each specific case.

WHAT TO MONITOR?

There are three main components of clinical trial monitoring: (1) participant safety; (2) trial conduct; and (3) interim analyses (see Fig. 10.1). Monitoring participant safety examines whether there are safety concerns. It generally includes consideration of adverse events (AEs) that occur during the trial, with a key focus on the frequency and occurrence of unexpected events, as well as any severe events. Monitoring trial conduct assesses whether the trial is being successfully carried out as planned. It typically evaluates (1) participant flow; (2) participants' baseline characteristics; (3) randomization outcome; (4) regulatory compliance; (5) trial performance; and (6) data quality.

Interim analyses examine data on clinical or biological outcomes collected so far. They are conducted while the trial is ongoing. They are particularly useful when there is potential to appreciably shorten the duration of a lengthy trial. Interim analyses can be divided into two types: (1) sample size recalculation and (2) interim analyses for efficacy, futility, and/or harm. The next three sections address each monitoring activity in more detail. Although participants' safety is always most important, it may not be the primary clinical outcome of interest. For example, if it has been shown in previous studies that a medication is generally safe, the primary research question may focus instead on whether the medication reduces blood pressure, the primary research outcome of interest. When a risk-benefit balance is not clear, both efficacy and safety outcomes should be monitored during an interim analysis.
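Interim looks at efficacy data are usually planned so that the overall chance of a false-positive finding stays controlled across looks. One widely used device for this, not detailed in this excerpt, is an alpha-spending function; the O'Brien-Fleming-type spending function of Lan and DeMets can be sketched in a few lines, here assuming a two-sided overall alpha of 0.05 and equally spaced looks:

```python
import math

def std_normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def obf_alpha_spent(t, alpha=0.05):
    """Cumulative two-sided alpha 'spent' by information fraction t (0 < t <= 1),
    using the O'Brien-Fleming-type spending function:
    alpha(t) = 2 * (1 - Phi(z_{alpha/2} / sqrt(t)))."""
    z = 1.959964  # Phi^{-1}(0.975), the two-sided 5% critical value
    return 2.0 * (1.0 - std_normal_cdf(z / math.sqrt(t)))

# Four equally spaced looks: very little alpha is spent early, so
# stopping at the first look requires an extreme treatment effect.
for t in (0.25, 0.50, 0.75, 1.00):
    print(f"information fraction {t:.2f}: "
          f"cumulative alpha spent = {obf_alpha_spent(t):.5f}")
```

Converting the cumulative alpha spent into per-look critical values also requires accounting for the correlation between looks, a step handled by dedicated group-sequential software rather than this sketch.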

FIGURE 10.1 Most common clinical trial monitoring activities.

Monitoring Participant Safety

Again, first and foremost, investigators should be concerned about the safety and well-being of their trial's participants. The riskier the trial interventions, the more scrutinized safety monitoring should be. The AE is the most commonly used measure of safety. The NIH defines an AE as any "untoward medical occurrence in a human subject, including any abnormal sign (for example, abnormal physical exam or laboratory finding), symptom, or disease, temporally associated with the subject's participation in research, whether or not considered related to the subject's participation in the research."13 The NIH also specifies that some AEs are considered serious adverse events (SAEs) if they result in death; are life-threatening; require either inpatient hospitalization or the prolongation of hospitalization; result in a persistent or significant disability/incapacity; or result in a congenital anomaly/birth defect. Based on appropriate medical judgment, other important medical events also may be considered SAEs if an intervention on a trial participant is required to prevent one of the outcomes mentioned above.13 The International Conference on Harmonisation (ICH) defines an SAE in similar terms.14

For each AE/SAE occurrence, the following information is typically reported: participant identification code number, AE/SAE description, start date, severity, relationship to trial, outcome, and resolution date (when available). Since many treatments have expected and acceptable off-target effects, such as a standard cancer therapy causing nausea, pain, or hair loss, AEs/SAEs need to be discussed in terms of whether they are expected or unexpected. The severity of the disease being studied is a factor when considering the acceptability of an AE. For instance, for a life-threatening condition, such as an advanced-stage cancer, an increased occurrence of a treatable AE, such as nausea or pain, may be acceptable, whereas in another setting, such as a new antihistamine given to healthy people for a mild allergy, it may not. This can be a complex evaluation, and one that deserves robust discussion by an interdisciplinary committee in terms of what is and is not acceptable. The issue of unanticipated risk is given further discussion in Food and Drug Administration (FDA) guidance15 and in Chapter 11 (Unanticipated Risk in Clinical Research).
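The reporting fields listed above map naturally onto a structured record. A minimal sketch follows; the field and class names are illustrative, not a regulatory schema:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class AdverseEventReport:
    """One AE/SAE occurrence, carrying the items typically reported
    to a DSMB (illustrative names, not a standard schema)."""
    participant_id: str            # coded identifier, not personally identifying
    description: str
    start_date: date
    severity: str                  # e.g., "mild", "moderate", "severe"
    related_to_trial: bool
    outcome: str
    serious: bool                  # SAE flag (death, life-threatening, hospitalization, ...)
    expected: bool                 # expected vs. unexpected for this intervention
    resolution_date: Optional[date] = None  # filled in when available

# A hypothetical, fully resolved, nonserious event:
ae = AdverseEventReport(
    participant_id="P-0042", description="transient nausea",
    start_date=date(2017, 3, 1), severity="mild",
    related_to_trial=True, outcome="resolved without intervention",
    serious=False, expected=True, resolution_date=date(2017, 3, 3),
)
```

Keeping the SAE and expectedness flags on each record makes the tabulations a DSMB reviews (expected vs. unexpected, serious vs. nonserious) a matter of simple filtering.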

Monitoring Trial Conduct

This activity focuses on the general conduct of the trial and not on the research results of the trial. It is performed without any look at data related to the trial's endpoints or research outcomes. What follows is a list of typical trial information that helps assess how well the trial is proceeding.

Participant Flow

Participant flow, sometimes called participants' disposition, is the procedural flow of individuals going from being approached to participate in the clinical research study, to being prescreened, then screened for inclusion/exclusion criteria, consented, randomized, receiving and completing the corresponding treatment, staying in the trial for its whole duration, and finally being included in the primary analysis. Typically, participant flow is a visual explanation of how the large number of individuals who were approached becomes the much smaller number of participants who are included in the primary analysis. This is important because it shows whether the declining number of individuals at every trial stage from beginning to end reflects realistic expectations in a clinical trial on such a disease and target population. It is used to flag any potential selection biases in the group of participants who are eventually analyzed. It also indicates whether the inclusion/exclusion criteria are too stringent (i.e., including in the trial only a narrow subset of the population of interest). In summary, it gives a sense of the representativeness of the group of participants who will be included in the final primary analysis. For visual examples of participant flow diagrams, the reader is referred to guidelines from the ICH16 and Moher et al.17

Participants' Baseline Characteristics

Baseline characteristics represent information on study participants' important traits collected before randomization. Examples of baseline characteristics include demographics (e.g., gender, race, ethnicity, age, socioeconomic status, education), basic body parameters (e.g., height, weight, body mass index), vital signs (e.g., blood pressure), and disease-related information (e.g., severity, onset). The objective of this exercise is to assess whether the participants recruited so far are representative of the target population described in the protocol and to ensure that the trial results can be generalized to the broader target population.

Randomization Outcome

In addition to establishing causality by attributing differences between treatment groups to efficacy, randomization is performed in clinical trials to achieve balance in known and unknown factors that could influence the primary response. The question here is whether randomization has been properly conducted and whether the results of randomization match the intended randomization strategies.
One way to address it is to examine the profiles of study participants who have been randomized so far to each intervention group, including their stratification factors. For example, if gender is a stratification factor, there should be roughly an equal number of women in each intervention group, and similarly for men. Other baseline characteristics also should be balanced across intervention groups. Although statistical tests and the reporting of corresponding P-values to compare baseline characteristics between intervention groups are to be avoided after the trial is completed,18,19 they can be used while the trial is proceeding to check whether randomization is properly conducted.20 One should keep in mind, though, that if a relatively large number of baseline characteristics are being examined, one can expect by chance a few imbalances between treatment groups, typically for about 5% of the baseline characteristics. However, repeated imbalances over the course of the trial and patterns of imbalances should be flagged and investigated further.

Regulatory Compliance

Complying with regulatory laws, rules, and regulations is not a trivial undertaking. Keeping up-to-date with all regulatory issues and ensuring that they are followed and implemented can be daunting. Reporting deadlines, especially those related to reporting AEs and SAEs, need to be met. In addition to regulatory safety reporting requirements, here are some examples of specific regulatory compliance items to monitor: Institutional Review Board (IRB) approval/renewal; Federalwide Assurance number and expiration date; FDA Investigational New Drug (IND) application; and any other significant regulatory issues. Ensuring regulatory compliance is ultimately the responsibility of the sponsor. In this book, Chapters 4 (Institutional Review Boards), 6 (The Regulation of Drugs and Biological Products by the Food and Drug Administration), and 12 (Legal Issues) cover different aspects of this topic in more detail.

Trial Performance

What follows is a list of key trial performance criteria.

Protocol Compliance by Research Staff

In general, deviations from, and violations of, operating procedures described in the protocol should be documented and reported to the IRB, DSMB, and applicable regulatory agencies. Such violations could include improper informed consent procedures, inclusion/exclusion criteria not met, visits conducted outside the permissible time window, or inadequate record keeping. Clinicians' adherence to treatment (treatment fidelity) is another example of protocol compliance. The question is whether research staff involved in providing treatment to participants are following the procedures and approaches described in the protocol. This is particularly germane to clinical trials assessing the efficacy of psychosocial therapies. Although quantifying clinicians' treatment adherence may be challenging, it is important to capture and report that information when it is directly related to data integrity as defined above. One approach is to videotape a sample of psychosocial treatment sessions and evaluate their conformity to the protocol.

Recruitment

Is the trial recruiting at the expected pace specified in the protocol? If not, why not? Are there any actions the investigator or sponsor can take to resolve recruitment issues? Recruitment performance can be graphed as in Fig. 10.2 by showing the actual and expected number of randomizations over time. An accompanying table can show the expected and actual number of randomizations. The actual divided by the expected number of randomizations (expressed in percent) is a quick way to check on recruitment, particularly if the trial involves multiple sites. In such cases, the graph and table can be presented for each site, as well as overall. In trials where a sufficient number of individuals are being screened, but a higher than expected proportion are found to be ineligible, presenting additional graphs for the number screened and the number randomized can be informative with respect to whether the recruitment problem is related to finding potentially eligible and interested participants, or whether the eligibility criteria are difficult to satisfy.

FIGURE 10.2 Monitoring recruitment.

Participants' Treatment Adherence (Treatment Exposure)

It is equally important that participants get the treatment they are assigned to receive, for example, taking the medication and/or attending counseling sessions as often as described in the protocol. Otherwise, the treatments being compared at the end of the trial are no longer the ones intended, and the results no longer reflect the primary objective of the trial. One way to monitor treatment exposure is to quantify the expected treatment that each participant should receive and the treatment actually received. For example, based on the number of participants randomized, one can calculate the number of medication pills expected to have been taken by all participants combined, or the number of psychosocial therapy sessions expected to have been attended by all participants combined. One also can calculate the number of medication pills actually taken, or the number of sessions actually attended. The "actual" divided by the "expected," expressed in percent, gives a general sense of treatment adherence.

Data Completeness (Availability of Primary and Other Key Endpoints)

This is not about revealing the value of the primary or other key endpoints, whether for each participant or in aggregate. It is about determining whether the endpoint data are collected and entered in the database, that is, the extent of "missingness" of the primary and other key outcomes of interest.
The ratio of the number of primary outcome values actually collected (nonmissing) at a particular point in time to the number of primary outcome values expected to be collected at the same point in time, expressed in percent, is one way to quantify the extent of "nonmissingness." It represents the amount of data that will be available for the primary analyses. The higher the extent of "missingness," the less reliable the final results. Indeed, the main advantage of randomization (inference on causality) is compromised when the database includes too many missing primary endpoints. Outcomes that are not of primary interest but are burdensome to collect (e.g., biopsy samples for a nonprimary biomarker endpoint) also are worth monitoring for completeness to justify the excess burden on participants. A sponsor also may wish to have expensive endpoints monitored to assure that they are collected in a manner that will yield meaningful scientific results.
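The actual-versus-expected percentages used for recruitment, treatment exposure, and endpoint completeness are all the same computation. A minimal sketch, in which the site names and counts are made up for illustration:

```python
def pct_of_expected(actual, expected):
    """Actual divided by expected, expressed in percent."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return 100.0 * actual / expected

# Hypothetical per-site randomization counts at an interim time point.
sites = {
    "Site A": {"actual": 48, "expected": 50},
    "Site B": {"actual": 21, "expected": 40},
}

for name, n in sites.items():
    print(f"{name}: {pct_of_expected(n['actual'], n['expected']):.0f}% "
          f"of expected randomizations")

# The same function applies to pills taken vs. expected, sessions attended
# vs. expected, follow-up visits attended vs. expected, and nonmissing vs.
# expected primary outcome values.
overall = pct_of_expected(
    sum(n["actual"] for n in sites.values()),
    sum(n["expected"] for n in sites.values()),
)
print(f"Overall: {overall:.1f}%")
```

Reporting the percentage per site as well as overall, as the chapter suggests, makes a single lagging site easy to spot even when the overall figure looks acceptable.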


Attendance at Follow-Up Visits (Retention)

Results from follow-up visits are meaningless, and resources are wasted, if few participants come back for follow-up visits. The number of follow-up visits actually attended by all participants randomized so far, and who have reached that stage of the clinical trial, divided by the number of follow-up visits expected to have been attended at this point in time, expressed in percent, is one way of quantifying retention. If retention rates are worrisome, the DSMB makes suggestions for strategies to improve retention, or more often, asks the research team to come up with a plan to try to improve retention.

Data Quality

Data captured in the trial's database need to be valid and accurate. Investigators need to monitor the quality of the data on a regular basis to ensure that data entry is performed as accurately as possible and that data from other sources (e.g., laboratories) are transferred to the study database without errors. An error rate (total number of discrepancies divided by the total number of fields audited), expressed in percent, is one way of quantifying and monitoring data quality.

Flags and Triggers

Some of the trial conduct indicators listed above can be summarized in color-coded tables. For example, the criteria shown in Fig. 10.3 may be used to color-code the performance of the trial overall and that of each participating clinical site. This gives a quick visual and objective way of identifying good- and poor-performing sites in a multicenter clinical trial. The thresholds indicated in the figure are only examples for illustration purposes. They vary according to the research field, treatments being tested, and trial phase (I–IV). In summary, monitoring trial conduct is an exercise that examines descriptive statistics related to trial performance and not participant outcomes or trial results.
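The color-coding can be sketched in a few lines. The cutoffs below are hypothetical stand-ins, not the thresholds shown in Fig. 10.3:

```python
def flag(rate_pct, green_at, yellow_at):
    """Color-code a performance rate: 'green' at or above green_at,
    'yellow' at or above yellow_at, 'red' below both.
    Thresholds here are illustrative, not those of Fig. 10.3."""
    if rate_pct >= green_at:
        return "green"
    if rate_pct >= yellow_at:
        return "yellow"
    return "red"

# Hypothetical site retention percentages in a multicenter trial.
sites = {"Site A": 96.0, "Site B": 88.5, "Site C": 71.0}
for name, retention in sites.items():
    print(name, flag(retention, green_at=90, yellow_at=80))
```

The same function can be reused for adherence, completeness, or data-quality error rates by swapping in the appropriate thresholds.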

Interim Analyses Interim analysis typically refers to a statistical analysis of the primary endpoint performed while the trial is proceeding. The values of the primary outcome (as opposed to whether they are missing) are usually the focus of primary analyses. They may be used to calculate the treatment effect (the difference between the effects of the experimental and control treatments), or some other key statistical parameters such as variability. Trial designs that include interim analyses fall under the umbrella of “adaptive designs” because the final trial design depends on the results of interim looks at the data (Ref. 21; see also Chapter 27, Intermediate Topics

I. ETHICAL, REGULATORY, AND LEGAL ISSUES


10. DATA AND SAFETY MONITORING

FIGURE 10.3 Flags and triggers.

in Biostatistics). As is the case with all adaptive designs, interim analyses are prespecified in the protocol. The most commonly used interim analyses can be grouped into two categories: (1) sample size recalculation (or reestimation) and (2) interim analyses of participant outcomes to test for efficacy, futility, and/or harm. These two types are addressed in the next two sections. The result of an interim analysis is only one of several pieces of information considered in deciding how the trial should continue or whether it should be stopped.18 For example, Anand, Wittes, and Yusuf22 have argued that one may want to continue a trial even after an interim analysis shows futility, in order to collect as much safety information as possible and to produce robust safety estimates. The number of AEs may differ significantly between two treatment conditions with similar effects, in which case the treatment with the higher AE rate should be avoided, since it offers no additional benefit and only more safety concerns.

Sample Size Recalculation

In statistics, a nuisance parameter is a parameter whose value is not of particular interest but that does affect the distribution of other parameters that are of interest. For example, investigators are not directly interested in the variance of the primary endpoint; but the magnitude of the variance does affect the results of the study, such as the width of the confidence interval and the corresponding P-value. Other examples of nuisance parameters, say in a setting where the mean response is of interest, are the correlation between responses within a cluster, the drop-out rate, and the proportion of events in the overall trial cohort.

During the protocol development phase, values for the minimum treatment effect to be detected and for nuisance parameters are needed for power and sample size calculations. They are typically based on educated guesses. It is therefore advisable to assess at some point during the trial whether these guesses were realistic. Sample size recalculation is about revisiting the initial sample size that was determined at the trial design stage before the trial started. Sample size recalculation is generally planned upfront and described in the protocol. It is timed so that the sample size can be changed before recruitment is completed. If the initial sample size is small, or if recruitment is fast, the initial sample size may already be reached by the time a new sample size is calculated and considered. For example, if the plan is to recalculate the sample size at 50% recruitment, it is important to estimate where actual recruitment will be by the time a final decision about the new sample size is made. Starting and stopping recruitment can be problematic (e.g., staffing issues, lack of consistency in the type of patient recruited); so it is important to consider a sample size recalculation with enough time to implement such a change without any interruption in enrollment, or before enrollment exceeds the newly calculated sample size. Sample size recalculation can be divided into two distinct approaches: one that involves nuisance parameters only and one that also involves an estimate of the treatment effect.21 Each approach is discussed next.

Sample Size Recalculation Based Only on Nuisance Parameters

The question is whether the values of variance, correlation, drop-out rate, or proportion of events in the



control group, which were assumed at the beginning of the trial, are consistent with what is actually observed so far; and consequently, whether the sample size calculated initially is still adequate based on these observed nuisance parameter values. From such an exercise, there are three possible outcomes: (1) the current sample size is adequate and therefore there is no need for change; (2) the sample size should be increased, unless the added cost is prohibitive; or (3) a smaller sample size would be adequate. This last scenario occurs, for example, when the variance assumed at the design stage is higher than what is observed from the data. In this case, the appropriate decision is debatable. Some recommend decreasing the sample size, so as not to subject future participants to unnecessary risk, to save resources, and to publish results earlier. Others recommend keeping the sample size as initially planned, so that more data give more accurate results for both primary and secondary outcomes, as well as for safety and subgroup analyses; and since resources were approved for the initial sample size, there is no need to reduce it. They also argue that the nuisance parameters calculated midstream are themselves uncertain, may change when all the data are collected and analyzed, and may end up closer to what was initially planned. Recalculating the sample size based only on nuisance parameters does not involve any "statistical penalty," since by definition nuisance parameters contain no information about the outcome of interest. Thus there is relatively little downside (other than additional analyst time) to including sample size recalculation in the initial trial design, unless the trial length or rate of recruitment does not allow for any sample size modification.
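As an illustration of a nuisance-parameter recalculation, the sketch below uses a standard normal-approximation sample size formula for comparing two means (not a method prescribed by this chapter) to show how a larger-than-assumed interim standard deviation translates into a larger required sample size:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sample comparison of means,
    normal approximation: n = 2 * (z_{1-a/2} + z_{1-b})^2 * sigma^2 / delta^2."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)  # two-sided significance quantile
    z_b = z.inv_cdf(power)          # power quantile
    return ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2)

planned = n_per_group(sigma=10, delta=5)   # design-stage guess for the SD
observed = n_per_group(sigma=13, delta=5)  # interim SD larger than assumed
print(planned, observed)  # 63 vs. 107 per group
```

Note that delta, the minimum treatment effect of interest, is deliberately held fixed; only the nuisance parameter sigma is updated.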
Sample Size Recalculation Based on Nuisance Parameters and Observed Treatment Effect

Here, we consider the case of whether the sample size should be changed based on the values of the nuisance parameters and the treatment effect observed so far. The reasoning motivating this kind of analysis is that if the observed treatment effect is smaller than initially anticipated, the current sample size will not provide enough power to detect the observed, smaller treatment effect; and consequently, the sample size should be increased. This is a controversial issue. Some believe that power analysis should be based on the minimum clinically meaningful treatment effect to be detected and not on the observed treatment effect, regardless of how small it is. Criticism of performing this type of interim analysis relates to concerns about potential bias, loss of efficiency, and the possibility of increasing the sample size to detect clinically meaningless differences.21,23


Unlike the previous sample size recalculation exercise, recalculating the sample size based on nuisance parameters and observed treatment effect does involve a "statistical penalty." This is because multiple looks at the treatment effect, just like multiple comparisons, do increase the chance of claiming statistical significance when in fact there is no difference. The nature of the statistical penalty is to make each of the repeated looks at the data at a significance level that is stricter than the usual 0.05 (i.e., below 0.05).

Measures of association can reflect both harmful associations (relative risk > 1) and protective associations (relative risk < 1). When a relative risk or OR is equal to 1, it means that risk of exposure (or odds of exposure) is the same in those with or without disease; that is, there is no association between disease and exposure. Both the relative risk and OR are measures of the strength of the association; they do not answer, for instance, how much disease may be prevented if we could eliminate an exposure.
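A quick simulation (a sketch, not this chapter's method) shows why the multiple-looks penalty is needed: under the null hypothesis of no treatment effect, testing at the nominal 0.05 level at each of four interim looks pushes the overall false-positive rate well above 5%:

```python
import random
from statistics import NormalDist

def trial_rejects_under_null(n_per_look=50, looks=4, alpha=0.05, rng=random):
    """Simulate one one-sample z-test trial with NO true effect, testing
    at nominal alpha at every interim look; return True if any look 'wins'."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    total, running_sum = 0, 0.0
    for _ in range(looks):
        for _ in range(n_per_look):
            running_sum += rng.gauss(0, 1)  # null: standard normal data
            total += 1
        if abs(running_sum) / total ** 0.5 > z_crit:
            return True  # a spurious "significant" finding
    return False

rng = random.Random(0)  # seeded for reproducibility
sims = 2000
false_pos = sum(trial_rejects_under_null(rng=rng) for _ in range(sims)) / sims
print(f"Empirical type I error with 4 unadjusted looks: {false_pos:.3f}")
```

The empirical rate comes out well above the nominal 0.05, which is exactly what group-sequential boundaries (stricter per-look significance levels) are designed to correct.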


The difference in risks or incidence rates is simply the disease risk in those exposed or with a certain characteristic minus the disease risk in those not exposed. Sometimes we are interested in absolute risk, risk differences, or attributable risks. Attributable risk is the amount of disease that can be attributed to a certain characteristic or exposure. While many times attributable risk is presented as a proportion of the absolute risk in the comparison group, the goal is still to look at excess risk. If an observational study reports that people with a new strain of influenza (Group A) have a 25% chance of death and people with a mutation (Group B) have a 50% chance of death, we generally jump to the conclusion that Group B is twice as bad off. We intuitively jump to the relative risk. Few people will immediately say there is an OR of 3; while the OR is indeed 3, few people would take that from the simple data without a computer calculating the number. We will report the OR in the manuscript, though, and unfortunately, someone will see the OR and incorrectly interpret it as meaning that people are three times more likely to die in Group B. They might say the odds are three times higher, but that too will be misinterpreted by the person reading the news and worried about dying. In fact, the risk difference has yet another interpretation. Perhaps the probability of being in Group A is 0.005% and the probability of being in Group B is 0.00001%. Should we be worried about dying from this and take every preventive step possible? Absolute risk would tell us no. If we or someone close to us dies, we likely care and wish the preventive steps had been taken. If we are less close to the situation, we likely complain about all the measures being taken or worry needlessly that we are going to die. The concepts of relative and attributable risk are essential for policy and when prevention is being decided.
These measures are used not only in observational studies but also frequently in interpreting study data more broadly. Several common misinterpretations of the data are discussed in the next section.
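The influenza example above can be made concrete. The arithmetic below shows how the same pair of probabilities yields a relative risk of 2, an odds ratio of 3, and a risk difference of 0.25, three numbers that invite three very different headlines:

```python
def relative_risk(p1, p0):
    """Ratio of two risks (probabilities)."""
    return p1 / p0

def odds_ratio(p1, p0):
    """Ratio of the odds p/(1-p) in the two groups."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

def risk_difference(p1, p0):
    """Absolute excess risk."""
    return p1 - p0

p_b, p_a = 0.50, 0.25  # chance of death in Group B and Group A
print(relative_risk(p_b, p_a))    # 2.0  -> "twice as bad off"
print(odds_ratio(p_b, p_a))       # ~3.0 -> easily misread as "3x the risk"
print(risk_difference(p_b, p_a))  # 0.25 -> the absolute excess risk
```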

MISTAKES, MISCONCEPTIONS, AND MISINTERPRETATIONS

None of us are perfect, and we frequently make mistakes. Below we discuss a few of the common mistakes, misconceptions, and misinterpretations involving observational studies.

Always Trusting Bivariate Associations Based on Observational Study Data

At times, our analyses ignore the richness of observations and simply compare two variables to each other. These are bivariate associations. An example of this is

II. STUDY DESIGN AND BIOSTATISTICS


17. DESIGN OF OBSERVATIONAL STUDIES

when we use chi-square tests and correlation as the only statistical analyses. While these tests are useful to explore data, they have severe limitations in observational studies. The correlation of two continuous variables evaluates whether the two variables have a straight-line association. Two variables may be strongly associated in a U or X shape or another shape and not have a strong correlation, yet have a very strong association. Additionally, if other variables are added to the model, the association seen with a chi-square or correlation test may disappear, or what looked to be nothing may become a strong association. We discuss methods to investigate associations between variables in Chapter 24. Since correlation and bivariate chi-square tests may be misleading, because many univariate and bivariate associations disappear when a multivariate method such as regression is used, correlation and chi-square tests are not recommended for definitive analyses of observational data. They are useful for exploratory analyses and provide a simple way to present data.
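A tiny self-contained illustration of the first point: a variable that perfectly determines another through a U shape can still show essentially zero Pearson correlation:

```python
# y is completely determined by x (y = x^2), a perfect U-shaped
# association, yet the linear (Pearson) correlation is ~0 by symmetry.
xs = [x / 10 for x in range(-50, 51)]  # -5.0 to 5.0, symmetric around 0
ys = [x * x for x in xs]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sx = sum((x - mx) ** 2 for x in xs) ** 0.5
sy = sum((y - my) ** 2 for y in ys) ** 0.5
r = cov / (sx * sy)
print(f"Pearson r = {r:.3f}")  # approximately 0 despite perfect dependence
```

A scatterplot, or a model that allows curvature, would reveal the relationship that the correlation coefficient misses entirely.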

Assuming Odds Ratios and Relative Risks Will Have a Similar Magnitude

Relative risk can be computed in two possible ways. We have to define "risk of what?" A simple way to explain this: each time we consider whether a characteristic increases the probability of success, we also could have considered whether the characteristic decreases the probability of failure. A small relative change in the probability of one event's occurrence is usually associated with a large relative change in the probability of the event not occurring. Schulman and colleagues ran a controlled experiment and published an article reporting potential bias by physicians when recommending cardiac catheterization for patients with chest pain.29 A simplification of the data is presented in Table 17.3. Using the data in Table 17.3, we can compute the OR to be 0.57 or 1.74. The paper (using multivariate logistic regression) concluded that physicians make different recommendations for male patients than for female patients. Schwartz and colleagues30 wrote a critique and said the OR overstated the effect and that the RR was appropriate. Using the data in Table 17.3, the RR is only 0.93 (the reciprocal would be 1.07). The associated responses are quite illuminating and worth the time to read.

TABLE 17.3  Study Example Data

         No Catheterization   Catheterization   Total
Male     34 (9.4%)            326 (90.6%)       360
Female   55 (15.3%)           305 (84.7%)       360
Total    89                   631               720

While ORs can be more easily adjusted for covariates, the relative risk may be clinically the more important association to consider here. Is it appropriate to look at 90.6% versus 84.7%? Yes. It is also appropriate to compare the rates for recommending a less aggressive intervention (9.4% vs. 15.3%), where the relative risk is 1.63 (reciprocal 0.61), quite a large value. Relative risks seem more intuitive, but that does not mean that they are easier to interpret or even always possible to calculate. Care must always be taken when interpreting results. In this case everyone was correct in part: more information could have been provided, and each interpretation was incomplete.
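The quoted figures can be reproduced in a few lines from the rounded percentages in Table 17.3 (which is evidently how the chapter's values of 1.74, 1.07, and 1.63 were obtained):

```python
# Referral rates from Table 17.3: 90.6% of men, 84.7% of women
# were recommended for catheterization.
p_m, p_f = 0.906, 0.847

odds_ratio = (p_m / (1 - p_m)) / (p_f / (1 - p_f))  # odds of referral, M vs. F
rel_risk = p_m / p_f                # risk of referral, M vs. F
rel_risk_no = (1 - p_f) / (1 - p_m)  # risk of NO referral, F vs. M

print(f"OR = {odds_ratio:.2f} (reciprocal {1 / odds_ratio:.2f})")  # 1.74 (0.57)
print(f"RR = {rel_risk:.2f} (reciprocal {1 / rel_risk:.2f})")      # 1.07 (0.93)
print(f"RR, no catheterization = {rel_risk_no:.2f}")               # 1.63
```

The contrast between 1.74 and 1.07 for the same table is exactly the point of this section: with a common outcome, the OR and the RR can diverge dramatically.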

Misinterpreting Relative Measures

The Schwartz piece lists several useful communication guidelines.30 It is hard to correctly interpret an OR. Many times, to be efficient with words, we misinterpret ORs as relative risks or risk ratios. Sometimes ORs can be converted to risk ratios, but this is tricky and should be done carefully by a statistician or epidemiologist. Another common problem is ensuring the appropriate comparison is made. Which group is in the denominator? Sometimes findings about one specific group are accidentally attributed to a broader group. Finally, sometimes we report relative rates when we could simply report absolute rates. Reporting that a group is 50% less likely to receive treatment sounds useful, but reporting the actual percent treated in each group is more useful and improves the ability of the reader to draw conclusions.
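One widely used approximation for that OR-to-RR conversion is the Zhang and Yu formula; it requires knowing the baseline risk in the reference group, and, as the text cautions, a statistician or epidemiologist should still judge whether it applies to a given study design:

```python
def or_to_rr(odds_ratio, p0):
    """Approximate a risk ratio from an odds ratio using the baseline
    risk p0 in the unexposed/reference group (Zhang & Yu approximation):
        RR ~ OR / (1 - p0 + p0 * OR)
    The OR and RR coincide only when the outcome is rare (small p0)."""
    return odds_ratio / (1 - p0 + p0 * odds_ratio)

print(or_to_rr(3.0, p0=0.01))  # rare outcome: RR close to the OR
print(or_to_rr(3.0, p0=0.25))  # common outcome: RR of 2, far below the OR of 3
```

Note how the second call recovers the earlier influenza example: an OR of 3 with a 25% baseline risk corresponds to a relative risk of 2.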

Implying Causation (Even When We Do Not Mean to Do It)

"The impact is huge, just look at the large OR/relative risk/absolute risk!" Focusing on the association's magnitude or size and assuming causation is a common misstatement. In a study with multiple variables, there often will be several variables that appear to be related to the outcome or exposure of interest. We may think that a temporal sequence seems biologically reasonable. While several studies may demonstrate significant evidence that the effect of an exposure on an outcome is nonzero, observational studies, especially a single observational study, cannot allow us to assume causation. At most we can assess associations. Not everyone agrees with this, and Chapter 27 goes into more detail about causal inference from observational data. Table 17.4 lists the evidence that researchers believe is necessary to "prove" that observational study evidence supports causation. The table is brief, but in short all of the evidence needs to point the same way; the observational evidence needs to be strong, clinically or



TABLE 17.4  Evidence in Observational Studies That Supports Causation

1. Statistical significance
2. Strength of the association (odds ratio, relative risk)
3. Dose-response relationships
4. Temporal sequence of exposure and outcome
5. Consistency of the association (internal "validity")
6. Replication of results (external "validity")
7. Biologic plausibility
8. Experimental evidence

biologically meaningful and plausible, statistically significant, and showing some type of dose response; ultimately, some experimental evidence is needed. This experimental evidence might come from animal or laboratory experiments to supplement observational data collected in humans. In short, it is not one study's worth of evidence; an entire body of evidence from many studies conducted by many different groups is needed.

Confusing Causation, Prediction, Association, and Confounding

Consider the following example. A mother's genome has a causal relationship to her daughter's height because the mother gives part of her genes that influence height to her daughter. However, if the mother also has a son, that son's genome is merely associated with the daughter's (his sister's) height because each sibling receives part of their mother's genes, but the son does not give any of his genes to the daughter (except in Greek tragedies). Finally, a mother's genome and a nutrition program are confounded because their effects on a daughter's height are mixed together. (An aside: randomized studies are not immune to confounding.) We may use the mother's height as part of an algorithm to predict the height of her children; prediction is discussed in Chapter 27. Consider another hypothetical example. Suppose a team of researchers designs a cohort study to address the question of whether smoking causes premature death. They may construct two groups of middle-aged men (50–55 years old) who are smokers and nonsmokers, with 2500 subjects in each group. The subjects may be examined at baseline, followed prospectively and longitudinally, and their ages at death recorded. Suppose the median time to death is 8 years earlier for smokers than for nonsmokers, and that this difference is statistically significant. Are the researchers justified in concluding from this study that smoking causes premature death? No. The tobacco companies can respond that smokers are inherently different from nonsmokers. Perhaps there are some genetic, socioeconomic, or behavioral factors that cause (or predispose)


people to smoke and that also cause them to die at an earlier age. Are the researchers, nevertheless, justified in concluding from this study that smoking is associated with premature death? Yes; that is the precise function of observational studies: to propose associations. Was 50–55 the right age group for the study? Perhaps for seeing the events of interest and the study question, but it does exclude men who could have been in the study if they had not died prior to age 50. We may say it will be hard to generalize the information to other age groups, or we may feel comfortable generalizing the results if the percentage of men dying prior to age 50 is small or the reasons well characterized.

Assuming Observational and Randomized Studies Never Agree

A large set of observational studies led to a set of ambitious clinical trials and observational studies in women's health. The Women's Health Initiative (WHI) was launched in 1991 and consisted of a set of trials in postmenopausal women motivated by several prevention hypotheses.31 Women were enrolled into a randomized controlled trial (RCT) or an observational study. The hormone replacement therapy (HRT) hypothesis assumed women assigned to estrogen replacement therapy would have lower rates of CHD and osteoporosis-related fractures. Progestin and estrogen were to be used in women with a uterus, and breast and endometrial cancers would be monitored. The hypothesized cardioprotective effects of HRT in postmenopausal women could not be proven in observational studies but had become widely accepted over time due to the adverse effects of menopause on the lipid profile. Prior to WHI, epidemiologic evidence, the majority of 30 observational studies, reported a benefit in age-adjusted all-cause mortality among estrogen users. Questions remained about the demographic profile associated with these observational studies' participants (rather healthy and younger, with little pertinent data on women beginning hormones after age 60 years); the use of the combination treatment estrogen plus progestin instead of unopposed estrogen; and the overall risk and benefit trade-off. Observational studies had noted a modest elevation in the risk of breast cancer with long-term estrogen use; however, adverse event data on progestin were inconsistent at the time. At the inception of the WHI, it was to be the study with the longest follow-up. The questions addressed in the clinical trials were posed based on epidemiological evidence. The diet portion of WHI also was based on epidemiological evidence.32 When approaching an RCT from a base of cohort studies, several important points must be addressed. If




the motivation for a cohort study is to evaluate risks associated with a treatment or an exposure, then the study needs not only long-term users but also a sufficient number of newly exposed participants to assess short- and long-term intervention effects. Time variation also must be taken into account, and exposure effects may need to be evaluated over defined exposure periods.33 Confounding due to unmeasured factors has an important role in observational research, one that can result in misleading observational studies. The estrogen plus progestin portion of WHI stopped early after finding estrogen plus progestin did not confer cardiac protection and could increase the risk of CHD, especially during the first year after the initiation of hormone use.34 The take-home messages were not simple yes/no answers, and women were advised to talk with their doctors about their personal health and family history. Whereas some believed the WHI hormone therapy results were surprising, others did not.35–37 What the experience has taught us is to pay close attention to observational study design and analysis. We have to remember that a potential for publication bias, changes in populations under study, and incorrect analyses of prior study data (even if due to unavoidable circumstances) may lead to varying results. Additionally, populations shift over time. Hypothesis development, particularly in prevention, which many times is based on cohort data, is vital and far more difficult than imagined at first glance. As researchers, we must always think ahead while planning our current study to the next several trials and studies that may result from our anticipated (and unanticipated) findings.

Trying to Design a Randomized Study When We Need an Observational Study

Consideration of observational studies as alternatives to RCTs provides insight into the advantages and disadvantages of the latter from a scientific perspective. Suppose we want to study the effects of oral contraceptives on the risk of breast cancer over 30 years in women who began to use the pill in their early 20s. From the scientific perspective, the ideal way to address this question would be through a clinical trial. The researchers would randomly assign women in the trial to either treatment or placebo groups and then follow them prospectively for 30 years and observe which group experiences more cases of breast cancer. Such a study would present many ethical challenges and also would prove impractical, since it would be impossible to blind the subjects and researchers as to the treatment assignment, at least after the first pregnancy. From the ethical and practical perspectives, the best way to address some questions would be through an

observational study. An example of a cohort study: women would choose whether or not to use certain types of oral contraceptives in their early 20s, and the researchers would merely follow them prospectively and longitudinally over 30 years to observe who develops breast cancer. We would need to consider whether the group of women who chose to use the medication differed systematically from the women who chose not to do so.

In a case-control study, researchers would construct groups of women in their 50s who had or had not developed breast cancer and then retrospectively look into the past to determine which women had used oral contraceptives and determine which other life events may have influenced the risk of breast cancer. We would need to consider how well the women selected for this study reflect the original population of women who began to use the pill in their early 20s. We also would need to consider if other risk factors were well understood and if the data had been reliably collected on all risk factors on all women in the study. We would have to realize that by sampling women in their 50s, we would likely miss early, aggressive cancers where women died before they reached age 50, and that our results would not be generalizable to all types of breast cancer.

In a cross-sectional study, researchers would collect a sample of women in their 50s and then simultaneously classify them on contraceptive use and breast cancer occurrence. We again would need to consider how well we could use this sample to make inference back to the population of women who began to use the medications in their early 20s.

Regardless of how well an observational study is constructed, questions may arise about applicability and unknown risk factors. Currently, different oral contraceptive pills, devices, and doses are used compared to 30 years ago, so can the previous study shed light on today's women in their early 20s and their future health?
Are there unknown or unmeasurable risk factors that might be playing a role in the study results? In observational studies, we can only control for known and measured variables.

Assuming an Observational Study Is "Safe" and Does Not Need External Monitoring

Too frequently, investigators conducting observational research (and those overseeing it) assume it is minimal risk simply because of the lack of intervention, and they further assume that external oversight by a committee such as a Data and Safety Monitoring Board would not be useful. However, many reasons exist to consider having an Observational Study Monitoring Board (OSMB) or similar external committee to look at the same elements described in Chapter 10 on data




and safety monitoring. Regular external review of overall study conduct and data procedures, including protection of the confidentiality of participant data, can help studies have more reliable results. Such committees also provide recommendations related to overall scientific direction, proposed ancillary studies, participant burden, center performance and study progress, analyses and quality control, and issues related to referral for abnormal findings, informed consent, and safety. Trials with high risk, including risk to the public health if poorly performed, with multiple sites or registries, or with measures carrying potential safety concerns, among others, should consider convening such a board. The US National Heart, Lung, and Blood Institute has information online further describing the responsibilities of an OSMB (https://www.nhlbi.nih.gov/research/funding/human-subjects/data-safety-monitoring-faq#14).

CONCLUSIONS

Observational studies are valuable alternatives, predecessors, and follow-ups to clinical studies. They may be used to chart the natural history or extent of a disease in a population. They may be useful in providing preliminary evidence of an effect, which, ethics and practicality permitting, can subsequently be studied with a well-designed RCT. They may be used to follow up changes in a population over time after the findings from an RCT are released. All studies have weaknesses; observational studies have the scientific weakness that they can be used only to find associations between risk factors and responses, but alone they cannot establish causation. That does not diminish their importance. Observational studies may seem easy to some clinical researchers. That is a mistake. It is hard to do good observational research. Because they are not controlled experiments, in observational studies many factors may be varying across subjects simultaneously, and hence we are required to measure many things in many different and potentially changing ways, very accurately, and at times rather often, so as not to miss fleeting changes in the data. We cannot lose any data, and we need to measure everything the same way every time. We will never know what we do not measure, and we cannot measure everything. Researchers doing experimental studies are supposed to do the same, but they can fall back on introducing a single change and on randomization to help determine causation, arguing that whatever was unmeasured should be balanced between groups. Good observational studies are extremely useful clinically and in research. A single case report may lead to the discovery of a new disease, and another might lead to a cure or a way to focus on a population most at risk for the disease. Without large surveys we cannot assess

large populations and compare where they are now to where they were in the past. Many associations cannot be otherwise studied, as a randomized study would not be ethical or feasible. These study methods are appropriately used not only in human studies but also by veterinarians, agriculturalists, police, and others. Good observational studies are vital to inform medical, public health, policy, and regulatory decisions.

QUESTIONS

1. Does epidemiology assume that human disease occurs at random?
   a. Yes
   b. No

2. Which of the following is most likely a case-control study?
   a. Report of five cases of pneumocystis pneumonia in previously healthy homosexual men
   b. National survey of health and nutrition
   c. Association study of maternal use of stilbestrol with tumor appearance
   d. Observational study of cardiovascular health in men and women over 65

3. For rare diseases, does the odds ratio (OR) estimate the relative risk (RR)?
   a. No
   b. Yes
   c. Depends

Acknowledgments

The author would like to thank Jack M. Guralnik and Teri A. Manolio for their work on previous editions of this book and on course materials over the years. She would also like to thank her many students for their great questions, challenging research projects, and many examples.

Disclosures

This chapter reflects the views of the author and should not be construed to represent FDA’s views or policies.

References

1. Altman DG. Practical statistics for medical research. Boca Raton (FL): Chapman & Hall; 1991.
2. Agresti A. Categorical data analysis. 2nd ed. Hoboken (NJ): Wiley; 2002.
3. Gordis L. Epidemiology. Philadelphia (PA): Harcourt Brace & Company; 1996.
4. Hulley SB, Cummings SR, Browner WS, Grady D, Hearst N, Newman TB. Designing clinical research. 2nd ed. Philadelphia (PA): Lippincott Williams & Wilkins; 2001.
5. Elwood M. Critical appraisals of epidemiological studies and clinical trials. 2nd ed. Great Britain: Oxford University Press; 1998.

II. STUDY DESIGN AND BIOSTATISTICS


17. DESIGN OF OBSERVATIONAL STUDIES

6. Fleiss JL. The design and analysis of clinical experiments. New York: Wiley; 1999.
7. Vandenbroucke JP, von Elm E, Altman DG, Gøtzsche PC, Mulrow CD, Pocock SJ, Poole C, Schlesselman JJ, Egger M, STROBE Initiative. Strengthening the reporting of observational studies in epidemiology (STROBE): explanation and elaboration. PLoS Med 2007;4(10).
8. Friedman GD. Cigarette smoking and geographic variation in coronary heart disease mortality in the United States. J Chronic Dis 1967;20:769–79.
9. Joo JB, Cummings AJ. Acute thoracoabdominal aortic dissection presenting as painless, transient paralysis of the lower extremities: a case report. Emerg Med 2000;19:333–7.
10. CDC. Pneumocystis pneumonia, Los Angeles. MMWR 1981;30:250–2.
11. Hedley AA, Ogden CL, Johnson CL, Carroll MD, Curtin LR, Flegal KM. Prevalence of overweight and obesity among US children, adolescents, and adults, 1999–2002. JAMA 2004;291:2847–50.
12. Klungel OH, Kaplan RC, Heckbert SR, Smith NL, Lemaitre RN, Longstreth WT Jr, Leufkens HGM, de Boer A, Psaty BM. Control of blood pressure and risk of stroke among pharmacologically treated hypertensive patients. Stroke 2000;31:420–4.
13. Flegal KM, Graubard BI, Williamson DF, Gail MH. Cause-specific excess deaths associated with underweight, overweight, and obesity. JAMA 2007;298:2028–37.
14. Lilienfeld AM, Lilienfeld DE. Foundations of epidemiology. 3rd ed. New York: Oxford University Press, Inc.; 1980.
15. Strong JP, Malcom GT, McMahon CA, Tracy RE, Newman WP, Herderick EE, Cornhill JF, Pathobiological Determinants of Atherosclerosis in Youth Research Group. Prevalence and extent of atherosclerosis in adolescents and young adults: implications for prevention from the Pathobiological Determinants of Atherosclerosis in Youth study. JAMA 1999;281:727–35.
16. Newman AB, Naydeck B, Sutton-Tyrrell K, Edmundowicz D, Gottdiener J, Kuller LH. Coronary artery calcification in older adults with minimal clinical or subclinical cardiovascular disease. J Am Geriatr Soc 2000;48:256–63.
17. Schlesselman JJ. Case-control studies: design, conduct, and analysis. New York: Oxford University Press, Inc.; 1982. p. 17–9.
18. Sackett DL. Bias in analytic research. J Chronic Dis 1979;32:51–63.
19. Herbst AL, Ulfelder H, Poskanzer DC. Adenocarcinoma of the vagina: association of maternal stilbestrol therapy with tumor appearance in young women. N Engl J Med 1971;284:878–81.
20. Plassman BL, Havlik RJ, Steffens DC, Helms MJ, Newman TN, Drosdick D, Phillips C, Gau BA, Welsh-Bohmer KA, Burke JR, Guralnik JM, Breitner JCS, et al. Documented head injury in early adulthood and risk of Alzheimer’s disease and other dementias. Neurology 2000;55:1158–66.
21. Doll R, Hill AB. The mortality of doctors in relation to their smoking habits: a preliminary report. Br Med J 1954;228(i):1451–5.
22. Doll R, Peto R, Boreham J, Sutherland I. Mortality in relation to smoking: 50 years’ observations on male British doctors. Br Med J 2004;328:1519–33.
23. Brackbill RM, Hadler JL, DiGrande L, Ekenga CC, Farfel MR, Friedman S, Perlman SE, Stellman SD, Walker DJ, Wu D, Yu S, Thorpe LE. Asthma and posttraumatic stress symptoms 5 to 6 years following exposure to the World Trade Center terrorist attack. JAMA 2009;302:502–16.
24. Zeger SL, Liang KY, Albert PS. Models for longitudinal data: a generalized estimating equation approach. Biometrics 1988;44:1049–60.
25. Ridker PM, Hennekens CH, Miletich JP. G20210A mutation in prothrombin gene and risk of myocardial infarction, stroke, and venous thrombosis in a large cohort of US men. Circulation 1999;99:999–1004.
26. Roest M, van der Schouw YT, de Valk B, Marx JJM, Tempelman MJ, de Groot PG, Sixma JJ, Banga JD. Heterozygosity for a hereditary hemochromatosis gene is associated with cardiovascular death in women. Circulation 1999;100:268–73.
27. Barlow WE, Ichikawa L, Rosner D, Izumi A. Analysis of case-cohort designs. J Clin Epidemiol 1999;52:1165–72.
28. Laurion JP. Troponin I: an update on clinical utility and method standardization. Ann Clin Lab Sci 2000;30:412–21.
29. Schulman KA, Berlin JA, Harless W, Kerner JF, Sistrunk S, Gersh BJ, Dubé R, Taleghani CK, Burke JE, Williams S, Eisenberg JM, Ayers W, Escarce JJ. The effect of race and sex on physicians’ recommendations for cardiac catheterization. N Engl J Med 1999;340:618–26.
30. Schwartz LM, Woloshin S, Welch HG. Misunderstandings about the effects of race and sex on physicians’ referrals for cardiac catheterization. N Engl J Med 1999;341:279–83.
31. Women’s Health Initiative Study Group. Design of the Women’s Health Initiative clinical trial and observational study. Control Clin Trials 1998;19:61–109.
32. Prentice RL, Sheppard L. Dietary fat and cancer: consistency of the epidemiologic data, and disease prevention that may follow from a practical reduction in fat consumption. Cancer Causes Control 1990;1:81–97.
33. Prentice RL, Pettinger M, Anderson GL. Statistical issues arising in the Women’s Health Initiative. Stat Med 2005;61:899–910.
34. Women’s Health Initiative Investigators. Estrogen plus progestin and the risk of coronary heart disease. N Engl J Med 2003;349:523–34.
35. Prentice RL. Observational studies, clinical trials, and the Women’s Health Initiative. Lifetime Data Anal 2007;13:449–62.
36. Prentice RL, Langer RD, Stefanick ML, Howard BV, Pettinger M, Anderson GL, Barad D, Curb JD, Kotchen J, Kuller L, Limacher M, Wactawski-Wende J, Women’s Health Initiative Investigators. Combined analysis of Women’s Health Initiative observational and clinical trial data on postmenopausal hormone treatment and cardiovascular disease. Am J Epidemiol 2006;163:589–99.
37. Prentice RL, Langer RD, Stefanick ML, Howard BV, Pettinger M, Anderson GL, Barad D, Curb JD, Kotchen J, Kuller L, Limacher M, Wactawski-Wende J, Women’s Health Initiative Investigators. Combined postmenopausal hormone therapy and cardiovascular disease: toward resolving the discrepancy between observational studies and the Women’s Health Initiative clinical trial. Am J Epidemiol 2005;163:404–14.


CHAPTER 18

Design of Clinical Trials and Studies

Catherine M. Stoney1, Laura Lee Johnson2

1National Institutes of Health, Bethesda, MD, United States; 2U.S. Food and Drug Administration, Silver Spring, MD, United States

OUTLINE

Design of Clinical Trials
The Purpose of Clinical Trials and Clinical Studies
Understanding the Spectrum of the Research Continuum
  Phase I Studies
  Phase II Studies
  Phase III Studies
  Phase IV Studies
  Dissemination and Implementation Studies
  Comparative Effectiveness Research
  Explanatory Versus Pragmatic Trials
  Quasiexperimental Studies
Clinical Trial Designs
  Crossover Designs
  Enriched Enrollment Designs
  Factorial Designs
  Parallel Groups Designs
  Sequential Trial Designs and Interim Analyses
  Group-Randomized Trial Designs
  Adaptive Treatment Designs
Critical Issues in Clinical Study Design
  Blinding or Masking
  Intervention Development
  Choosing the Comparison Group
Control Groups
  Wait-List Control
  Time and Attention Control
  Placebo Control
  Sham Control
  Usual and Standard Care Controls
  Multiple Control Groups
Placebo Responses
  Background
  Identifying Placebo Responders
Mistakes and Misconceptions
  Not Looking at the CONSORT Statement Before, During, and After a Study
  Waiting Until the Large Definitive Study to Worry About the Details
  Failing to Increase the Treatment Effect
  Failing to Decrease the Variance
  Not Taking Care When Choosing a Control Group
  Always Assuming Placebo Groups Are Unethical
  Assuming Placebo Treatment Is (Im)Possible in Long-Term Studies
  Confusing Placebo Response and Regression to the Mean
  Using a Factorial or Partial Factorial Design Instead of a Parallel Group Design
  Assuming Small, Open-Label, Nonrandomized, Uncontrolled Studies Offer No Evidence
Conclusions
Summary Questions
Acknowledgments
Disclosures
References
Further Reading

Principles and Practice of Clinical Research. http://dx.doi.org/10.1016/B978-0-12-849905-4.00018-6
Copyright © 2018. Published by Elsevier Inc.


DESIGN OF CLINICAL TRIALS

Randomized controlled trials (RCTs), when correctly designed and rigorously conducted, provide the most definitive answers regarding intervention effects, but other clinical trial designs and observational investigations can be appropriately employed depending on resources and the specific questions of interest.1 While nonrandomized experimental studies may inform the potential promise of clinical trials, it is the RCT that provides the strongest evidence both for the causal nature of a modifiable factor and for the impact of modifying that factor on disease outcomes. The essential purpose of the RCT is to determine whether a particular intervention or treatment can reasonably be inferred to cause a change in health, disease progression, or risk factor(s) associated with a disease. The intervention may be pharmacologic, surgical, behavioral, device-based, or strategy-based, or it may consist of multiple components. Randomization is used in many studies, but when people refer to RCTs they usually mean the subset of randomized studies that are large and expected to yield a definitive result.

When such studies are undertaken depends in large measure on the existing literature and the needs in the area. While promising observations may frequently suggest that a clinical trial is warranted, other essential groundwork must be laid before undertaking an RCT. This might include developing an intervention or treatment shown to alter some relevant outcomes, determining the appropriate dose of the intervention, establishing feasibility in a particular patient population, showing evidence for changes in surrogate markers of the disease of interest, and gathering other information necessary for optimizing the design.
A good understanding of the existing literature in the area of interest, the strength of the designs and findings of the literature, and clarity regarding where the bulk of science for a particular intervention or treatment sits on the research continuum are essential for formulating an appropriate question and design for a clinical trial. A note to readers: all of the study designs and related issues discussed can be used in many types of experiments. This chapter focuses on studies in humans related to disease, but the general principles apply in other settings as well. Careful planning is important to avoid errors and misleading results.

THE PURPOSE OF CLINICAL TRIALS AND CLINICAL STUDIES

Clinical trials provide a unique source of evidence. As discussed in Chapter 17, observational or epidemiologic studies are critical for hypothesis generation, discovering associations, identifying areas of health research that are “low-hanging fruit,” and providing evidence when a clinical trial is not feasible or ethical. But clinical trials are unique because they are experiments that allow researchers to infer causality. While consistency in the findings of a large number of observational studies can lead to the belief that the associations are causal, this belief is a fallacy. One of the best examples of this phenomenon is the Women’s Health Initiative (WHI), which was launched after a large number of observational studies suggested that hormone replacement therapy (HRT) in postmenopausal women was cardioprotective. The findings across many observational and epidemiological studies at the time were considered largely consistent. Most studies reported cardiovascular benefits in age-adjusted all-cause mortality for estrogen users. Consequently, HRT was in widespread clinical use for symptom management among peri- and postmenopausal women and, in some cases, for cardiovascular protection. The WHI was launched in 1991 and consisted of a set of clinical trials in postmenopausal women to definitively test the effect of HRT on cardiovascular risk and osteoporosis-related fractures, in addition to a new set of observational studies. Previous observational studies had noted a modest elevation in the risk of breast cancer with long-term estrogen use; however, adverse event data on progestin were inconsistent at the time.
The estrogen plus progestin portion of the WHI stopped early after finding that estrogen plus progestin did not confer cardiac protection and in fact could increase the risk of coronary heart disease (CHD), especially during the first year after the initiation of hormone use.2 These findings were in direct contrast to the interpretations of many, but not all, epidemiological and observational study findings from the previous 30 years, and they underscored the importance of the clinical trial in establishing causality. Also important is that the WHI was conducted in diverse populations of interest to investigators, but these populations were different from those in many of the previous epidemiological studies.

Trials are typically conceptualized by design characteristics, but they can also be categorized according to the objectives of the investigation. Treatment trials are quite common; they are designed to test specific interventions, with the main objective of understanding the efficacy and effectiveness of an intervention on a specific outcome or set of outcomes in a specific patient population. Prevention trials test ways not to treat but rather to prevent disease. Primary prevention trials focus on preventing the initial occurrence of a disease; secondary prevention trials often focus on preventing recurrence or exacerbation of a disease in asymptomatic persons with risk factors or positive screening; tertiary prevention trials focus on those who have symptomatic disease, in an attempt to prevent further deterioration. Screening trials test strategies for detecting disease or the risk for a disease. Diagnostic trials examine strategies for the correct diagnosis of a disease or disorder. Health-related quality of life studies focus on strategies to promote quality of life, either within the context of a disease or treatment or by itself. Comparative effectiveness trials are concerned with identifying which of two or more treatments is superior in some way.

The strength of the well-designed RCT is its ability to establish causality, thus overcoming the major weakness of all other types of study designs. However, the RCT design is not feasible for some clinical questions. Ethical considerations, costs, resources, or time may prove prohibitive in certain cases, making the conduct of an RCT infeasible or suboptimal. For example, today a longitudinal RCT studying the health effects of cigarette smoking on adolescents into adulthood would not be feasible because it would be unethical to randomize adolescents to a smoking condition. What is considered ethical may change over time, however. While previously considered unethical in some cases and countries, randomized trials in pregnant women and infants have become increasingly important for providing evidence-driven recommendations. We know that new does not always equate to better, and sometimes the more efficacious and safer study arm to be on is the placebo or active control arm.
The International Conference on Harmonisation E10 guideline discusses the choice of control group and related issues; it also clarifies that without well-done randomization there is a recognized inability to control bias, and the consequences for the persuasiveness of the findings are deeply problematic.3 It is unethical to conduct a study that, by its design alone, cannot result in interpretable findings. Whenever possible, the RCT is the strongest study design for establishing causality.

UNDERSTANDING THE SPECTRUM OF THE RESEARCH CONTINUUM

The optimal design, analytic strategy, and end points for any research study depend, more than anything else, on what question is being addressed and where that specific question fits on the spectrum of the research continuum. Failure to consider these factors can lead to violations of both internal and external validity. External validity refers to the extent to which a research finding can be generalized to other situations, individuals, and measurement instruments, and across time. Internal validity refers to the strength with which the independent variable (often a treatment or exposure, but in general whatever is being manipulated or changed) can be said to be responsible for the outcome or change. Factors that decrease generalizability or that compete with the independent variable to explain the findings are termed threats to validity.

Very early phase investigations may be hypothesis generating and, while sometimes randomized, most typically use observational and epidemiological designs. Exploratory studies are most appropriate in treatment development and do not require rigid control, but they are far from definitive and allow few conclusions to be drawn. Such studies are useful in establishing the possibility of a signal that the intervention may have an effect on the outcome of interest. Phase I clinical studies often follow exploratory studies and are primarily focused on questions of mechanism and safety. Because of the importance of high internal validity in questions of safety, Phase I studies require tight experimental control. Midphase investigations, such as Phase II (efficacy) RCTs, are hypothesis confirming and require control of known and unknown sources of error to minimize violations of internal validity. Later phase studies, such as Phase III (effectiveness) and Phase IV RCTs and implementation trials, are more translational in nature and are primarily concerned with whether a particular intervention can be effective when implemented in increasingly real-world settings. These latter trials, while still requiring control of some sources of error, are more concerned with whether a particular intervention can be effective across situations outside the tightly controlled setting of an earlier phase RCT. For example, late-phase trials may address whether the findings of a Phase II trial generalize across the various patient and health-care provider situations that exist in the real world. Thus, the type of control most important for this phase of study concerns external, rather than internal, validity.
Understanding not only the specific question that one wishes to address but also the state of the science in that particular area is critical to developing an appropriate study design for that particular question at a particular point in time. Inherent in this discussion is that the first stage in developing an appropriate design and analytic strategy for any given study is being able to clearly articulate the precise research question to be addressed. The more carefully and fully the research question can be expressed, the clearer the choices of design, end points, control groups, and analytic strategies become, and various elements of how to develop and conduct the study under question are clarified. It is important to be highly conversant in the relevant literature in the area to understand not only the questions that have been addressed and answered but also how they are addressed, what is known, where the gaps are, and where the knowledge base sits.


Phase I Studies

Phase I studies, which include dose-ranging and safety studies, traditionally (but not always) are nonrandomized. The fundamental goal of these studies is to find appropriate dose levels and to detect potential toxicities due to the investigational intervention or treatment. When feasible, a dose-limiting toxicity (DLT) threshold or physical event must be defined to create a stopping rule. Usually the definition of a DLT is based on criteria such as a certain grade of toxicity as defined by the National Cancer Institute Common Toxicity Criteria for Adverse Events (NCI CTCAE). For interventions that do not result in toxicity regardless of dose, investigators must establish criteria for toxicity or define something other than toxicity for the DLT. Dose-ranging studies are no less important for these types of interventions, however. For example, the appropriate dose of a particular psychotherapy for the treatment of major depressive disorder (MDD) is important to determine, and such a dosing study needs to establish safety, stopping rules, and a specific definition of toxicity for that particular intervention. In this example, toxicity might be conceptualized as diminishing return in relation to measured side effects and patient burden. Different interventions, study designs, and patient populations will require different stopping rules, especially since some treatments are nontoxic at all dose levels.3–5 In all cases, however, carefully defining the DLT prior to the onset of the dose-ranging study is imperative.

Classically, for pharmacologic treatment studies, a few dose levels or categories are selected and a small number of participants are treated at each dose level, typically escalating through the dose levels in the following manner. A few participants are enrolled at the lowest dose level in the protocol, and if none develops a DLT then the study escalates to the next dose.
If a DLT is observed in one of the participants, then a few additional participants are enrolled, such that all enrolled participants now receive the current dose. If none of the additional participants develops a DLT, then the study escalates to the next dose. Participants are not entered at a new dose level until all participants in the previous levels have remained free of toxicities for a specified period of time. In addition, the maximum tolerated dose (MTD) usually is defined as the dose level immediately below the level at which a certain percentage of participants, usually 33% (which may be as few as two study participants), experienced a DLT. Usually the study aims to find a safe dose, defined as the MTD, or the study finishes at the maximum dose prespecified in the protocol. There are many variations on this type of study.6,7

For nonpharmacologic treatment studies, dose-ranging investigations are less common; instead, doses often are chosen on the basis of feasibility, cost, typical practice (for those interventions that are in active clinical use but not empirically established as efficacious), and patient burden. For drug and nondrug studies, doses are sometimes determined as part of a Phase II trial. However the dose range is determined, it is critical for establishing safety and especially for identifying the optimal treatment intensity. Failure to include dose-ranging studies, including in nonpharmacologic investigations, can lead either to the premature conclusion that a treatment is not efficacious (if the dose is too low) or to a treatment that is efficacious but not accessible (due to cost, burden, etc.). Regarding safety, extending this methodology from pharmacologic treatment studies to broader contexts has led the definition of the MTD to be expanded to include nontoxic but nevertheless undesirable events (i.e., an expanded definition of toxicity).

Randomized Phase I studies are also conducted for pharmacologic and nonpharmacologic interventions. These studies may include a small number of participants receiving a control intervention at each dose level.
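The classic escalation scheme described above can be sketched in code. This is an illustrative simplification: the cohort size of three, the expansion rule, and the DLT counts below are assumptions for the example, not prescriptions from the text.

```python
# Sketch of "3+3" dose escalation: escalate while toxicity stays acceptable,
# expand the cohort on a single DLT, and stop once 2 or more DLTs occur.
def three_plus_three(first_cohort_dlts, expansion_cohort_dlts):
    """first_cohort_dlts[i]     = DLTs among the first 3 participants at dose level i
       expansion_cohort_dlts[i] = DLTs among 3 additional participants, if enrolled"""
    mtd = None
    for level, dlts in enumerate(first_cohort_dlts):
        if dlts == 0:
            mtd = level  # no toxicity: this level is safe so far; escalate
            continue
        if dlts == 1 and expansion_cohort_dlts[level] == 0:
            mtd = level  # 1 of 6 with a DLT: still acceptable; escalate
            continue
        break  # 2 or more DLTs (of 3 or 6): stop; MTD is the level below

    return mtd  # None means even the lowest dose exceeded the DLT threshold

# Hypothetical trial with 4 dose levels: level 2 needs an expansion cohort,
# and level 3 proves too toxic, so the MTD is dose level 2.
print(three_plus_three([0, 0, 1, 2], [0, 0, 0, 0]))  # 2
```

Real protocols add many refinements (intrapatient escalation rules, observation windows, accelerated titration), which is why the text notes there are many variations on this type of study.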

Phase II Studies

The early screening of new therapies in the past often was conducted with single-arm or nonrandomized studies. In treatments for cardiovascular disease, for example, Phase II studies often were conducted in which patients were treated and their responses (not necessarily a clinical response, but often responses on a biomarker) were observed. The purpose of these studies was not to prove that a new therapy was efficacious on the ultimate clinical end point of interest (e.g., survival) but only that it had sufficient activity (e.g., reduction in a clinically relevant biomarker) to be tested in a larger RCT. These types of early Phase II studies provide the necessary signal that some activity is occurring to justify larger and more expensive trials. These, as well as earlier phase studies, are often referred to as proof-of-concept studies. Randomized or not, early Phase II designs typically require a relatively small number of patients; when the evidence shows that the benefits of the new therapy are small or nonexistent, the designs prevent large numbers of patients from being exposed to useless or even potentially harmful treatments. The disadvantage of the nonrandomized and open-label strategies sometimes considered for Phase II is that less experimental control is imposed than is optimal. Thus internal validity is sacrificed, which can result in prominent placebo effects, investigator and other sorts of bias due to lack of masking, regression to the mean, and other threats to internal validity.17 For example, an early Phase II treatment study of patients with relapsing/remitting multiple sclerosis (RRMS) might screen participants for moderate magnetic resonance imaging (MRI) activity and then follow them longitudinally. Because the natural process is relapsing/remitting, patients may be screened while in a relapse and naturally move into a remitting phase over time. Consequently, one may see a reduction in disease activity over time even if the experimental therapy is ineffective. This study would be improved by a more rigorous design incorporating an appropriate control group and perhaps multiple measurements. However, this would require designing a larger and more expensive trial with little evidence that the experimental treatment might be effective. Thus, early Phase II studies play an important role in helping to direct the next steps in a research program, but their findings should not lead, without many additional, more rigorous studies, to changes in clinical practice, guidelines, or health-care policy.

Another early Phase II design is an external or historical control study. These studies have been used in rare diseases and cancer research.8 Here, instead of creating a randomized comparison group, a single group of patients is treated, and their responses are compared with controls from previous studies or a registry. These studies have the advantage of using only half the number of patients, and all the current study subjects receive the new therapy. They also have serious disadvantages. In addition to the other problems with a nonrandomized study, these controls often do not provide a good comparison with the new treatment patients. For example, controls often are taken from studies conducted years ago for conditions with rapidly changing profiles and treatment strategies; this is especially problematic because diagnosis, treatments, technology, and patient care can change over even relatively short periods of time.
Measurements that may be present in the intervention arm may not exist in the controls’ available data, or not on the time schedule used by the intervention arm. In addition, the patient population characteristics may change. These changes, which are often not recognized or reported, can result in serious biases when assessing treatment efficacy. All of these designs may consider employing a version of an optimal two-stage design for statistically controlled interim analyses,9–11 but randomized study designs are usually preferred.

Randomized Phase II clinical trials can avoid the problems of the nonrandomized studies described previously, employ greater control, and enable stronger causal inferences to be made. These designs aim to select the superior treatment arm from two or more arms. Some designs use a binary outcome describing failure or success along with statistical selection theory to determine sample size, and others rely on continuous or ordinal outcomes. Some are based on frequentist methods and others on Bayesian methods. The goal is to have a low probability of choosing an inferior arm out of the total number of arms of the study. In the randomized Phase II design, an interim analysis to stop the study early for sufficient biological or clinical activity is possible, allowing the agent to be moved forward more quickly to another, larger study. This is in contrast to the optimal two-stage design, which cannot be stopped early for efficacy. Thus, it is important to choose the design that is best for the study question.12
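The "low probability of choosing an inferior arm" criterion can be made concrete with a small simulation. The response rates and sample size below are invented for illustration; a real selection design would choose the per-arm sample size so that this error probability falls below a prespecified level.

```python
import random

# Simulate a two-arm "pick the winner" Phase II selection design with a
# binary (response/no response) outcome: the arm with more responders is
# selected, and we estimate how often the truly inferior arm would win.
def prob_select_inferior(p_better, p_worse, n_per_arm, n_sim=20000, seed=7):
    random.seed(seed)
    wrong = 0
    for _ in range(n_sim):
        responders_better = sum(random.random() < p_better for _ in range(n_per_arm))
        responders_worse = sum(random.random() < p_worse for _ in range(n_per_arm))
        if responders_worse > responders_better:  # ties go to the better arm here
            wrong += 1
    return wrong / n_sim

# With assumed response rates of 40% vs. 20% and 25 participants per arm,
# the inferior arm is selected only a small fraction of the time; a larger
# n_per_arm drives this error probability down further.
print(prob_select_inferior(0.40, 0.20, 25))
```

The tie-breaking rule and the one-sided definition of "wrong" are simplifications; formal selection-theory designs handle these cases explicitly when calculating sample size.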

Phase III Studies

Phase III studies, which can be efficacy or effectiveness studies, are large prospective trials designed to compare an experimental and a control (or standard) intervention. Phase III trials can be designed to demonstrate superiority, noninferiority, or equivalence. They are typically longer in duration than Phase II trials and impose less control on participant characteristics, delivery of the intervention, and characteristics of the study environment. Phase III studies are more “real world” than earlier phase studies; while internal validity is lower, external validity is much higher. As with Phase II trials, interventions tested in Phase III trials include drugs, behavioral interventions, devices, surgical procedures, and more. Phase III trials may be used for many types of investigations, including evaluating an intervention for the purposes of treatment, prevention, or diagnosis.
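For a superiority comparison with a continuous end point, the standard normal-approximation sample-size formula gives a sense of the scale of such trials. Everything in this sketch (effect size, SD, alpha, power) is an assumed example, not a recommendation from the chapter, and noninferiority or equivalence designs use different formulas.

```python
import math
from statistics import NormalDist

# Approximate per-arm sample size for a two-arm, parallel-group superiority
# trial comparing means: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2
def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for two-sided 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

# Assumed example: detecting a 5-point mean difference when the SD is 12
# requires roughly 91 participants per arm at 80% power.
print(n_per_arm(delta=5, sd=12))  # 91
```

Note how sensitive the result is to the assumed effect size: halving delta roughly quadruples the required sample, which is one reason Phase III trials are so much larger than earlier phase studies.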

Phase IV Studies

Phase IV studies are large, generally postmarketing surveillance studies of population safety, effectiveness, and generalizability. Phase IV clinical studies are designed to study longer-term effects of treatments on populations.

Dissemination and Implementation Studies

Dissemination and implementation studies typically are carried out after interventions or treatments have been identified as effective or efficacious in reasonably sized studies. These studies are essential to the process of translating new information to public health use. Although the terms dissemination and implementation are sometimes used interchangeably to refer to all translational science, the terms are distinct and refer to specific steps in the translation of evidence-based practices into general use. Implementation science tests ways of integrating proven treatments and practice patterns into specific contexts and settings. Dissemination science tests ways of informing health-care providers and the public about evidence-based treatments. Both types of research are important steps in evidence-based research. The main focus of dissemination and implementation studies is to take treatments that are efficacious or effective and evaluate their uptake, reach, sustainability, and spread in real-world settings. Rather than solely providing evidence of treatment efficacy or effectiveness, dissemination and implementation studies identify ways of implementing treatments in real-world settings, such as schools, worksites, primary care settings, and community clinics, which have great variability in clinical practice and patient characteristics. Outcomes of this type of research include training programs, outreach programs, cost-effectiveness and cost analysis, uptake and spread by providers, and adherence by both the clinical practice and patient communities. For many, although not all, types of interventions, measures of implementation barriers and facilitators are incorporated into all phases of clinical research. In particular, it is useful to carry out investigations to better understand training needs, how the proposed treatment can (or will not) be integrated into people’s lives, practice patterns and settings, and similar information in preparation for Phase III trials.

Comparative Effectiveness Research Comparative effectiveness research is a strategy that usually compares two or more evidence-based treatments, strategies, or diagnostic procedures on specific outcomes for clinical benefit and risk. Since the main focus is on two or more active and effective treatments, often “no-treatment” control groups are excluded. The ultimate purpose is to inform public policy, coverage policy, clinicians, and patients. Because treatments with empirically demonstrated efficacy and effectiveness are ideally the main focus of comparative effectiveness research, these types of studies are typically conducted at a late phase of research in a particular area. Comparative effectiveness research does not consist of a particular design strategy per se, but rather describes the goals of identifying what particular treatment is most effective for specific patients and conditions. This type of research emphasizes the value of understanding individual differences in response, and strives to identify optimal ways of tailoring treatments and procedures for specific patient needs. Although RCTs can be effectively employed to answer many comparative effectiveness questions, many other strategies are available that employ different methods, assumptions, and models.13 For example, systematic reviews, nonrandomized studies using data from electronic medical records or registries, and metaanalytic techniques are often more cost-effective than large RCTs, can sometimes answer questions regarding how best to tailor treatments to specific patient needs, and

often can be carried out efficiently if appropriate data exist and quality methods are followed. These techniques are discussed in part in Chapters 19–22.

Explanatory Versus Pragmatic Trials Schwartz and Lellouch14 characterized two different purposes of clinical trials, explanatory and pragmatic, articulating one of the most useful distinctions for the design of clinical trials. An explanatory trial is conducted under ideal circumstances, is generally highly controlled, and often seeks to elucidate a biological mechanism or establish idealized efficacy. The study population is relatively homogeneous and can be used as a model from which one may learn principles of pharmacology, physiology, or behavior that are likely to shed light on a variety of clinical issues. Explanatory clinical studies are focused on generally smaller populations and targeted outcomes to understand mechanisms of an intervention and maintain rigorous control over experimental factors. The findings are not highly generalizable to other settings, populations, and practitioners. They are particularly useful in understanding whether an intervention is efficacious and can inform larger effectiveness trials. A pragmatic trial approach focuses on whether an intervention, when practiced in real-world settings, demonstrates the expected benefit in a variety of unselected patient populations. Pragmatic trials usually focus on how an intervention shown to be efficacious in an explanatory trial works in real-world settings. Pragmatic trials typically focus on larger, more heterogeneous populations, with designs that have less control over the specific interventions and how and when they are delivered. Pragmatic trials are most useful for understanding generalizability of effects and for helping to inform health-care policy and decision-making. While an oversimplification, the dichotomous explanatory versus pragmatic reason for conducting a clinical trial may provide a useful perspective for making design choices in complex cases. 
Neither approach is superior to the other, but it is important to understand the advantages of each and to know which circumstances are most appropriate for which type of trial. Understanding the basic purpose of the trial will have a significant impact not only on the question addressed but also on nearly all aspects of the study design. Often these questions are addressed at different points in the research continuum. Useful tools for helping in the design of trials and in identifying where studies fall on the explanatory–pragmatic continuum are the Pragmatic–Explanatory Continuum Indicator Summary (PRECIS) and PRECIS-2 tools. Originally developed by 25 clinical trialists,15 the original PRECIS tool was modified in 2015 to improve reliability and validity.16 The PRECIS-2 tool allows investigators to score potential (or existing) studies on nine domains, indicating where the design falls on the explanatory–pragmatic spectrum for each domain. Domains include eligibility criteria, recruitment, setting, organization, flexibility in intervention delivery, flexibility in prompting adherence, follow-up, primary outcome, and analysis. The purpose is to assist investigators in matching the decisions made at the design stage with the desired purpose of the trial and to evaluate published trials.
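The nine-domain scoring just described can be sketched in code. The ratings below are hypothetical, and note that PRECIS-2 is normally displayed as a per-domain wheel rather than reduced to an average; the mean here is only a rough summary of which end of the continuum a design leans toward.

```python
# Hypothetical PRECIS-2 ratings for a planned trial, one score per domain
# on the tool's 1 (very explanatory) to 5 (very pragmatic) scale.
precis2_scores = {
    "eligibility criteria": 2,
    "recruitment": 3,
    "setting": 2,
    "organization": 3,
    "flexibility (delivery)": 2,
    "flexibility (adherence)": 4,
    "follow-up": 2,
    "primary outcome": 4,
    "analysis": 3,
}

assert len(precis2_scores) == 9  # PRECIS-2 has nine domains
mean_score = sum(precis2_scores.values()) / len(precis2_scores)
leaning = "pragmatic" if mean_score > 3 else "explanatory"
print(f"mean domain score: {mean_score:.1f} -> design leans {leaning}")
```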

Quasiexperimental Studies Quasiexperimental designs are those design strategies that do not contain all the essential elements of a true experiment but can be made powerful enough to support strong causal inferences by controlling potential threats to internal validity.17,18 Typically, quasiexperimental designs are employed because the investigator is not able to randomize study arms. When a treatment of interest cannot be randomized or a control group is not feasible, other factors can be incorporated into the design to increase internal validity. Quasiexperimental studies often are conducted in more naturalistic settings; these designs may exploit natural phenomena, such as investigating health outcomes after an earthquake or other natural disaster. Such designs are also used commonly in studying behavioral interventions in the context of real-world settings.18 In conducting such studies, the key responsibility of the researcher is to carefully think through all sources of potential bias and threats to validity and to impose appropriate and available design and analytic strategies to control such threats as much as possible. In the examples above, incorporating an available control (nonexposed) group into the study is a way to improve and strengthen the design. In some cases, it may be possible to access information on the exposed population prior to the disaster or intervention. An example is a study of the health effects of the 1980 earthquake in southern Italy.19 The affected population had been involved in a longitudinal epidemiological investigation of major coronary heart disease (CHD) risk factors prior to the earthquake. At the time of the earthquake, a portion of the population had been examined in a follow-up health screen and the other portion had not. Within 2 weeks following the earthquake, health screenings were resumed for the group that had not yet been followed up.
Both groups, the exposed (screened after the earthquake) and the unexposed (screened before the earthquake), were compared on a variety of cardiovascular disease biomarkers such as heart rate, cholesterol, and triglycerides. Although randomization was not possible, some control was (serendipitously) available to enable the researchers

to make some fairly strong causal inferences about the differences between the two groups attributable to exposure to the earthquake. The investigators found short-term elevated cardiovascular risk factors in the group exposed prior to data collection; subsequent follow-up data collected 7 years after the earthquake demonstrated that these elevations did not persist.

CLINICAL TRIAL DESIGNS

Choosing an appropriate study design depends on whether the design matches (i.e., will be able to answer) the question that is being posed, the specific end points of interest, and whether the question being posed is the most appropriate one, given where the state of the science stands on the research continuum for that particular question. In general, simple designs with a targeted and well-characterized question, clearly defined end points, and patient characteristics that will allow a clear and definitive answer are optimal.

Crossover Designs In crossover designs, each study participant receives all treatments that are being investigated but at different times. The order in which a study participant receives the treatments is randomized. For example, patient A is randomized to receive Treatment #1 for a period of time. After completing Treatment #1, the patient then “crosses over” and receives Treatment #2. Usually a washout period, during which no treatment is delivered, separates the treatments. Outcomes are examined during and/or after each treatment. In some crossover designs, particularly ones with more than two treatments, patients may not receive all treatments under investigation (partial crossover or incomplete block) but would receive more than one. The advantage of such a design is that each patient serves as his or her own control, which significantly reduces between-subject variability, allowing the detection of smaller effect sizes with reduced sample sizes. Crossover designs for the right patient population and treatment can therefore have considerably more power than other design strategies because of the reduced variation in nonspecific (nontreatment-related) factors. Crossover designs can be particularly advantageous for studying patients with conditions whose symptoms are relapsing/remitting or episodic in nature, if the relapsing/remitting cycles are short, such as migraine and functional pain disorders.20 The major disadvantage of crossover designs is apparent when the treatment being investigated has a sustained effect on the outcome of interest. In the example above, if Treatment #1 has an effect that is maintained long after the treatment is over, then in a crossover study the impact of Treatment #2 cannot be clearly separated from that of Treatment #1; a crossover design in such a situation would be problematic. Such carryover effects are problematic because they sometimes cannot be seen or measured. In addition, crossover designs may be unethical in some circumstances. For example, if one of the treatments is efficacious and the condition is a serious threat, it may be unethical to cross participants in the efficacious treatment arm over to the other, potentially nonefficacious treatment arm. Another potential disadvantage of this design is that patients must be enrolled in crossover studies for longer periods of time than are typical. This adds burden for the patient and could result in greater dropout rates. For conditions that are episodic, crossover designs can be useful if the episodic nature is somewhat predictable. However, for unstable or progressive conditions, crossover designs may be problematic because added variability due to change in disease will be introduced over the course of the study. Finally, in addition to greater patient burden because of the length of the study, crossover trials often can be unpalatable to patients because they involve participation in two or more interventions rather than just one. However, for treatments that are perceived by patients as potentially equally efficacious, crossover designs can be appropriate, particularly for conditions that have stable or predictable clinical characteristics over time. A variation on crossover designs is the n-of-1 study.
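The power advantage of within-subject comparison can be demonstrated with a small simulation. The sketch below uses made-up numbers (a stable between-subject standard deviation of 10 against measurement noise of 1) and compares the standard error of the treatment effect under a crossover analysis versus a parallel-groups analysis of the same data.

```python
import random
import statistics
from math import sqrt

random.seed(42)

n = 200
true_effect = 2.0          # Treatment #2 improves the outcome by 2 units
between_subject_sd = 10.0  # large, stable differences between patients
within_subject_sd = 1.0    # measurement noise

# Each simulated patient has a stable baseline level of the outcome.
baselines = [random.gauss(0, between_subject_sd) for _ in range(n)]
on_t1 = [b + random.gauss(0, within_subject_sd) for b in baselines]
on_t2 = [b + true_effect + random.gauss(0, within_subject_sd) for b in baselines]

# Crossover analysis: each patient is his or her own control.
diffs = [t2 - t1 for t1, t2 in zip(on_t1, on_t2)]
se_crossover = statistics.stdev(diffs) / sqrt(n)

# Parallel-groups analysis of the same data, ignoring the pairing.
se_parallel = sqrt(statistics.variance(on_t1) / n
                   + statistics.variance(on_t2) / n)

print(f"SE (crossover) = {se_crossover:.2f}, SE (parallel) = {se_parallel:.2f}")
```

The within-subject standard error is roughly an order of magnitude smaller here, which is exactly the reduction in nonspecific variation the text describes.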

Enriched Enrollment Designs A variant of the crossover design, the enriched enrollment design, may be useful in studying treatments to which only a minority of patients respond.21 If the results are not statistically significant in a conventional clinical trial but an intervention appears effective for subpopulations of patients, it is not possible to retrospectively point at the responders and claim that the treatment accounted for their relief. A potentially useful strategy is to enter responders into a second prospective comparison trial. If the results of the second trial considered alone are statistically significant, this suggests that the patients’ initial response was not just due to chance. Although at times statistically defensible, enriched enrollment designs are open to the criticism that prior exposure to the treatment may defeat a double-blind procedure (particularly with treatments that have distinctive side effects) and sometimes result in spurious positive results. Another caveat is that positive results from an enriched population of responders can no longer be generalized to the entire patient population, but rather just to a subpopulation of similarly defined responders. Enriched enrollment studies may

be of interest in treatment intervention studies because they demonstrate some limited evidence for a treatment response22 and may therefore suggest further investigation.
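The dilution problem that motivates enrichment can be shown with simple arithmetic. The numbers below are hypothetical: if only 30% of patients respond, with a benefit of 1.0 outcome units, the effect observed in an unselected trial is diluted to 0.3, and detecting it requires roughly an order of magnitude more patients than a trial enriched with prior responders.

```python
# Hypothetical: 30% of the patient population responds to the treatment
# with a mean benefit of 1.0 outcome units; the remaining 70% do not benefit.
responder_fraction = 0.30
responder_effect = 1.0
nonresponder_effect = 0.0

# Conventional trial: the effect is diluted across the whole population.
diluted_effect = (responder_fraction * responder_effect
                  + (1 - responder_fraction) * nonresponder_effect)

# Enriched enrollment trial: only prior responders are re-randomized.
enriched_effect = responder_effect

# Required sample size scales roughly with 1 / effect**2.
sample_size_ratio = (enriched_effect / diluted_effect) ** 2

print(f"diluted mean effect: {diluted_effect:.2f}")
print(f"relative sample size needed without enrichment: ~{sample_size_ratio:.0f}x")
```

As the text notes, the price of this efficiency is that the result generalizes only to the enriched subpopulation of responders.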

Factorial Designs In a factorial design, each level of a factor (treatment or condition) occurs in combination with every level of every other factor. Experimental units are assigned randomly to treatment combinations rather than to individual treatments. The Fourth International Study of Infarct Survival (ISIS-4)23 was a large, multisite RCT designed as a factorial study with three treatments: oral captopril, oral mononitrate, and intravenous magnesium sulfate. The purpose of the trial was to assess the effectiveness of each of these three treatments among patients with suspected myocardial infarction (MI) on 35-day survival. Each of the three treatments could be delivered at one of two levels (e.g., placebo, standard dosage). Therefore, for this study there are eight (2 × 2 × 2 = 8) possible treatment combinations. Each patient was randomized to one of the eight combinations with a probability of 1/8 (12.5%). Classically, each intervention should have independent effects; in other words, there is no interaction between any of the interventions. However, this assumption often is not valid. In such cases, a parallel arm study may be appropriate because every treatment combination is tested on a different group of participants, enabling an estimate of interactions or synergistic effects between various treatments on the response (e.g., 35-day mortality). A major challenge of factorial designs is to (1) meet the independence assumption or (2) choose a sample size large enough to detect meaningful interactions with high power, that is, a good statistical chance of detecting an interaction if it is truly present. The main reason factorial designs are used is to examine multiple hypotheses within a single study. For example, the ISIS-4 study was designed to simultaneously examine the role of three treatments in reducing 35-day mortality in treating acute MIs.
Designing a factorial study saved resources compared to designing three separate parallel group studies for each of the experimental treatments. Note that if some particular treatment combinations are not of interest, a partial or fractional factorial design that omits the less interesting combinations may be used.
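The 2 × 2 × 2 structure of the ISIS-4 example can be enumerated in a few lines. This sketch (the level labels are illustrative) builds the eight treatment combinations and randomizes a patient to one of them with probability 1/8.

```python
import itertools
import random

# The three ISIS-4 treatment factors, each at two levels.
factors = {
    "oral captopril": ("placebo", "active"),
    "oral mononitrate": ("placebo", "active"),
    "IV magnesium sulfate": ("placebo", "active"),
}

# Full factorial: every level of every factor crossed with every other.
combinations = list(itertools.product(*factors.values()))
assert len(combinations) == 2 * 2 * 2  # eight treatment combinations

# Each patient is randomized to one combination with probability 1/8.
random.seed(0)
patient_assignment = dict(zip(factors, random.choice(combinations)))
print(patient_assignment)
```

A fractional factorial design would simply drop the uninteresting entries from `combinations` before randomizing.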

Parallel Groups Designs In parallel group designs, participants are randomized to one of several possible treatments. Interest focuses on comparing the effects of the treatments on a common response or outcome. One of these groups may be a placebo group (a group assigned to a placebo pill) or a control group (a group assigned to a standard or an alternative treatment). The effect on the response can be adjusted for baseline measurements of patient characteristics. A clinical trial of felbamate monotherapy for the treatment of intractable partial epilepsy was conducted with a parallel groups design, with two groups.24 The response in this trial was the average daily seizure frequency over the 2-week follow-up period. The double-blind randomized parallel groups design is the “gold standard” to which all other designs should be compared. It is the ideal study design for arriving at a definitive answer to a clinical question and is often the design of choice for large-scale definitive clinical trials. The challenge of this design is that it often requires large sample sizes and thus large amounts of resources.
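The sample-size demands of parallel groups designs follow from the standard two-sample normal approximation, n per arm = 2(sigma/delta)^2 (z_{1-alpha/2} + z_{1-beta})^2. A minimal sketch, using illustrative effect sizes:

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-arm parallel groups trial
    comparing means with a two-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    return ceil(2 * (sd / delta) ** 2 * (z_alpha + z_beta) ** 2)

# A half-standard-deviation difference needs ~63 patients per arm...
print(n_per_arm(delta=0.5, sd=1.0))
# ...but a fifth of a standard deviation needs ~393 per arm.
print(n_per_arm(delta=0.2, sd=1.0))
```

The quadratic dependence on 1/delta is what drives the resource demands of large-scale definitive trials: halving the detectable effect quadruples the required sample.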

Sequential Trial Designs and Interim Analyses In sequential trials, the parallel groups are studied not for a fixed period of time but, rather, until either a clear benefit from one treatment group appears or it becomes highly unlikely that any difference will emerge. These trials tend to be shorter than fixed-length trials when one treatment is much more effective than the others. In group sequential trials,25 the data are analyzed after a certain proportion of the observations have been collected, perhaps after one-fourth, one-half, and three-fourths of the expected total number of participants or events, and once more at the end of the study. Analyses of the primary outcome variables during a study are called interim analyses. Group sequential trials are easier than fully sequential trials to plan regarding duration and resources, and they also can be stopped early if one treatment is much more effective than the others. All trials must have a mechanism for stopping early if evidence of harm due to the treatment emerges. Trials also may be stopped for futility, where futility is defined as the unlikelihood that a positive treatment effect will emerge by the end of the trial. Chapter 27 covers this topic in more depth. It is particularly important to have statistical expertise on the team for designing, conducting, and interpreting interim analyses for efficacy or futility.
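One reason interim analyses require statistical care is that repeated looks at accumulating data inflate the type I error if the fixed-sample critical value is reused at every look. This Monte Carlo sketch (illustrative sizes; four equally spaced looks at a null effect) shows the inflation directly:

```python
import math
import random

random.seed(7)

def rejects_at_any_look(n_per_stage=50, looks=4, z_crit=1.96):
    """Simulate one trial with no true effect, analyzed after each of
    `looks` stages using the naive fixed-sample critical value."""
    data = []
    for _ in range(looks):
        data.extend(random.gauss(0, 1) for _ in range(n_per_stage))
        z = sum(data) / math.sqrt(len(data))  # z-statistic for mean = 0
        if abs(z) > z_crit:
            return True                       # (wrongly) stopped for efficacy
    return False

reps = 4000
false_positive_rate = sum(rejects_at_any_look() for _ in range(reps)) / reps
print(f"type I error with 4 unadjusted looks: {false_positive_rate:.3f}")
```

The simulated error rate comes out well above the nominal 0.05, which is why group sequential methods use adjusted stopping boundaries rather than 1.96 at every look.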

Group-Randomized Trial Designs Group-randomized (also known as cluster-randomized) trials are those in which the unit of randomization is one of several types of groups rather than an individual. Such groups might include schools, clinics, worksites, communities, or other units. Group randomization to treatment can be an efficient strategy when an intervention is difficult to implement on an individual level without the risk of contamination, such as interventions that affect environments. Consider, for example, an intervention seeking to change eating patterns by altering supermarket environments. Because the treatment cannot be delivered to individuals separately, group randomization of communities utilizing those supermarkets would be appropriate for such a study. Although the members of the groups are the individual units that are observed and measured, the number of groups randomized to each condition is typically small because the groups often contain large numbers of members. The small number of randomized groups introduces a greater potential for threats to internal validity to operate (because randomizing a small number of groups is less likely to control potential bias), which is one major disadvantage of group-randomized trials. Some of these threats can be decreased by utilizing appropriate analytic strategies, adhering to stringent design strategies, anticipating and measuring potential confounding variables, and increasing retention rates. Another disadvantage of this design strategy is the need for access to adequate numbers of groups. For a study of changes within a health-care system, for example, conducting a group-randomized trial on 30 or more health-care systems can be a significant and expensive undertaking,26 although costs can sometimes be mitigated if data from electronic health records or other available resources can be appropriately accessed.
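A quantitative way to see why large member counts do not compensate for a small number of randomized groups is the standard design effect, 1 + (m − 1) × ICC, where m is cluster size and ICC is the intraclass correlation. The numbers below are hypothetical:

```python
def design_effect(cluster_size, icc):
    """Variance inflation from randomizing clusters instead of individuals."""
    return 1 + (cluster_size - 1) * icc

# Hypothetical trial: 30 clinics of 100 patients each, ICC = 0.05.
n_total = 30 * 100
deff = design_effect(cluster_size=100, icc=0.05)
effective_n = n_total / deff

print(f"design effect: {deff:.2f}")
print(f"effective sample size: {effective_n:.0f} of {n_total} patients")
```

Even a modest within-cluster correlation shrinks 3,000 enrolled patients to the statistical equivalent of about 500 independent observations, which is why adequate numbers of groups, not just members, are essential.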

Adaptive Treatment Designs Adaptive treatment (also called adaptive intervention, stepped-care, or dynamic treatment) trials are designs that allow changes in the dose or components of an intervention after the onset of the study, as a function of individual or environmental factors or characteristics. Decision rules are established before the onset of the study regarding the characteristics of interest (e.g., gender, outcome of interest) and how they will determine assignment to specific intervention components or doses, and individuals can be randomly assigned to condition several times. These designs, when carefully constructed, can be efficient and cost-effective, and they are increasingly used because they allow the development of individually tailored treatment strategies. One effective tool for developing adaptive interventions is the Sequential, Multiple Assignment, Randomized Trial (SMART) design,27 which facilitates data-based planning of decision rules in the adaptive design (see the Intervention Development section below).
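The logic of a prespecified SMART decision rule can be sketched as follows. The treatment names are entirely hypothetical; the point is the structure: an initial randomization, a tailoring variable (response status), and a second randomization restricted to nonresponders.

```python
import random

random.seed(1)

def first_stage():
    """Randomize each participant to one of two first-stage treatments."""
    return random.choice(["medication A", "medication B"])

def second_stage(first_treatment, responded):
    """Prespecified decision rule: responders continue their assigned
    treatment; nonresponders are re-randomized between augmenting it
    and switching to an alternative."""
    if responded:
        return f"continue {first_treatment}"
    return random.choice([f"augment {first_treatment} with behavioral therapy",
                          "switch to medication C"])

# One simulated participant's pathway through the two randomizations.
t1 = first_stage()
pathway = second_stage(t1, responded=False)
print(t1, "->", pathway)
```

Because both randomizations are prespecified, the trial supports unbiased comparisons of entire treatment sequences, not just single-stage treatments.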


CRITICAL ISSUES IN CLINICAL STUDY DESIGN

Intervention Development Intervention development strategies vary enormously according to the particular type of intervention being developed. For pharmacologic studies, intervention development is a long process that can take years and is quite costly, but it occurs prior to any Phase I studies. For nonpharmacologic studies, intervention development is more likely (although not necessarily) to be iterative. In all cases, however, taking the time to systematically and empirically develop an effective and well-articulated intervention helps to ensure that future clinical trials are sound, and it decreases the opportunity for random error variance or weak interventions to move forward to large clinical trials. Adaptive treatment designs such as the Multiphase Optimization Strategy (MOST), SMART, and Just-in-Time Adaptive Intervention designs are innovative new ways to develop and optimize behavioral interventions in an efficient manner.28–30 These methods were developed to design behavioral strategies that are maximally potent while also efficient, by allowing identification of component parts.

Blinding or Masking Blinding or masking investigators, study staff, and participants to treatment condition, when possible, may be almost as important as randomization itself. The basic concept is to ensure that those who are delivering or receiving an intervention, or measuring the outcome of an intervention, do not know which treatment group a participant is in; this reduces conscious and unconscious bias and the use of information from other sources. Individuals to be masked include the study participants, investigators, anyone conducting assessments, and those in contact with any of these people. Sometimes an intervention cannot be masked; in such cases, the study team must make all attempts to minimize potential sources of bias. For example, those involved in outcome assessments can be masked regarding study arm and/or study hypotheses to prevent a biased assessment of the outcomes or biased behaviors such as probing more deeply about potential side effects. In the case of behavioral interventions, it may not be possible to mask participants to study arm, but it is possible to mask them regarding study hypotheses. While masking to study arm can be more difficult and costly, it can have a significant impact on study credibility. In general, an open study, or one that does not have elements of masking, is less credible than one with some or complete masking. These study design elements are discussed in more detail in this chapter, and randomization is the topic of Chapter 23.

Choosing the Comparison Group Comparison groups serve the important purpose of allowing an evaluation of what outcomes would result in the absence of the experimental or intervention condition in clinical studies or, for example, in case-control studies, of how factors differentiate those who do and do not have the disease in question. In all studies, including case-control studies, it is important to describe carefully both the experimental (or case) and comparison (or control) groups. Comparison groups can be participants randomized to a placebo control, usual care, standard of care, attention control, or alternative treatment. In the latter case, if both treatment arms have demonstrated efficacy and/or are guideline based, such a study design can be called a comparative effectiveness trial, which as described above essentially compares two efficacious or effective treatments to each other. In all other designs, at least one of the study arms is an experimental arm, and one or more additional arms serve to control one or more factors. For every study, it is important to have a clear understanding of what factors should be controlled with the comparison group because this will typically dictate what conclusions can be drawn from the data. Given their importance to clinical study design and trial interpretation, several common control groups are described in the next section.

CONTROL GROUPS

One of the most complicated issues in clinical trial design is how to choose and design the most appropriate control group for a specific treatment and outcome.31 The purpose of the control group is to control potential threats to internal validity so that changes in the dependent variable(s) of interest can be attributed to the active ingredient of the experimental treatment rather than to nontreatment-related factors. The primary driving force for the choice of a control group should be the specific question being addressed; different control group conditions will allow different conclusions to be made. In other words, choosing the most appropriate control group demands a good grasp of what needs to be controlled in the experimental setting, including how the treatment of interest is defined and what the outcomes of interest are. Thus, there is no control group that can be said to be “correct” in all cases. Some of the factors that control groups are meant to control include the following: expectations (both patient and provider); time and attention (for example, from the provider, from groups in group interventions, as a result of measurement of outcomes or observation, or as a result of diagnosis); practitioner effects; social support (from the practitioner and other sources); compensation (outside of potential treatment-related benefits); demand or burden; risk; disease progression; and nonspecific effects, including contextual effects. This last factor, context, is often thought of as the group of nonspecific effects that may come from environmental, social, and structural characteristics. Nonspecific effects are any effects of the complex treatment package on the outcomes of interest that are not due to the specific mechanisms being tested. For example, patients receiving treatment in an outpatient hospital setting might have particular responses unrelated to the treatment itself, and thus the contextual factor of where treatment is received would be important to control in such a circumstance. Control groups can take many different forms, including wait-list, placebo, sham, time and attention, and active comparator control groups. Because of the importance of choosing the most appropriate control group for any specific study, the advantages and disadvantages of each of these will be addressed.

Wait-List Control A wait-list control group is an unmasked, untreated group. In other words, participants in a wait-list control group are denied the experimental treatment and are aware that they are not receiving treatment. In such cases, negative expectancies may occur; with no expectation of getting better, patients sometimes do more poorly. This can inflate the apparent effects of the experimental group. Wait-list groups are not really untreated, because they are contacted, consented, randomized, diagnosed, and measured. For treatments that carry low expectancy for benefit and high risk, wait-list participants may actually fare better than those in the experimental group. Wait-list control groups therefore do not represent a true natural history group. Wait-list groups, while providing some limited information, particularly in early phase studies, are not sufficient for allowing definitive assessment of the clinical utility of a particular intervention strategy. Wait-list control groups can be most effectively utilized in studies that employ additional control groups; in this case, information from a wait-list group is valuable because, when used in conjunction with other control groups, it can allow some conclusions about effects due to natural history.


Time and Attention Control Time and attention control groups are most commonly employed for nonpharmacologic interventions and are meant to control contextual and nontreatment-related variance, so that any group differences can be ascribed to the active “ingredient” of the experimental treatment.32 Attention control groups are particularly useful in controlling for the effects of practitioner attention, as well as the attention and social support that might result from group delivery in some nonpharmacologic trials. Inclusion of a time and attention control group results in a conservative estimate of the effect size of the treatment, because many outcomes are significantly influenced by these nonspecific factors.

Placebo Control A placebo control group is typically considered to be one that receives an inert treatment. The conventional control group in a pharmacologic study is the placebo control because it is easy to mask both the experimenter and the patient to condition, as long as the placebo is well matched on all key characteristics to the experimental drug. Thus, if the experimental drug is a pill, a good placebo pill will look, smell, and taste the same as the experimental drug and will have as many of the same side effects as possible (e.g., changes in urine color will be similar) while having no active ingredient. In addition, placebo controls should be given in the same setting as the experimental treatment, with the same dosing schedule, and should match the experimental treatment on all other sensory components and aspects of delivery. For some patients and under some conditions, however, placebo responses occur, in as many as 30% or more of study participants. A variety of factors are widely known to influence the extent of placebo responding, including expectancy, social factors, and certain conditions such as chronic pain and Parkinson's disease. For nonpharmacologic interventions, placebo control groups are more complex because there are domains other than sensory ones to be considered, and some of these domains would be expected to produce some effects in some patients under some conditions. The design of a placebo condition for such studies can be challenging because often these interventions cannot be easily masked. Interestingly, numerous factors can modify the extent of a placebo response. For treatments involving patient–practitioner interaction, Hawthorne effects (the phenomenon whereby individuals change their behavior when they believe they are being observed) can be present. Expectation for success (or failure) of a treatment on the part of either the patient or the provider can impact placebo responding. Later in the chapter we will address placebo response in more detail.

Sham Control

A sham control group experiences the same procedures as the experimental group, without the active portion of the intervention. Sham controls are most common among interventions involving devices. Just as with other control groups, sham controls reduce the potential for bias, particularly bias related to group differences in adherence and expectations. Sham controls also control for treatment-related factors and procedures that are not considered to be active. For example, a sham control group for a surgical intervention would include all aspects of the surgery except whatever aspect is thought to be beneficial. Despite the benefits and strength of sham control groups, they are relatively infrequent for surgical interventions because they are invasive and carry a degree of risk similar to that of the experimental group without the prospect of benefit. They are somewhat more frequent, however, for other procedurally based interventions when the risk of a sham procedure is low.

Usual and Standard Care Controls

Usual care control groups (sometimes referred to as treatment-as-usual or standard care groups) are particularly relevant and important to clinical research. Studies using these control groups vary somewhat, but all essentially consist of patients who are enrolled in a research study yet receive treatment in their usual care settings, as they would if they were not enrolled in the study. Typically, the treatments that usual care control groups receive can be as varied as typical clinical care itself. Closely related are enhanced usual care groups, which provide some common and minimal level of care to address ethical issues but add certain enhancements, often elements of the active intervention that are considered inert or not of interest. Usual care control groups can be quite informative in effectiveness trials because they reflect the care typically provided for a given condition and thus what occurs in real-world situations. However, there is tremendous variability in what usual care consists of; for example, it varies according to insurance status, socioeconomic status, and geographic region. Standard of care (in contrast to standard care) groups are somewhat less frequently employed as control groups. Standard of care groups can be thought of as groups that receive best-practice, guideline-based care, or the most commonly agreed-on effective treatment for the condition of interest; this is generally a higher and more consistent level of care than usual care. These control groups require careful documentation and evaluation of the content of the control condition, since it may vary over time and across study sites. This variability may increase the needed sample size for studies using these types of control groups.

Multiple Control Groups

Although the simplest of the classic designs consists of two arms, an intervention and a control (often a placebo control), many trials include additional comparison groups. For example, a standard "positive control" that has previously been shown effective for the condition serves as a yardstick against which to compare the magnitude of the response produced by the experimental treatment. Without the positive control, a failure of the experimental treatment to produce a greater response than the placebo could render the study inconclusive. Although it is tempting to conclude that the treatment was ineffective, it is possible that the assessment instruments were insensitive, the procedures of the experimental observer were variable or confusing, the patient population had a particularly high placebo response, or the result arose merely from random variation. If a positive control were included and shown superior to both the placebo and the experimental treatment, this would strengthen the conclusions that could be drawn regarding the failure of the experimental treatment to have an effect. Alternatively, if all three arms produced similar responses, one could conclude that the study methods were inadequate to show the effects of even an efficacious treatment. In addition to a design testing an experimental treatment, a standard treatment, and a placebo, other multiarm designs without a placebo are also possible. Many clinical trials include additional treatment or control groups chosen to further elucidate the major research question. Other trials might include multiple control groups, each controlling for different aspects of a multicomponent intervention. Alternatively, two or more doses of the same intervention could be compared to any control group, which would strengthen the causal inferences that could be made.
A dose–response curve showing no or a small response in the control group and escalating responses with greater doses of the experimental intervention could convincingly demonstrate the benefit of the experimental treatment. Whatever the disease area of interest, one may wish to test the soundness of a proposed research design by graphing the possible outcomes of the trial. If the conclusion given a particular outcome is ambiguous, consider additional treatment groups that would distinguish among the alternative explanations. The addition of treatment or control groups is costly, however. One must either recruit more patients or reduce the size of each treatment group, lessening the statistical power of the comparisons. In many cases, particularly where negative results will not be of great interest, researchers may choose to omit controls whose main value is to clarify the interpretation of a negative result. In all cases, as with so many design decisions that need to be made, the major factors driving the particular design of the study arms should be the question being addressed and the end points and outcomes of interest.
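The logic of the positive control described above can be sketched in a small simulation. The numbers below are invented for illustration, not data from any real trial: when the positive control separates from placebo, a negative result for the experimental arm is interpretable, whereas if even the positive control fails, the trial methods themselves are suspect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 60  # hypothetical per-arm sample size

# Simulated symptom improvement scores (higher = better) for three arms:
# placebo, an established "positive control," and the experimental treatment.
placebo = rng.normal(loc=2.0, scale=4.0, size=n)
positive_control = rng.normal(loc=5.0, scale=4.0, size=n)
experimental = rng.normal(loc=2.2, scale=4.0, size=n)  # near-null effect

# Pairwise comparisons that drive interpretation of a "failed" experimental arm.
t_pc, p_pc = stats.ttest_ind(positive_control, placebo)
t_ep, p_ep = stats.ttest_ind(experimental, placebo)

# If the positive control beats placebo but the experimental arm does not,
# assay sensitivity is demonstrated and the negative result is interpretable.
print(f"positive control vs placebo: p = {p_pc:.4f}")
print(f"experimental vs placebo:     p = {p_ep:.4f}")
```

Graphing such simulated outcomes for each plausible scenario before the trial starts makes ambiguous designs visible early, when adding or dropping an arm is still cheap.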

PLACEBO RESPONSES

Placebo, which means "I will please" in Latin, is a term applied to a presumably inactive intervention. The placebo response is the favorable response that the inactive treatment often elicits. Scientists and philosophers have wrestled with this concept for generations.33 For the purposes of clinical trial design, an understanding of what constitutes a placebo group, and of how to interpret the responses of placebo groups, is of special interest and importance.

Background

Placebos clearly can affect subjective ratings of symptoms and function among both patients and clinicians. There is little debate about this phenomenon, and it has typically been ascribed to patients' beliefs or expectations regarding their randomization to the active treatment group. In clinical research, such responses have been considered biased and primarily due to changes in symptom reporting rather than indications of positive physiological benefit. However, placebos have also been shown to influence a variety of physiological measurements, including blood pressure, airway resistance, neural functioning, and gastrointestinal motility.34,35 While various studies have shown levels of "placebo responding" among Parkinson's patients,36,37 Alzheimer's disease patients,38,39 and depressed and schizophrenic patients40 that were larger than expected from spontaneous visit-to-visit fluctuations or the natural history of the disorder, other studies have examined the mechanisms for higher than typical placebo responses among these various patient groups. There are no simple answers. Initial reports suggesting that placebo analgesic responses after surgery can be attributed to endorphin secretion41 have been refuted by the finding that placebo analgesia is not reduced in magnitude by pretreating patients with large doses of naloxone.42 Placebo responses undoubtedly involve brain centers for language, sensation, mood, movement, and anticipation of the future; that is, most of the brain and every bodily system under its control.35 This has led to a rich and complicated literature.43 The recent psychopharmacology literature offers a revealing debate about placebo responses, because in recent years large placebo effects in particularly high-responding patient populations have caused many trials of novel antidepressants and anxiolytics to fail. To counteract these effects, some experts advise investigators to avoid psychotherapeutic interventions and to keep warm contact with the patient to the minimum needed to ensure compliance, on the notion that placebo responses are due not only to expectations of benefit but also to patient–provider interactions. To counteract the desire of the patient to please the practitioner with a positive report, it can be helpful to emphasize that the value of the experimental treatment is unknown. Sullivan44 explored the paradox that when clinical investigators dismiss the placebo response as a nuisance to be contained, they impoverish scientific conceptions of healing, because placebo responding is likely to be a part of all active treatments. An alternative view is that a better understanding of placebo responses will reveal "specific mechanisms" of the healing interaction, which may include cognition, expectations, environmental factors, and patient–provider relationships impacting or interacting with an active pharmacological treatment. One important goal of the clinical investigator is to maximize the ratio of the specific treatment effect to the experimental variation. Large placebo responses work against this goal in two respects. First, the "specific treatment effect" is inferred to be the difference between the improvement shown by study participants on the treatment and those on a placebo.
To the extent that patient or practitioner expectations influence treatment, a portion of the treatment response can be considered a placebo response, that is, responding due to expectations rather than to "active" treatment. In cases in which the placebo effect is large, a "ceiling effect" may limit the incremental difference that can be seen with a specific treatment. Second, placebo responses, and the nature of the interaction between placebo and specific treatment responses, may vary greatly among individuals with different backgrounds, cognitive styles, expectations, and relationships with their health-care providers. Therefore, as the mean size of the placebo response increases, the experimental variance may increase, with a corresponding loss of power.
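The power loss from inflated experimental variance can be made concrete with the standard normal-approximation power formula for a two-arm comparison. The sketch below uses invented numbers (a 3-point treatment–placebo difference, 50 patients per arm) purely to show the direction of the effect as heterogeneous placebo responding inflates the outcome SD.

```python
import numpy as np
from scipy import stats

def power_two_sample(delta, sigma, n_per_arm, alpha=0.05):
    """Approximate power of a two-sample z-test for a mean difference delta,
    with common outcome SD sigma and n_per_arm subjects per group."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    se = sigma * np.sqrt(2 / n_per_arm)
    return 1 - stats.norm.cdf(z_alpha - delta / se)

# Hypothetical trial: true treatment-placebo difference of 3 points, 50 per arm.
# As variable placebo responding inflates the outcome SD, power falls.
for sigma in (6.0, 8.0, 10.0):
    print(f"SD = {sigma:4.1f} -> power = {power_two_sample(3.0, sigma, 50):.2f}")
```

The same fixed specific effect becomes progressively harder to detect, which is exactly why strategies that shrink nonspecific variance are so valuable in high-placebo-response populations.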

Identifying Placebo Responders

Several clinical investigators view placebo responses as nuisance variables because active treatment effects become difficult to differentiate from placebo effects. One response has been to identify high placebo responders prior to randomization and exclude them from clinical trials. This has produced mixed conclusions. In analgesic studies carried out in the early 1950s, several leading research teams concluded that they were unable to sort out such a subgroup45; given repeated single doses of placebo interspersed with doses of opioids, more than 80% of patients with surgical or cancer pain reported analgesia from at least one dose of placebo. In other disease areas, however, the quest to identify placebo responders has continued in the form of single-blind placebo run-in periods preceding randomization. Patients with major depressive disorder tend to show higher than typical placebo responding. An analysis of several clinical trial cohorts of depressed patients46,47 identified similar responses among patients on tricyclic antidepressants and on placebo. Initial mood improvements that fluctuate and eventually relapse are common in both drug-treated and placebo-treated patients and are inferred to be placebo responses. In contrast, steady improvements with onset after 2 weeks are virtually limited to the drug groups. These investigators have argued for using a short placebo run-in period to exclude patients with a marked placebo response and to stratify and statistically correct the outcomes of patients with lesser degrees of improvement during the run-in.48 Other psychiatric investigators consider placebo run-ins unhelpful.49,50 They object that this maneuver wastes time, is deceptive in intent, and does not work. They believe clinicians emit subliminal cues that the placebo run-in offers no real treatment, which dampen patients' responses, whereas a much larger placebo effect occurs at the time of the real randomization. Montgomery49 and Schweizer and Rickels51 propose the alternative of a longer baseline observation period to exclude patients with mild or rapidly cycling mood disorders.
In a review of methods in irritable bowel syndrome trials, Hawkey52 points out another liability of placebo run-in periods in spontaneously fluctuating disorders. By excluding patients whose symptoms have decreased by chance during the run-in period, one tends to be left with patients whose symptoms may have worsened by chance. Other investigators have suggested that because placebo responses are less durable than specific therapeutic responses, lengthening trial duration might increase the treatment–placebo difference.47,53 However, lengthening a study increases the cost and potentially the number of dropouts. Moreover, some placebo responses are durable. A variety of major surgical procedures that later proved to be useless, including gastric freezing for duodenal ulcers and actual or sham internal mammary artery ligation for angina pectoris, were initially reported to improve or eliminate pain in the majority of patients for 1 year after surgery. Both patient and clinician expectations contribute to the placebo effect, although they are certainly not the entirety of what makes up the placebo response.43 Many studies have shown that subjects who notice side effects after taking a pill report more improvement than those who feel no side effects. There are two ways to minimize such bias. First, placebos should be as similar to active treatments as possible on all variables except the presumed active ingredient of the experimental intervention. Second, investigators should strive to maximize the effectiveness of blinding procedures and assess whether patients and practitioners can guess study assignment from the outcomes, appearance, taste, or side effects of the treatments.54 Designing placebo control groups for nonpharmacological interventions is more challenging than for drug studies,31 as described in the section on control groups.
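One simple way to assess blinding, sketched below with hypothetical counts, is to cross-tabulate participants' end-of-trial guesses against their actual assignments and test for association; formal blinding indices (e.g., Bang's index) are alternatives, but a chi-square test of the contingency table already flags gross unblinding.

```python
import numpy as np
from scipy import stats

# Hypothetical end-of-trial blinding check: each participant guesses
# their assignment. Rows: actual arm; columns: guess (drug, placebo, don't know).
guesses = np.array([
    [34, 10, 6],   # participants actually on drug
    [12, 30, 8],   # participants actually on placebo
])

# If blinding held, the guess distributions should be similar in both arms.
chi2, p, dof, expected = stats.chi2_contingency(guesses)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")
# A small p-value suggests participants could distinguish their assignment,
# e.g., via side effects, so unblinding may have biased subjective outcomes.
```

In this invented table most patients guess their true arm, so the test rejects independence and the investigator would need to consider expectation effects when interpreting the trial's outcomes.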

MISTAKES AND MISCONCEPTIONS

Not Looking at the CONSORT Statement Before, During, and After a Study

The original Consolidated Standards of Reporting Trials (CONSORT) statement was developed to improve the reporting and conduct of randomized clinical trials.55 Although the statement primarily consists of a checklist of essential elements to help researchers report findings consistently, careful consideration of the 25 items in the CONSORT checklist during the design phase of a future trial can be valuable. The included items set standards for reporting the title, introduction, methods, analyses, results, discussion, and other important sections of a research report. Understanding these critical elements can help researchers ensure that the appropriate data are collected in the appropriate manner for reporting according to CONSORT guidelines. Although the statement was first published in 1996, a number of updated guidelines for specific trial designs have subsequently been published.56,57

Waiting Until the Large Definitive Study to Worry About the Details

The focus of this chapter is primarily on the design of relatively small clinical trials; design and analytic issues related to large clinical trials are covered in several other chapters and books. Small and moderately sized clinical trials, although seldom definitive, are essential in establishing estimates of treatment effects and feasibility while identifying critical patient characteristics, outcomes, and components of the intervention. They are usually more practical to conduct than larger-scale trials. Small and moderately sized clinical trials may be single-site or multisite trials. In all trial designs, but particularly in trials with modest sample sizes, the design must carefully consider the estimated effect of the treatment and the estimated (nontreatment-related) variance. Thus, research design strategies for maximizing treatment effects and minimizing error variance, while important for all experimental research, are especially critical for small and moderately sized clinical trials with more limited sample sizes.

Failing to Increase the Treatment Effect

The treatment effects detected in a study may be increased in several ways, including maximizing the treatment dose or intensity, choosing comparison or placebo interventions with minimal expected impact on outcomes, or choosing to study (and treat) patients most likely to respond and adhere to the intervention and least likely to respond to the control or comparison treatment. For example, certain patient populations are known to be particularly responsive to placebo conditions. Clinical studies of these populations, when the design includes a placebo group, must have a powerful intervention to discriminate the responses of the placebo control from those of the intervention groups.

Failing to Decrease the Variance

Variance due to treatment is what we call the treatment effect; all other sources of variance decrease the ability to identify that effect. There are several approaches to decreasing clinical trial variance. Significant variance can occur as a function of the experimental conduct of the study. Decreasing the variability in measuring the primary outcome is often a powerful and inexpensive way to stimulate the pace of therapeutic advance in an entire field. For example, careful consideration of the psychometric properties of the scales used in trials can help identify sound instruments and will result in outcomes with decreased error variance.
Consideration of the primary outcomes may identify known fluctuations in those outcomes according to time of day, seasonality, and other variations that are not of interest but which will necessarily increase measurement variance. Blood pressure, for example, demonstrates a daily rhythm, and for studies including blood pressure as an important outcome, the time of day in which blood pressure is taken (as well as many other variables such as posture, instrumentation, practitioner effects, etc.) will influence nontreatment variance. Variations also may occur normally for some outcomes such as pain, which can fluctuate widely day to day in chronic pain patients. Similarly, certain conditions are associated with relapsing and remitting symptoms. For these types of outcomes especially, repeated measurement can often decrease variability and enhance the ability to identify treatment effects.
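The benefit of repeated measurement for a fluctuating outcome such as daily pain can be shown with a short simulation. All numbers below are invented: each subject has a stable underlying level plus large day-to-day swings, and averaging several ratings shrinks the noise component.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subjects, k = 500, 7  # hypothetical: 7 daily pain ratings per subject

# Each subject has a stable "true" pain level plus large day-to-day swings.
true_level = rng.normal(5.0, 1.0, size=n_subjects)
daily = true_level[:, None] + rng.normal(0.0, 2.0, size=(n_subjects, k))

single_day = daily[:, 0]          # outcome from one measurement occasion
weekly_mean = daily.mean(axis=1)  # outcome averaged over k occasions

# Averaging divides the day-to-day variance component by k, so the
# between-subject SD of the outcome is markedly smaller.
print(f"SD, single rating: {single_day.std(ddof=1):.2f}")
print(f"SD, 7-day mean:    {weekly_mean.std(ddof=1):.2f}")
```

With the single rating, most of the observed spread is measurement noise; with the 7-day average, the outcome tracks the stable patient level much more closely, directly improving the ability to detect a treatment effect.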


In the sample size formula (see Chapter 25), only unexplained sources of variation contribute to the variance, σ². If there are adequate data to identify the predictors of the outcomes of interest and those predictors can be measured, those components can be removed from this unexplained error term. From a design perspective, removing these components is accomplished by controlling the sources of variance in one or more control groups or by careful selection of patient characteristics. This can also sometimes be accomplished analytically. For example, Jung and colleagues58 reported that rash duration, age, sex, the presence of a prodrome, and the severity of pain and acute rash explain 23% of the variance in the occurrence of postherpetic neuralgia. Covariates such as surgical trauma and prior opioid exposure have been reported to improve the sensitivity of analgesic clinical trials.59 Assessment of genetic polymorphisms that affect individuals' treatment response offers promise in explaining part of the outcome variance in many disease areas.60,61 Since sample size is proportional to variance, using this type of additional knowledge could reduce the size of a study or at least lead to a more specific study design.
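The proportionality of sample size to variance follows directly from the standard normal-approximation formula for comparing two means, n = 2σ²(z₁₋α/₂ + z₁₋β)²/Δ². The example below uses invented numbers (a 4 mmHg blood pressure difference) to show how explaining part of the residual variance shrinks the required sample size.

```python
import math
from scipy import stats

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Per-group n for a two-sample comparison of means (normal approximation):
    n = 2 * sigma^2 * (z_{1-alpha/2} + z_{power})^2 / delta^2."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return math.ceil(2 * (sigma / delta) ** 2 * (z_a + z_b) ** 2)

# Hypothetical: detect a 4 mmHg difference in diastolic blood pressure.
# Covariates that cut the residual SD from 10 to 8 mmHg reduce the
# required sample size by ~36%, since n is proportional to sigma^2.
print(n_per_group(sigma=10, delta=4))  # 99 per group
print(n_per_group(sigma=8, delta=4))   # 63 per group
```

The same relationship works in reverse: any design choice that lets nuisance variation leak into σ², such as inconsistent measurement timing, inflates the required sample size quadratically in the SD.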

Not Taking Care When Choosing a Control Group

As with many aspects of study design, determining the appropriate control group depends on the question being addressed, the available resources, the outcomes of interest, and where the state of the science lies on the research continuum. For example, for dissemination and implementation studies, which address questions of how a treatment can be disseminated to real-world settings (communities, worksites, schools, community-based clinics), control groups may be formed to control factors related to the dissemination or implementation plan rather than the intervention itself. Such studies may compare treatments disseminated by specifically trained community health workers (experimental group) to those provided by hospital staff. For late-stage effectiveness or pragmatic trials, control groups may be primarily usual care groups, because the focus of the question is whether the intervention is effective in the community setting, which puts little or no constraint on the patient entry criteria, the way the intervention is practiced, or strategies to enhance adherence or retention. The primary focus in designing the control group is to have a good understanding of the factors one wishes to control. This requires a concise articulation of the question to be answered and of the most critical outcomes and end points to be measured. In some cases, the primary question relates to the mechanisms of action of the experimental treatment. In these cases, the hypothesis to be tested concerns whether and how the experimental treatment effects change, the outcomes are efficacy related and mechanistic, and the control group should be designed to control all other presumably "nonactive" elements of the treatment. In other cases, the primary question is whether a new treatment effects any change through any means. In such cases, the question is whether a signal for efficacy is apparent and the outcomes are broad; here a usual care or wait-list control group may be appropriate. It is a common mistake to adhere in all studies to a single "correct" control group. While a researcher may certainly employ a control group that is inadequate for the question being addressed, the problem is a lack of harmony between the question and the control group rather than the choice of control group itself.

Always Assuming Placebo Groups Are Unethical

It is sometimes argued that the inclusion of a placebo control when an effective treatment is available is unethical62 and that the more appropriate design compares new treatments to a standard treatment. The standard treatment typically is defined in one of two ways: standard of care, the optimal available treatment for a given condition; or standard care, how treatment for that condition is typically given in real-world practice. Although this argument may hold in cases in which withholding the known treatment poses major risks of irreversible harm (e.g., studies of treatments for aggressive cancers, serious infections, or any condition for which withholding immediate effective treatment causes permanent damage), in other cases the inclusion of a placebo condition is justifiable and necessary to advance clinical science. The exclusion of placebo groups in all conditions where a treatment exists could impair the early development of many treatments, when proof of principle for a weak treatment is needed to continue efforts to improve it.63–66 This is especially the case when known treatments do not have proven effectiveness in practice but have been shown efficacious only in small, well-controlled trials. Enrollment in a placebo arm is also acceptable when the trial is relatively short in duration, withholding a treatment does not cause permanent harm or excess distress, and existing treatments are not well accepted by patients. The best case for the inclusion of a placebo group is when comparing a new treatment to a standard treatment, where the addition of a placebo arm can help clarify the possibility that neither active treatment was effective in that particular trial and that natural history or placebo effects explain the results. This is particularly the case if early-stage research in the area failed to include placebo controls. As discussed previously, a study that excludes a placebo group may produce spurious evidence for the new drug's efficacy and lead to widespread use of an ineffective medication. Many variations of when and how to use placebo control groups have been well articulated.67

Assuming Placebo Treatment Is (Im)Possible in Long-Term Studies

In short-term investigations of treatments focused on patient symptoms, placebos are often ethically justified because patients understand that they can terminate the study and take additional medication at any time.68 In actual practice, many patients experience some placebo relief, and most tolerate the study for a period of time. Chronic disease studies can be more difficult, however, for both practical and ethical reasons. Patients have a more difficult time tolerating unrelieved severe symptoms for a long period, especially if effective or efficacious treatments exist. This can lead to differential dropout rates between the placebo and intervention arms, making it extremely difficult to draw causal inferences. Ethical issues also are a concern for placebo studies, particularly if a potentially efficacious treatment is available. For certain conditions that progress without treatment, researchers obviously cannot ethically give a placebo alone if doing so could cause permanent harm. In these situations, one ethically feasible way to conduct placebo-controlled studies is to give both the placebo and the active treatment as add-on treatments to patients already on optimal doses of a standard treatment.

Confusing Placebo Response and Regression to the Mean

To what extent can we differentiate placebo responding from high responses changing due to the phenomenon of regression to the mean? In two large placebo-controlled dose–response studies of irbesartan,68 an antihypertensive, diastolic pressure initially dropped by a mean of 4 mmHg in patients treated with placebo capsules and 5–10 mmHg in patients treated with irbesartan. Was the 4 mmHg drop a "placebo response"? A plausible alternative explanation is that this improvement reflects the phenomenon of regression to the mean. In chronic disorders with fluctuating symptoms and signs, patients are more likely to volunteer for studies and qualify for entry when their disease is in a worse period. Conversely, after study entry, there will be a tendency for them to improve just by random variation. (See Chapter 27 for more details.) One way to distinguish a placebo response from regression to the mean is to observe the outcome over a longer period of time and after treatment (including placebo) has ceased. A subsequent increase in the outcome would suggest the response was due to patients' expectations of a drug effect during treatment. Another, albeit not perfect, way to distinguish placebo response from regression to the mean is to include a no-treatment or wait-list control group as well as a placebo group. One may infer that improvement in the no-treatment group is regression to the mean and that the additional improvement in the placebo group is the placebo response. Hróbjartsson and Gøtzsche69 used this strategy to measure placebo responses in 156 published clinical trials that included both a placebo group and a no-treatment group. They concluded that most or all of what is commonly considered placebo response is really regression to the mean, except perhaps in studies of pain, anxiety, and other outcomes reported by the subject. This conclusion fails to explain the physiological and neurological changes that occur with placebos, however.
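Regression to the mean can be demonstrated with a few lines of simulation. The numbers below are invented: patients with a fluctuating chronic measure are "enrolled" only when a single screening value exceeds a cutoff, and then remeasured with no treatment at all.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Fluctuating chronic symptom: stable per-patient mean + visit-to-visit noise.
patient_mean = rng.normal(100.0, 10.0, size=n)
screening = patient_mean + rng.normal(0.0, 8.0, size=n)
follow_up = patient_mean + rng.normal(0.0, 8.0, size=n)  # NO treatment given

# Enroll only the patients who look worst at screening (value above cutoff).
enrolled = screening > 110.0
mean_change = (follow_up[enrolled] - screening[enrolled]).mean()

# Enrolled, untreated patients "improve" at follow-up purely by selection:
print(f"mean change in enrolled, untreated patients: {mean_change:.1f}")
```

Because patients selected for a high screening value tend to have had a positive noise component at screening, their follow-up values fall back toward their true means, mimicking a treatment or placebo benefit that was never administered.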

Using a Factorial or Partial Factorial Design Instead of a Parallel Group Design

One of the advantages of a factorial design is savings in sample size. Chapter 25 discusses power and sample size in general, but there are situations in which factorial or partial factorial designs do not allow smaller sample sizes relative to a parallel group design. As mentioned earlier, the sample size savings of a factorial design come with several assumptions. One is that each of the interventions has an independent effect on the outcomes measured in the trial, which is not always true. In fact, one of the study goals may be to investigate the interaction of the treatments under study. Studies wishing to examine potential interactions of the treatments will need to be powered to do so, which will increase the sample size, sometimes (but not always) to the size of a parallel group design. One alternative is to consider each possible treatment combination as a separate arm in a parallel group design, although there are other methods, such as the multiphase optimization strategy (MOST), that allow smaller sample sizes for partial factorial designs even when postulating and testing interaction effects.70
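Why powering for an interaction inflates the sample size can be shown with a normal-theory calculation. In a 2x2 factorial with cell size n and outcome SD σ, a main-effect contrast has variance σ²/n, while the interaction contrast (a difference of differences) has variance 4σ²/n, so detecting an interaction of the same magnitude requires roughly four times the sample. The numbers below are invented for illustration.

```python
import math
from scipy import stats

def n_per_cell(effect, sigma, var_multiplier, alpha=0.05, power=0.80):
    """Per-cell n for a normal-theory contrast whose estimator has
    variance var_multiplier * sigma^2 / n_per_cell."""
    z = stats.norm.ppf(1 - alpha / 2) + stats.norm.ppf(power)
    return math.ceil(var_multiplier * (sigma * z / effect) ** 2)

sigma, effect = 10.0, 5.0
# Main effect averages over the other factor: variance sigma^2 / n per cell.
# Interaction (difference of differences): variance 4 * sigma^2 / n per cell.
print("per cell, main effect: ", n_per_cell(effect, sigma, 1))
print("per cell, interaction: ", n_per_cell(effect, sigma, 4))
```

This is the arithmetic behind the warning in the text: the factorial design's economy holds only when main effects are the target and the no-interaction assumption is credible.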

Assuming Small, Open-Label, Nonrandomized, Uncontrolled Studies Offer No Evidence

The ISIS-4 study,23 a multisite randomized study with approximately 58,000 participants, was discussed in the factorial design section. Another study was alluded to in the early Phase II design section of this chapter. Several years ago, the intramural research program of the National Institute of Neurological Disorders and Stroke (NINDS) conducted a series of studies to evaluate images of contrast-enhanced lesions as a measure of disease activity in early relapsing–remitting multiple sclerosis (RRMS). The contrast agent gadolinium causes areas of blood–brain barrier breakdown to appear on MRI images as bright spots, or lesions. Traditional clinical measures of disease activity, such as those based on assessing physical or mental disability, are known to be very insensitive during the early phase of the disease. By comparison, it is thought that the number and area of these lesions as measured by serial monthly MRI images may be a more sensitive measure of disease activity during this phase.71,72 A series of Phase II (safety/efficacy) studies were conducted at NINDS to screen new agents, including beta-interferon, for effectiveness. One study examined the effect of beta-interferon on lesion activity during the early phase of RRMS.73,74 The beta-interferon study was designed to have 14 patients followed for 13 months. Patients remained untreated during the first 7 months (seven serial MRI images) and then were treated with beta-interferon during the last 6 months (six serial MRI measurements). The primary outcome, or response, in this study was the average monthly number of lesions on treatment minus the corresponding average number during the untreated baseline period. The study results showed that beta-interferon significantly reduced the number of lesions compared to baseline. This is a nonrandomized study in which all patients were switched over to the investigational treatment after the untreated baseline period; this type of nonrandomized design is one of many used to screen for new therapeutic agents.
The intramural research program of NINDS also conducted the previously mentioned clinical trial to study the efficacy of felbamate monotherapy for the treatment of intractable partial epilepsy.24 The patients in this study had partial and secondary generalized seizures and were undergoing presurgical monitoring. The effectiveness of felbamate monotherapy was compared to that of a placebo. Forty patients were randomized to either felbamate (n ¼ 19) or placebo (n ¼ 20) and followed in the clinical center for 2 weeks. The patients’ numbers and types of seizures were recorded daily for 2 weeks. The primary outcome of this study was daily seizure rates for patients on treatment or placebo. The study results showed that felbamate monotherapy significantly reduced the number of seizures compared to the placebo. This type of randomized design is often used to test promising new treatments for efficacy. The diverse study designs of these examples illustrate fundamental issues. The study evaluating the effect of beta-interferon is a nonrandomized study, whereas the felbamate monotherapy trial and the ISIS-4 trials are randomized clinical trials. Their designs varied as did

II. STUDY DESIGN AND BIOSTATISTICS


18. CLINICAL TRIAL DESIGNS

their control groups and sample sizes. In each study the investigators wished to determine whether a given treatment or treatments were effective in the care of patients with a specific disease or medical risk. Each study was conducted because the investigators wanted to be able to treat not only the patients in that study but also all patients with similar characteristics, diseases, and medical risks. Each study, though, was started at a different point in the research continuum. The goal of introducing these and other study examples is to make clear that no single design is right for every situation. Different design elements are needed at different points. At times, information and evidence can be borrowed from different populations or fields, but research must start somewhere.
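The allocation step that separates a randomized design like the felbamate trial from a single-group switchover design can be sketched as follows. The patient labels, the even 20/20 split, and the shuffling procedure are hypothetical illustrations, not the trial's actual allocation method.

```python
import random

# Simple 1:1 randomization of 40 hypothetical enrollees to two arms
# (illustrative only; not the felbamate trial's allocation procedure).
random.seed(42)  # fixed seed so the example allocation is reproducible

patients = [f"patient_{i:02d}" for i in range(1, 41)]
shuffled = random.sample(patients, k=len(patients))
active_arm = shuffled[:20]   # would receive the investigational drug
placebo_arm = shuffled[20:]  # would receive matching placebo

print(len(active_arm), len(placebo_arm))  # 20 20
```

Because assignment depends only on the random shuffle, known and unknown prognostic factors tend to balance across the two arms, which is what licenses the between-group comparison of outcomes.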

CONCLUSIONS

The goal of this chapter has been to outline the essential questions researchers must address in choosing a clinical research design and designing a control group for a clinical trial. The most fundamental of these is careful consideration of the specific goal(s) of the study in terms of the study questions being addressed. Understanding which components of the new treatment or intervention are considered to be active, identifying the outcomes of particular interest, and carefully considering how the study fits in the context of the existing body of knowledge in the area are fundamental elements of optimal clinical trial design and evidence-based research.

SUMMARY QUESTIONS

1. What might be considered the first stage in developing an appropriate design and analytic strategy for a research study?
   a. Developing an effective and feasible intervention strategy
   b. Articulating the precise research question to be addressed
   c. Locating an appropriate patient population of interest
   d. Conducting a dose-ranging study

2. What type of control group is the most appropriate to use in most clinical trials?
   a. Either a placebo or wait-list control group
   b. A usual care control group
   c. A time and attention control group
   d. There is not a single correct answer to this question because the most appropriate control group depends on the question and outcomes of interest

3. Quasiexperimental designs
   a. Are one of the most powerful design strategies used in clinical trials
   b. Lack randomization, a control group, and an independent variable
   c. Can be used when randomization is not possible, such as in naturalistic settings
   d. Typically cannot lead to strong causal inferences

Acknowledgments

We thank Jack M. Guralnik, Teri A. Manolio, Paul S. Albert, Craig B. Borkowf, and the late Mitchell B. Max, who contributed material to the previous editions of this textbook that informed this chapter.

Disclosures

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health (NIH) or the US government. This chapter reflects the views of the author and should not be construed to represent FDA's views or policies.

References

1. Evans SR. Clinical trial structures. J Exp Stroke Transl Med 2010;3:8–18.
2. Manson JE, Hsia J, Johnson KC, Rossouw JE, Assaf AR, Lasser NL, Trevisan M, Black HR, Heckbert SR, Detrano R, Strickland OL, Wong ND, Crouse JR, Stein E, Cushman M, Women's Health Initiative Investigators. Estrogen plus progestin and the risk of coronary heart disease. N Engl J Med 2003;349:523–34.
3. International Conference on Harmonisation. Choice of control group and related issues in clinical trials. Fed Regist 2001;66.
4. Korn EL. Nontoxicity endpoints in phase I trial designs for targeted, non-cytotoxic agents. J Natl Cancer Inst 2004;96:977–8.
5. Parulekar WR, Eisenhauer EA. Phase I trial design for solid tumor studies of targeted, non-cytotoxic agents: theory and practice. J Natl Cancer Inst 2004;96:990–7.
6. Hunsberger S, Rubinstein LV, Dancey J, Korn EL. Dose escalation trial designs based on a molecularly targeted endpoint. Stat Med 2005;24:2171–81.
7. Thall PF, Cook JD. Dose-finding based on efficacy-toxicity tradeoffs. Biometrics 2004;60:684–93.
8. Thall PF, Simon R. Incorporating historical control data in planning phase II clinical trials. Stat Med 1990;9:215–28.
9. Simon R. Optimal two-stage designs for phase II clinical trials. Control Clin Trials 1989;10:1–10.
10. Simon R, Wittes RE, Ellenberg SS. Randomized phase II clinical trials. Cancer Treat Rep 1985;69:1375–81.
11. Steinberg SM, Venzon DJ. Early selection in a randomized phase II clinical trial. Stat Med 2002;21:1711–26.
12. Green S, Benedetti J, Crowley J, et al. Clinical trials in oncology. Data monitoring committees for Southwest Oncology Group clinical trials. 2nd ed. 2002.
13. Sox HC, Greenfield S. Comparative effectiveness research: a report from the Institute of Medicine. Ann Intern Med 2009;151:203–5.
14. Schwartz D, Lellouch J. Explanatory and pragmatic attitudes in therapeutical trials. J Chronic Dis 1967;20:637–48.



15. Thorpe KE, Zwarenstein M, Oxman AD, Treweek S, Furberg CD, Altman DG, Tunis S, Bergel E, Harvey I, Magid DJ, Chalkidou K. A pragmatic-explanatory continuum indicator summary (PRECIS): a tool to help trial designers. J Clin Epidemiol 2009;62:464–75.
16. Loudon K, Treweek S, Sullivan F, Donnan P, Thorpe KE, Zwarenstein M. The PRECIS-2 tool: designing trials that are fit for purpose. BMJ 2015;350:h2147.
17. Campbell DT, Stanley JC. Experimental and quasi-experimental designs for research. Boston: Houghton Mifflin Co.; 1963.
18. Shadish WR, Cook TD, Campbell DT. Experimental and quasi-experimental designs for research. Boston: Houghton Mifflin; 2002.
19. Trevisan M, Jossa F, Farinaro KV, Panico S, Giumetti D, Mancini M. Earthquake and coronary heart disease risk factors: a longitudinal study. Am J Epidemiol 1991;135:632–7.
20. Lipton RB, Bigal ME, Stewart WF. Clinical trials of acute treatments for migraine including multiple attack studies of pain, disability, and health-related quality of life. Neurology 2005;65(12 Suppl. 4):S50–8.
21. Byas-Smith MG, Max MB, Muir J, Kingman A. Transdermal clonidine compared to placebo in painful diabetic neuropathy using a two-stage 'enriched enrollment' design. Pain 1995;60:267–74.
22. Temple RJ. Special study designs: early escape, enrichment, studies in non-responders. Commun Stat Theor Methods 1994;23:499–531.
23. ISIS-4 (Fourth International Study of Infarct Survival) Collaborative Group. ISIS-4: a randomised factorial trial assessing early oral captopril, oral mononitrate, and intravenous magnesium sulphate in 58,050 patients with suspected acute myocardial infarction. Lancet 1995;345:669–85.
24. Theodore WH, Albert P, Stertz B, et al. Felbamate monotherapy: implications for antiepileptic drug development. Epilepsia 1995;36:1105–10.
25. Friedman LM, Furberg CD, DeMets DL. Fundamentals of clinical trials. New York: Springer; 2015.
26. Murray DM, Varnell SP, Blitstein JL. Design and analysis of group-randomized trials: a review of recent methodological developments. Am J Public Health 2004;94:423–32.
27. Lagoa CM, Bekiroglu K, Lanza S, Murphy SA. Designing adaptive intensive interventions using methods from engineering. J Consult Clin Psychol 2014;82:868–78.
28. Almirall D, Nahum-Shani I, Sherwood NE, Murphy SA. Introduction to SMART designs for the development of adaptive interventions: with application to weight loss research. Transl Behav Med 2014;4:260–74.
29. Collins LM, Murphy SA, Strecher V. The multiphase optimization strategy (MOST) and the sequential multiple assignment randomized trial (SMART): new methods for more potent eHealth interventions. Am J Prev Med 2007;32(5 Suppl.):S112–8.
30. Nahum-Shani I, Smith SM, Spring BJ, Witkiewitz K, Tewari A, Murphy SA. Just-in-time adaptive interventions (JITAIs) in mobile health: key components and design principles for ongoing health behavior support. Ann Behav Med 2016. http://dx.doi.org/10.1007/s12160-016-9830-8.
31. Mohr DC, Spring B, Freedland KE, Beckner V, Arean P, Hollon SD, Ockene J, Kaplan R. The selection and design of control conditions for randomized controlled trials of psychological interventions. Psychother Psychosom 2009;78:275–84.
32. Freedland KE, Mohr DC, Davidson KW, Schwartz JE. Usual and unusual care: existing practice control groups in randomized controlled trials of behavioral interventions. Psychosom Med 2011;73:323–35.
33. White L, Tursky B, Schwartz GE, editors. Placebo: theory, research, and mechanism. New York: Guilford; 1985.


34. Spiro HM. Doctors, patients, and placebos. New Haven (CT): Yale University Press; 1986.
35. Benedetti F, Carlino E, Pollo A. How placebos change the patient's brain. Neuropsychopharmacology 2011;36:339–54.
36. McRae C, Cherin E, Yamazaki TG, Diem G, Vo AH, Russell D, Ellgring JH, Fahn S, Greene P, Dillon S, Winfield H, Bjugstad KB, Freed CR. Effects of perceived treatment on quality of life and medical outcomes in a double-blind placebo surgery trial. Arch Gen Psychiatry 2004;61:412–20.
37. Shetty N, Friedman JH, Kieburtz K, Marshall FJ, Oakes D. The placebo response in Parkinson's disease. Parkinson Study Group. Clin Neuropharmacol 1999;22:207–12.
38. Spencer CM, Noble S. Rivastigmine. A review of its use in Alzheimer's disease. Drugs Aging 1998;13:391–411.
39. Kawas CH, Clark CM, Farlow MR, Knopman DS, Marson D, Morris JC, Thal LJ, Whitehouse PJ. Clinical trials in Alzheimer disease: debate on the use of placebo controls. Alzheimer Dis Assoc Disord 1999;13:124–38.
40. Montgomery SA. The failure of placebo-controlled studies: ECNP consensus meeting. Eur Neuropsychopharmacol 1999;9:271.
41. Levine JD, Gordon NC, Fields HL. The mechanism of placebo analgesia. Lancet 1978;2:654–7.
42. Gracely RH, Dubner R, Wolskee PJ, Deeter WR. Placebo and naloxone can alter post-surgical pain by separate mechanisms. Nature 1983;306:264–5.
43. Guess RH, Kleinman A, Kusek JW. The science of the placebo. London: BMJ Books; 2002.
44. Sullivan MD. Placebo controls and epistemic control in orthodox medicine. J Med Philos 1993;18:213–31.
45. Houde RW, Beaver WT. Clinical measurement of pain. New York: Academic Press; 1965.
46. Nierenberg AA, Quitkin FM, Kremer C, Keller MB, Thase ME. Placebo-controlled continuation treatment with mirtazapine: acute pattern of response predicts relapse. Neuropsychopharmacology 2004;29:1012–8.
47. Quitkin FM, Stewart JW, McGrath PJ, Nunes E, Ocepek-Welikson K, Tricamo E, Rabkin JG, Klein DF. Further evidence that a placebo response to antidepressants can be identified. Am J Psychiatry 1993;150:566–70.
48. Quitkin FM, McGrath PJ, Stewart JW, Ocepek-Welikson K, Taylor BP, Nunes E, Delivannides D, Agosti V, Donovan SJ, Ross D, Petkova E, Klein DF. Placebo run-in period in studies of depressive disorders. Clinical, heuristic and research implications. Br J Psychiatry 1998;173:242–8.
49. Montgomery SA. Alternatives to placebo-controlled trials in psychiatry. In: ECNP consensus meeting, September 26, 1996, Amsterdam. European College of Neuropsychopharmacology. Eur Neuropsychopharmacol, vol. 9; 1999. p. 265–9.
50. Trivedi M, Rush J. Does a placebo run-in or a placebo treatment cell affect the efficacy of antidepressant medications? Neuropsychopharmacology 1994;11:33–43.
51. Schweizer E, Rickels K. Placebo response in generalized anxiety: its effect on the outcome of clinical trials. J Clin Psychiatry 1997;58(Suppl. 11):30–8.
52. Hawkey CJ. Irritable bowel syndrome clinical trial design: future needs. Am J Med 1999;107:98S–102S.
53. Spiller RC. Problems and challenges in the design of irritable bowel syndrome clinical trials: experience from published trials. Am J Med 1999;107:91S–7S.
54. Moscucci M, Byrne L, Weintraub M, Cox C. Blinding, unblinding, and the placebo effect: an analysis of patients' guesses of treatment assignment in a double-blind clinical trial. Clin Pharmacol Ther 1987;41:259–65.




55. Begg C, Cho M, Eastwood S, Horton R, Moher D, Olkin I, Pitkin R, Rennie D, Schulz KF, Simel D, Stroup DF. Improving the quality of reporting of randomized controlled trials. The CONSORT statement. JAMA 1996;276:637–9.
56. Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux PJ, Elbourne D, Egger M, Altman DG. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ 2010;340:c869.
57. Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. Trials 2010;11:32–57.
58. Jung BF, Johnson RW, Griffin DRJ, Dworkin RH. Risk factors for postherpetic neuralgia in patients with herpes zoster. Neurology 2004;62:1545–51.
59. Max MB, Portenoy RK, Laska EM. The design of analgesic clinical trials. Advances in pain research and therapy, vol. 18. New York: Raven Press; 1991.
60. Askmalm MS, Carstensen J, Nordenskjold B, Olsson B, Rutqvist LE, Skoog L, Stål O. Mutation and accumulation of p53 related to results of adjuvant therapy of postmenopausal breast cancer patients. Acta Oncol 2004;43:235–44.
61. Lee DK, Currie GP, Hall IP, Lima JJ, Lipworth BJ. The arginine-16 beta2-adrenoceptor polymorphism predisposes to bronchoprotective subsensitivity in patients treated with formoterol and salmeterol. Br J Clin Pharmacol 2004;57:68–75.
62. Rothman KJ, Michels KB. The continuing unethical use of placebo controls. N Engl J Med 1994;331:394–8.
63. Charney DS, Nemeroff CB, Lewis L, Laden SK, Gorman JM, Laska EM, Borenstein M, Bowden CL, Caplan A, Emslie GJ, Evans DL, Geller B, Grabowski LE, Herson J, Kalin NH, Keck Jr PE, Kirsch I, Krishnan KR, Kupfer DJ, Makuch RW, Miller FG, Pardes H, Post R, Reynolds MM, Roberts L, Rosenbaum JF, Rosenstein DL, Rubinow DR, Rush AJ, Ryan ND, Sachs GS, Schatzberg AF, Solomon S. National Depressive and Manic-Depressive Association consensus statement on the use of placebo in clinical trials of mood disorders. Arch Gen Psychiatry 2002;59:262–70.
64. Fleischhacker WW, Czobor P, Hummer M, Kemmler G, Kohnen R, Volavka J. Placebo or active control trials of antipsychotic drugs? Arch Gen Psychiatry 2003;60:458–64.
65. Loder E, Goldstein R, Biondi D. Placebo effects in oral triptan trials: the scientific and ethical rationale for continued use of placebo controls. Cephalalgia 2005;25:124–31.

66. Temple RJ. When are clinical trials of a given agent vs. placebo no longer appropriate or feasible? Control Clin Trials 1997;18:613–20.
67. Miller FG, Shorr AF. Unnecessary use of placebo controls: the case of asthma clinical trials. Arch Intern Med 2002;162:1673–7.
68. Pool JL, Guthrie RM, Littlejohn TW. Dose-related antihypertensive effects of irbesartan in patients with mild-to-moderate hypertension. Am J Hypertens 1998;11:462–70.
69. Hrobjartsson A, Gotzsche PC. Is the placebo powerless? An analysis of clinical trials comparing placebo with no treatment. N Engl J Med 2001;344:1594–602.
70. Collins LM, Baker TB, Mermelstein RJ, Piper ME, Jorenby DE, Smith SS, Christiansen BA, Schlam TR, Cook JW, Fiore MC. The multiphase optimization strategy for engineering effective tobacco use interventions. Ann Behav Med 2011;41:208–26.
71. Albert PS, McFarland HF, Smith ME, Frank JA. Time series for modelling counts from a relapsing-remitting disease: application to modelling disease activity in multiple sclerosis. Stat Med 1994;13:453–66.
72. McFarland HF, Frank JA, Albert PS, Smith ME, Martin R, Harris JO, Patronas N, Maloni H, McFarlin DE. Using gadolinium-enhanced magnetic resonance imaging lesions to monitor disease activity in multiple sclerosis. Ann Neurol 1992;32:758–66.
73. Stone LA, Frank JA, Albert PS, Bash C, Smith ME, Maloni H, McFarland HF. The effect of interferon-beta on blood-brain barrier disruptions demonstrated by contrast-enhanced magnetic resonance imaging in relapsing-remitting multiple sclerosis. Ann Neurol 1995;37:611–9.
74. Stone LA, Frank JA, Albert PS, Bash CN, Calabresi PA, Maloni H, McFarland HF. Characterization of MRI response to treatment with interferon beta-1b: contrast-enhancing MRI lesion frequency as a primary outcome measure. Neurology 1997;49:862–9.

Further Reading

1. Levine RJ. The need to revise the Declaration of Helsinki. N Engl J Med 1999;341:531–4.
2. Max MB, Schafer SC, Culnane M, Smoller B, Dubner R, Gracely RH. Amitriptyline, but not lorazepam, relieves postherpetic neuralgia. Neurology 1988;38:1427–32.
3. Max MB, Schafer SC, Culnane M, Dubner R, Gracely RH. Association of pain relief with drug side effects in postherpetic neuralgia: a single-dose study of clonidine, codeine, ibuprofen, and placebo. Clin Pharmacol Ther 1988;43:363–71.


C H A P T E R

19
The Role of Comparative Effectiveness Research
Joe V. Selby, Evelyn P. Whitlock, Kelly S. Sherman, Jean R. Slutsky
Patient-Centered Outcomes Research Institute (PCORI), Washington, DC, United States

O U T L I N E

Introduction
A History of Comparative Clinical Effectiveness Research
The Patient-Centered Outcomes Research Institute
The Role of Comparative Clinical Effectiveness Research in the Nation's Medical Research Enterprise
The Methods of Comparative Clinical Effectiveness Research
Getting the Research Question Right
Choosing the Study Population
Selecting Appropriate Interventions and Comparator(s)
Choosing Clinical Outcomes to Be Measured
The Role of Engagement in Specifying Research Questions
Study Designs for CER Studies
Experimental Study Designs for CER
Observational Study Designs for CER
Cohort Designs
Adjusting for and Avoiding Confounding in Observational CER Studies
Assessing Treatment Heterogeneity
Evidence Synthesis in CER
Building a National Infrastructure for the Conduct of Comparative Effectiveness Research
Conclusions
References

Principles and Practice of Clinical Research http://dx.doi.org/10.1016/B978-0-12-849905-4.00019-8

INTRODUCTION

Individuals making decisions about their health and health care face many choices. So do others involved directly (i.e., caregivers, family members, clinicians) or indirectly (i.e., delivery systems, employers, health plans, policy-makers) in these decisions. Choices are the result of continuous efforts to improve the prevention, detection, and treatment of illness through basic science and clinical research, system-level interventions to improve health-care delivery and the public's health, and marketing initiatives to promote new products and services. Alternative health-care choices may differ in

effectiveness, in safety, in side-effects, costs, or convenience. Comparing viable alternative approaches using scientific methods to determine whether one leads to better results (i.e., a preponderance of desirable over undesirable outcomes) is an important activity of clinical and health services research. Once comparative information is generated and disseminated, individual patients may consider it, often in partnership with a clinician, apply their personal values or preferences to trade-offs between desirable and undesirable outcomes, and make informed decisions. Health systems, payers, and policy-makers may use the comparative information to make decisions at system and population levels.


Copyright © 2018. Published by Elsevier Inc.



A HISTORY OF COMPARATIVE CLINICAL EFFECTIVENESS RESEARCH

Comparative clinical effectiveness research (CER) is the scientific search for and quantification of differences in the full range of benefits and harms of two or more approaches to preventing, diagnosing, treating, or managing disease. CER is not a new activity. Published articles in which the term "comparative effectiveness" appears in the title are found among the earliest entries in PubMed (Fig. 19.1). Growth in the number of such publications has accelerated rapidly over the past 10 years. The closely related concept of "pragmatic" research was first defined in 1967 as clinical research intended to support decision-making, distinguished from "explanatory" research, which is intended to build knowledge alone.1 CER is pragmatic research in that it aims to generate evidence that can support decision-making and improve health care. Research questions are related not to whether or how an intervention works in an ideal situation but to whether interventions work in routine clinical settings and in the broad range of patients found in real-world clinical practice. In 2003, Tunis et al.2 described the "practical" clinical study as one that compares alternatives that are meaningful to decision-makers, in typical patient populations and settings, that proactively engages decision-makers in generating the research questions, and that considers a wider range of outcomes. Publication of this paper coincided with the passage of the Medicare Prescription Drug, Improvement, and Modernization Act (also called the

Medicare Modernization Act or MMA) of 2003, which included Section 1013 authorizing up to $50 million for the Agency for Healthcare Research and Quality (AHRQ) to conduct a program of comparative clinical effectiveness research. Interest in CER grew rapidly in the United States during the first decade of this century, with the allocation of $1.1 billion through the American Recovery and Reinvestment Act of 2009 (ARRA, or the "stimulus package") to fund CER across the federal government and the generation of the top 100 CER topics by the Institute of Medicine (IOM) (now the National Academy of Medicine) in 2009 (see Text Box for details). CER was increasingly seen as an important tool for informing choices made by patients and others, for improving the quality of decision-making in the face of possible harms as well as benefits and, importantly, for reducing ineffective or wasteful care, thereby helping to control or lower the costs of care while improving patient outcomes. By 2006, the possibility of establishing a national institute that would fund, conduct, and synthesize comparative effectiveness research was being proposed by persons within or close to the US government.3,4 Concerns that any concentrated CER effort could fall prey to inconsistent funding appropriations and political discord if placed within the federal government increased support for establishing an independent institute with mandatory funding charged with a broad CER mandate. In 2010, the Patient-Centered Outcomes Research Institute (PCORI) was created as that independent entity as part of the Patient Protection and Affordable Care Act.

FIGURE 19.1 Number of articles in PubMed with "comparative effectiveness" in the title per 100,000 total published articles, 1950–2015. [Line chart; y-axis: comparative effectiveness articles per 100,000 (0–50); x-axis: year, 1950–2015.]
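The normalization used on the figure's y-axis is straightforward to reproduce. The counts below are invented placeholders, not actual PubMed query results.

```python
# Normalizing title-phrase counts by total publication volume, as in
# a per-100,000 rate (hypothetical counts, not real PubMed data).
def rate_per_100k(matching: int, total: int) -> float:
    """Articles with the phrase in the title per 100,000 total articles."""
    return 100_000 * matching / total

print(rate_per_100k(12, 600_000))     # 2.0
print(rate_per_100k(450, 1_200_000))  # 37.5
```

Normalizing by total publication volume matters here because PubMed itself grew enormously over the period shown; raw counts alone would conflate growth in CER with growth in publishing overall.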



A BRIEF HISTORY OF EVENTS LEADING TO CREATION OF THE PATIENT-CENTERED OUTCOMES RESEARCH INSTITUTE

2003 The Medicare Prescription Drug, Improvement, and Modernization Act of 2003 (MMA), Section 1013, authorized up to $50M for the AHRQ to conduct outcomes research and evidence synthesis for 10 high-priority chronic conditions through its Evidence-based Practice Centers and its newly formed Developing Evidence to Inform Decisions about Effectiveness (DEcIDE) Network.

2006 Wilensky3 discussed a range of options for creating and housing a national center for the conduct and synthesis of comparative effectiveness research.

2007 The Congressional Budget Office4 proposed options for creating a center of comparative effectiveness research, funding the center, and using the CER results.

2009 The ARRA economic stimulus allocated $1.1 billion specifically for CER to the AHRQ ($300 million), the National Institutes of Health (NIH, $400 million), and the Department of Health and Human Services (DHHS, $400 million).5 Funds were dedicated to funding large CER studies,6 improving the quality and availability of national data systems for conducting CER,7,8 creating sustainable disease-specific registries,9 and workforce training.5 ARRA also funded the IOM (now the National Academy of Medicine) to develop an initial set of high-priority CER topics.10 The committee engaged stakeholders via a website and a day-long public meeting to collect nominations for topics that could be addressed using CER and produced a set of 100 high-priority topics across a wide range of conditions, grouped into four ranked quartiles. The topics included general questions related to prevention, diagnosis, treatment, and monitoring of a wide range of conditions; fully half of the 100 research topics focused on interventions at the health-care system level.

Numerous definitions of CER were published during this period (Table 19.1) reflecting a very high degree of interest in the evolving concept. All definitions agree that CER’s purpose is to support decision-making by patients as well as other stakeholders, and all suggest that the CER approach may be applied to both clinical and health system interventions. Most specify that CER compares a broad range of interventions (not simply pharmaceuticals) and also can be applied to prevention, diagnosis, and monitoring. Some definitions mention that CER takes place in real-world settings and diverse populations, stressing that it is “effectiveness” and not efficacy research. At least two definitions state that CER is conducted in response to the “expressed needs” of decision-makers, pointing researchers toward


engaging with the ultimate users of the research before and during its conduct. Several definitions note that CER should attend to differences in comparative effectiveness of interventions between patient subgroups or care settings. That is, CER seeks to learn "what works best for whom." Several definitions clarify that CER includes randomized, controlled trials as well as observational studies. Just one definition includes cost-effectiveness; the remainder restrict the definition to assessment of comparative clinical effectiveness. We take the approach of not including cost-effectiveness analysis in this chapter. With good clinical effectiveness information, others can apply locally relevant costs, fill in remaining assumptions, and model cost-effectiveness.

THE PATIENT-CENTERED OUTCOMES RESEARCH INSTITUTE

As part of the Patient Protection and Affordable Care Act of 2010, the US Congress established PCORI as an independent agency with the sole purpose of conducting and funding comparative clinical effectiveness research.15 Direct funding of approximately $3.5 billion was provided for the period from 2010 to 2019 from the US Treasury and from mandated fees levied on health insurers, including Medicare, commercial health plans, and large self-insured employers. PCORI was created as a not-for-profit, nongovernmental agency overseen by a 21-person multistakeholder Board of Governors. A 17-member Methodology Committee was also created to develop and improve the science and methods of comparative clinical effectiveness research. Both Board and Methodology Committee members are appointed by the Comptroller General, US Government Accountability Office. PCORI established, through a public process, five broad national priorities (Table 19.2) to guide its research agenda.16 In each priority area, PCORI funds both investigator-initiated projects and specific, usually larger scale, projects that focus on high-priority topics. To identify high-priority topics, PCORI solicits research questions from a range of stakeholders using multiple ongoing strategies. Questions may be submitted via PCORI's website. PCORI convenes meetings and maintains active relationships with a broad range of organizations representing patients and consumers, caregivers, clinicians and delivery systems, health plans and large employers, and the research community. These organizations are invited to submit lists of CER questions in their areas of interest and to participate as stakeholders on PCORI-funded research. PCORI also supports the convening of researchers with patient, consumer, and clinician communities through engagement


Recent Definitions of Comparative Effectiveness Research Year

Definition

Agency for Healthcare Research and Quality7

2006

Comparative effectiveness research is designed to inform health-care decisions by providing evidence on the effectiveness, benefits, and harms of different treatment options. The evidence is generated from research studies that compare drugs, medical devices, tests, surgeries, or ways to deliver health care.

Congressional Budget Office

2007

As applied in the health-care sector, an analysis of comparative effectiveness is simply a rigorous evaluation of the impact of different options that are available for treating a given medical condition for a particular set of patients. Such a study may compare similar treatments, such as competing drugs, or it may analyze very different approaches, such as surgery and drug therapy. The analysis may focus only on the relative medical benefits and risks of each option, or it may also weigh both the costs and the benefits of those options. In some cases, a given treatment may prove to be more effective clinically or more cost-effective for a broad range of patients, but frequently a key issue is determining which specific types of patients would benefit most from it.

Medicare Payment Advisory Commission (MedPAC)11

2008

Comparative-effectiveness analysis evaluates the relative value of drugs, devices, diagnostic and surgical procedures, diagnostic tests, and medical services. By value, we mean the clinical effectiveness of a service compared with its alternatives. Comparativeeffectiveness information has the potential to promote care of higher value and quality in the public and private sectors.

Institute of Medicine

2009

Comparative effectiveness research is the generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor a clinical condition or to improve the delivery of care. The purpose of CER is to assist consumers, clinicians, purchasers, and policy-makers to make informed decisions that will improve health care at both the individual and population levels.

Federal Coordinating Committee on Comparative Effectiveness Research12 (2009)

Comparative effectiveness research is the conduct and synthesis of research comparing the benefits and harms of different interventions and strategies to prevent, diagnose, treat, and monitor health conditions in "real-world" settings. The purpose of this research is to improve health outcomes by developing and disseminating evidence-based information to patients, clinicians, and other decision-makers, responding to their expressed needs, about which interventions are most effective for which patients under specific circumstances.
• To provide this information, comparative effectiveness research must assess a comprehensive array of health-related outcomes for diverse patient populations and subgroups.
• Defined interventions compared may include medications, procedures, medical and assistive devices and technologies, diagnostic testing, behavioral change, and delivery system strategies.
• This research necessitates the development, expansion, and use of a variety of data sources and methods to assess comparative effectiveness and actively disseminate the results.
The definition above is not meant to exclude randomized trials; however, these trials would need comparator arms other than placebo and be representative of populations seen in "real-world" practice.

RAND Corporation13 (2009)

Comparative effectiveness research examines the degree to which alternative treatments for the same health problem produce equivalent or different health outcomes. The products of comparative effectiveness research can be used in a variety of ways, including to provide information to physicians and patients in choosing appropriate treatments, as well as input into insurance benefit design, coverage determination, and payment.

National Pharmaceutical Council14 (2016)

Comparative effectiveness research (CER) is the conduct and synthesis of research comparing the benefits and harms of different interventions and strategies to prevent, diagnose, treat and monitor health conditions. The goal of CER is to improve health outcomes by developing and disseminating evidence-based information to patients, clinicians, and other decision-makers, responding to their expressed needs, about which interventions are most effective for which patients under specific circumstances. CER uses a wide range of research methods, including randomized controlled trials, observational studies, and systematic reviews, a structured assessment of evidence available from multiple primary studies.

19. COMPARATIVE EFFECTIVENESS

II. STUDY DESIGN AND BIOSTATISTICS


TABLE 19.1


TABLE 19.2  The National Research Priorities of the Patient-Centered Outcomes Research Institute

1. Assessment of Prevention, Diagnosis, and Treatment Options: Comparing the effectiveness and safety of alternative prevention, diagnosis, and treatment options to see which ones work best for different people with a particular health problem.

2. Improving Healthcare Systems: Comparing health system-level approaches to improving access, supporting patient self-care, innovative use of health information technology, coordinating care for complex conditions, and deploying workforce effectively.

3. Communication and Dissemination Research: Comparing approaches to providing comparative effectiveness research information, empowering people to ask for and use the information, and supporting shared decision-making between patients and their providers.

4. Addressing Disparities: Identifying potential differences in prevention, diagnosis, or treatment effectiveness, or preferred clinical outcomes across patient populations and the health care required to achieve best outcomes in each population.

5. Accelerating Patient-Centered Outcomes Research and Methodological Research: Improving the nation's capacity to conduct patient-centered outcomes research, by building data infrastructure, improving analytic methods, and training researchers, patients, and other stakeholders to participate in this research.

awards. Many of these projects develop and submit CER questions. In consultation with PCORI's Board of Governors and multiple stakeholder groups, PCORI processes topics through a prioritization pathway (Fig. 19.4). Standing multistakeholder advisory panels, one for each of PCORI's national priorities, are appointed to 2-year terms by PCORI from among volunteers responding to annual calls. Advisory panels meet regularly and provide guidance on topics in the pipeline and new issues that PCORI should consider. Submitted questions and their progress through prioritization can be followed on PCORI's website. Following Advisory Panel input, PCORI's Board of Governors may assign highly prioritized questions either for development as specific (i.e., "targeted") funding announcements or for placement on a list of high-priority questions included in PCORI's Pragmatic Clinical Studies initiative. Both the targeted announcements and Pragmatic Clinical Studies awards fund relatively large studies. Awardees are expected to actively involve relevant stakeholder organizations in the research process and to attend to possible differences in relative effectiveness and harms across patient subgroups in their analyses. The entire process, from identification of study topics to the conduct of research, is uniquely stakeholder-driven.

THE ROLE OF COMPARATIVE CLINICAL EFFECTIVENESS RESEARCH IN THE NATION'S MEDICAL RESEARCH ENTERPRISE

CER addresses genuine uncertainty at the point of clinical or health system decision-making; it aims to provide the clinical information needed to move forward on an individual, practice, system, and/or policy level. As such, CER is not the first design employed in a new area of inquiry (Fig. 19.2) and only becomes relevant when there are at least two viable options being considered and an important choice to be made with respect to preventing, diagnosing, treating, or managing illness. By contrast, "discovery" or traditional biomedical research covers the initial steps in a new area and is the domain of the NIH in the United States along with the life sciences industry. These activities range from understanding the genetic, molecular, and environmental mechanisms underlying health and illness to the more applied areas of developing new pharmaceuticals, devices, procedures, or diagnostics based on this understanding. Epidemiologic inquiry at this stage can provide insights that lead back to the basic sciences or forward to new approaches for preventing, diagnosing, or treating illness. Once new approaches are identified, further knowledge acquisition develops in a relatively orderly, although not always predictable, sequence, depending on the type of innovation. For drugs and medical devices, the path to approval involves oversight and review by the US Food and Drug Administration (FDA). For pharmaceuticals, next steps after initial proof-of-concept and safety/dosing studies involve placebo-controlled efficacy studies sponsored largely by industry under FDA guidance and scrutiny. Efficacy studies tend to focus on one or more "primary outcomes" and often leave other

FIGURE 19.2 PCORI's role in the National Clinical Research Enterprise. (Diagram showing the pipeline from Discovery through Regulation/Approval to Clinical and Healthcare Policy, and the position of Comparative Clinical Effectiveness Research (CER); stakeholders shown include NIH, industry, and academia; FDA and CMS; and patients, specialties, and payers.)


outcomes unstudied. Sample sizes and duration of follow-up are calculated for the primary outcomes. Studies are planned to reach a conclusion as early as possible. Study populations are typically narrowly selected, often for having a high risk for the primary outcome but relatively limited potential for harms. Extremes of the age spectrum and individuals with comorbidities often are excluded from the study population. Left unanswered is how well study findings might generalize to the broader range of patients to whom they will be applied in practice, who may be at much lower risk of disease or higher risk of adverse effects. Patients with comorbid illnesses along with the disease being studied also are often excluded, further reducing the generalizability of preapproval trials. Smaller sample sizes and relatively short follow-up commonly fail to detect rarer adverse effects or those that occur only after longer periods of time. For new medical devices, pathways also typically involve industry sponsorship with FDA oversight and approval, but the pivotal studies are unlikely to involve placebo-controlled trials and often do not involve randomization. New procedures and health system interventions develop even less systematically and without FDA involvement. New procedures may spread into practice with or without case series publications. System-level interventions often incorporate multiple components, some proven effective previously and others unproven, into new efforts such as population-management programs, but full programs are rarely evaluated before being implemented, and it is difficult to know which components are critical for the success of the intervention.
The final step in reaching practice for most new interventions (pharmaceuticals, devices, procedures, and system-level programs) is the decision on whether health-care payers, both public (Centers for Medicare and Medicaid Services [CMS]) and private (commercial insurers and self-insured employers), will cover the costs for these products. For legislative reasons, CMS currently enjoys less leeway than other payers in deciding whether to cover new pharmaceuticals. For all other types of interventions, all payers must make these decisions. In situations where there are existing treatments or approaches, CER can be a valuable tool for making informed decisions about new interventions that help to maximize health benefits and reduce harms as well as waste. Occasionally, findings from preapproval efficacy studies are so strong that they provide a sufficient basis for making evidence-based clinical and coverage decisions. A recent example is the emergence of the new direct-acting antiviral agents for the treatment of hepatitis C virus (HCV) infection. In preapproval efficacy studies among patients with moderate to severe HCV-related liver disease, these agents were so superior to older agents in terms of eradicating the virus, preventing progression of disease, and freedom from serious adverse effects that clinical guidelines changed immediately17 and insurers quickly covered and promoted treatment for this group of hepatitis C patients. In other cases, the comparative effectiveness of new interventions (vs. earlier approaches) is not clear when initial coverage decisions must be made. There may be little or no evidence of how new interventions work, in terms of either effectiveness or safety, in patients who differ from those in the preapproval trials, such as older patients, those with comorbid illness, those with milder illness, or those in different clinical settings than where the approval studies were conducted. There is rarely any direct evidence comparing the new interventions with previously available therapies in these patient groups. Thus, new interventions in a clinical area with established therapies or practices are a prime subject area for CER. These practical questions often require larger, longer CER studies considering a greater range of outcomes and may require several years to complete. Payers, as well as patients and clinicians, face a period where decisions must be made on minimal evidence. Systematic reviews based on the multiple, usually small, preapproval trials, or reviews that indirectly compare findings from such studies with findings from studies of older, available therapies, may provide some assistance. However, the lack of more complete evidence on how the new interventions will perform in broad populations and everyday settings, or on how they affect a range of outcomes not included in preapproval efficacy studies, inevitably leaves many unknowns at this point.
Decision-makers also turn to modeling of the natural history of illness and of the effectiveness and safety of new versus older treatment options to estimate comparative effectiveness and support initial decisions regarding coverage and pricing. Simulation models may help inform patient and clinician decision-making by presenting quantitative information on possible trade-offs. However, the utility of these models depends greatly on the model structure and the appropriateness of its assumptions. A second useful by-product may be a clearer appreciation of the critical "evidence gaps," the CER needed to reduce uncertainty in preliminary assessments of comparative benefits and harms. In situations where differences in effectiveness (benefits and harms) are either demonstrated or estimated to be small but cost differences are considered to be large, cost-effectiveness and cost-utility models are sometimes useful.18 These models relate differences in overall effectiveness to differences in costs across comparative options using a common metric, such as cost per quality-adjusted life year gained.
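The common metric described above can be illustrated with the incremental cost-effectiveness ratio (ICER): the extra cost of one option over another divided by the extra quality-adjusted life years (QALYs) it delivers. The sketch below is a minimal illustration only; the `icer` helper and all dollar and QALY figures are invented for the example and are not drawn from any real evaluation.

```python
# Hypothetical illustration of an incremental cost-effectiveness ratio (ICER),
# the "cost per QALY gained" metric used in cost-effectiveness models.
# All numbers are invented for the example.

def icer(cost_new, cost_old, qaly_new, qaly_old):
    """Incremental cost per QALY gained for a new vs. an older treatment."""
    delta_cost = cost_new - cost_old   # incremental cost per patient
    delta_qaly = qaly_new - qaly_old   # incremental QALYs per patient
    if delta_qaly <= 0:
        raise ValueError("New option adds no QALYs; ICER is not meaningful.")
    return delta_cost / delta_qaly

# New therapy: $48,000 and 6.1 QALYs; older therapy: $12,000 and 5.5 QALYs.
ratio = icer(48_000, 12_000, 6.1, 5.5)
print(f"ICER: ${ratio:,.0f} per QALY gained")  # ICER: $60,000 per QALY gained
```

A decision-maker would then compare this ratio against a willingness-to-pay threshold; the point of the sketch is only that the estimate is driven entirely by the cost and effectiveness inputs, which is why incomplete CER information makes such value estimates tenuous.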



Cost-effectiveness approaches have more recently been adapted and relabeled as "value assessments" and used to recommend pricing levels linked to the perceived value of a treatment.19 Again, unless these models have accurate and complete CER information, the value estimates will remain tenuous. The interface between those who fund and conduct CER, those who build and report on simulation models (including cost-effectiveness and value models), and those who must make clinical and policy decisions is a critical locus for meaningful dialog and for generation of applied CER in the interest of patients.

THE METHODS OF COMPARATIVE CLINICAL EFFECTIVENESS RESEARCH

The research methods of CER are not fundamentally different from those used in etiologic research, safety research, or efficacy studies. However, the goals of CER and of pragmatic research place novel demands on traditional clinical research methodologies. To support decision-making, at either the individual or population level, more attention must be paid to identifying the right research question, the question that could change decision-making and practice. To obtain study results that can be generalized to broad patient populations, more heterogeneous populations must be studied, with fewer exclusion criteria and greater participation rates than in most clinical research. Studies must be conducted under as near real-world conditions as possible, blending research more seamlessly into clinical care rather than isolating it in specialized research units. Larger sample sizes often are needed in CER. The differences to be detected (or ruled out) between two active treatments are likely to be small but nonetheless important. The level of certainty needed to justify changes in clinical care choices demands a precision that depends in part on larger sample size. The broader, more heterogeneous study populations of CER usually represent a broader range of expected treatment effects; thus CER studies demand sample sizes large enough to examine treatment heterogeneity and allow for detection of possible subgroup differences in meaningful benefits and significant harms. The challenges of making research more pragmatic create a fertile area for methods research and development, in both observational and experimental CER.
Recognizing this need, the AHRQ has funded a substantial body of methods research, beginning with work conducted by its Evidence-based Practice Centers and strengthened in its DEcIDE Program.20 This work has created a strong foundation for further CER development in the United States and beyond (see Text Box). AHRQ convened a series of symposia beginning in 2006 to build the methods research community and disseminate findings. Proceedings of three symposia have been published21-23 and stand as important collections of methodological work aimed at advancing CER methods. These symposia focused almost exclusively on observational CER methods. More recently, PCORI's Methodology Committee met its legislative mandate by developing, posting for public comment, revising, and publishing in 2012 an initial set of Methodology Standards24 for conducting patient-centered comparative clinical effectiveness research. PCORI's Methodology Standards are regularly updated or revised by the Methodology Committee, which also is engaged to develop new standards as needed to support innovations in CER. The standards (Table 19.3) address a broad set of challenges to decision-support research and clarify that experimental as well as observational approaches to primary data collection, as well as evidence synthesis, are fundamental components of CER methodology. PCORI requires all applicants to adhere to these Methodology Standards in developing and submitting research proposals and in conducting the research. Merit reviewers evaluate submitted research proposals against the Methodology Standards, and peer review of final research reports assesses adherence to the standards in the conduct of PCORI-funded CER. Training materials and continuing medical education materials have been developed to aid in dissemination of these standards and are available on PCORI's website.25

Getting the Research Question Right

Research that can provide guidance for clinical decision-makers, change practice, reduce practice variation, and increase the quality of health-care decisions demands research questions that are carefully considered and constructed in collaboration with those who will use the study's findings. Poorly constructed comparisons may be "interesting" and may add to clinical knowledge incrementally, but they are unlikely to support improved decision-making and may confuse choices, even if the studies are well conducted. PCORI's Methodology Standards (summarized in Table 19.3) include six recommendations intended to aid in specifying research questions. The standards first emphasize a careful and systematic approach to identify true research gaps, including use of systematic reviews and clinical guidelines as sources of important unanswered questions. Both systematic reviews and clinical guidelines development efforts typically conclude with a set of unanswered questions. Especially in the case of clinical guidelines, the answers to these questions would likely be used to refine guidelines and change


TABLE 19.3  PCORI Methodology Standards

CROSS-CUTTING STANDARDS FOR PCOR

1. Standards for Formulating Research Questions: Six standards that specify what to include in research protocols as a means of increasing study quality as well as transparency in research

2. Standards Associated with Patient-Centeredness: Four standards that promote effective patient engagement and the explicit incorporation of patient needs, values, and preferences into research

3. Standards for Data Integrity and Rigorous Analyses: Six standards that describe necessary documentation of key decisions and tests of the assumptions made in analyses

4. Standards for Preventing and Handling Missing Data: Five standards outlining proper statistical methods for handling missing data

5. Standards for Heterogeneity of Treatment Effects: Four standards on how to account for the fact that different people do not always respond the same way to the same treatment

STANDARDS FOR SPECIFIC STUDY DESIGNS AND METHODS

6. Standards for Data Registries: Three standards to help ensure that registries contain relevant, high-quality data that are used appropriately

7. Standards for Data Networks as Research-Facilitating Structures: Two standards to help ensure that key components are included in network design and considered when network data are used in studies

8. Standards for Causal Inference Methods: Six standards on accounting for possible sources of bias and addressing them to produce valid conclusions about the causal effect of an intervention

9. Standards for Adaptive and Bayesian Trial Designs: Five standards providing guidance on the design and conduct of studies that use such designs

10. Standards for Studies of Diagnostic Tests: Five standards that address studying the impact of diagnostic tests on subsequent care and patient outcomes

11. Standards for Systematic Reviews: One standard that outlines the application of standards for systematic reviews

practices. The standards emphasize the importance of specifying all aspects of the question necessary for it to be relevant and actionable, generally represented by the PICOTS acronym (Population/patients, Interventions, Comparators, Outcomes, Timing, Setting). We comment further on each of these in the following paragraphs.

Choosing the Study Population

The Methodology Standards stress the importance of selecting the appropriate study population given the

question. A study population too narrowly defined (e.g., only very high-risk patients) offers little useful information for the broader, more typical patient populations in whom the treatment will subsequently be considered and promoted. Yet, many such narrowly focused studies are conducted and reported, leaving patients, clinicians, and insurers to wonder how well the study results apply to themselves or the majority of patients. Conversely, a study population can be defined too broadly and fail to answer useful questions for anyone. If it includes substantial numbers of patients in whom the treatment would rarely be considered or who would not be expected to benefit from treatment, it is likely to underestimate or fail to detect true benefits even for large patient subgroups. The Methodology Standards also urge CER investigators to consider key subgroups of interest in advance when defining the study populations and to power studies so that comparative effectiveness may be assessed within these subgroups. A critical role of CER is, in fact, to define the optimal prevention, diagnosis and treatment strategies for individuals within heterogeneous populations, as defined by demographic, clinical, and genetic characteristics. There will nearly always be trade-offs to consider between the breadth of the study population, the utility of findings to various audiences, and the costs or feasibility of the study. These trade-offs will require careful consideration and prioritization among stakeholders and may ultimately mean that multiple studies will be needed to fully address a clinical or policy question.

Selecting Appropriate Interventions and Comparator(s)

Selecting the most appropriate comparator(s) in CER studies is among the most difficult choices to be made. CER aims to address gaps in knowledge about the relative benefits and harms of selected interventions compared with other available and legitimate approaches. For clinical treatment with pharmaceuticals, devices, and procedures, the decision has usually been made that something needs to be done. Comparing the treatment to doing nothing would not be helpful. Thus, comparisons are almost always with "active" alternatives. Sometimes the need for direct head-to-head comparisons is obvious (e.g., two new drugs intended for the same patient population with potential differences in effectiveness, acceptability, adverse effects, or costs; a new agent vs. the current standard of care; or surgery vs. optimal nonsurgical care). But in many more instances, careful thought is required to ensure that the comparator represents a realistic treatment option for the study population. Comparators that will be outdated by the time study results are available, as well as comparators considered to be substandard care, are not useful in CER. Alternatives must correspond to genuine choices faced by patients, clinicians, or delivery systems and should not include "straw man" choices with known or suspected imbalances (e.g., the highest effective dose of one agent vs. lower doses of another). For prevention and screening measures, it is more common to compare preventive interventions versus no or placebo interventions, simply because the "higher standard" of being proven better than no action (first, do no harm) must be met first. CER has less commonly been a part of traditional considerations in clinical prevention. However, the United States Preventive Services Task Force (USPSTF) has found some instances, such as colorectal cancer, cervical cancer, or breast cancer screening, where there are now two or more viable screening choices (e.g., fecal occult blood tests vs. endoscopy, HPV testing vs. Pap smears) that may be compared. CER also is useful for comparing rescreening intervals and alternative follow-up diagnostic approaches in those who screen positive. System-level interventions are common concerns among those who nominate and prioritize CER questions, since failure to consistently deliver what is already known to be beneficial is widespread in the United States and other countries. However, the relevant interventions to be compared do not align strictly with either treatment-related or preventive services. Systems may need to know whether organizing care differently, changing the mix of health-care personnel, altering hours of operation, or modifying payment schemes will improve the delivery of, outcomes from, or satisfaction with clinical services. Some system-level interventions are composed of multiple components. Single components may have been shown to be effective previously, but the combined intervention may not have been tested.
Comparisons of different combinations may then be most useful. In other cases, the optimal direct comparisons are of two "active" interventions (e.g., telecare vs. face-to-face mental health care; email vs. telephone health coaching; multispecialty referral clinics vs. integrated primary care services). However, the most relevant and viable comparator in many instances is continuing business as usual versus implementing a system change. "Usual care" then becomes the most appropriate comparator. Because "usual care" has been shown to vary widely between settings, it is essential to carefully define its components in advance and to monitor the delivery of that care during the study. A CER study that cannot fully describe what happened in both comparator arms fails to answer the "compared-to-what" question and is difficult to interpret, replicate, or act upon.


Choosing Clinical Outcomes to Be Measured

To support decision-making, CER studies usually include a set of clinical outcomes that is larger and measured at longer, more clinically meaningful time points than in efficacy studies. In efficacy studies, simply proving that a new treatment is better than placebo over the short term for a primary outcome may fully meet study goals of obtaining FDA approval. In practice, patients and clinicians often need information for a much broader range of legitimate outcomes, and over longer periods of time. These often include outcomes reported directly by patients, such as fatigue, depression, functional capacity, or symptoms specific to the illness. Individual patients vary in the outcomes of most importance for decision-making. In any CER study, it is highly desirable to use standard outcome measures, measures that have been validated and reported on in previous studies, whenever possible, so that study results may be compared with those of earlier studies and easily included in research syntheses and meta-analyses. For example, the Patient-Reported Outcomes Measurement Information System (PROMIS) is a set of publicly available, standardized, general, and symptom-specific item lists developed for use in clinical research.26 Many disease-specific standard instruments also are emerging. An important frontier in CER and clinical medicine is identifying the full set of meaningful outcomes that should be collected for research on a specific condition.27,28

The Role of Engagement in Specifying Research Questions

Specifying all these aspects of CER questions presents trade-offs between the desire for rich information, study costs, and burden to participants. Various stakeholders often disagree initially on the optimal research question, and some compromise may be needed. A steady focus on the most urgent needs of patients and clinicians and on practical aspects of studying the research question is important for reaching agreement. Studies designed with key research stakeholders absent during the planning phase are much more likely to face disagreement on the implications, relevance, or utility of findings when they become available. PCORI has stressed the need for "engaging" all relevant stakeholders in every aspect of the research process, beginning with the selection of research questions but extending to the preparation and review of study proposals and the conduct of the research. PCORI's Methodology Standards include guidance on engaging stakeholders "as appropriate." The goals of engagement are to ensure that the study will ask a


question relevant to stakeholder needs, be conducted with fidelity to the original question, and be reported in ways that prove useful for these key stakeholders and serve dissemination. Both individuals and organizations add value and relevance to the research process. Individual patients and caregivers bring the "lived experience" of those with the condition, providing qualitative insight into the relative importance of various symptoms, treatment side effects, longer term outcomes, and the feasibility and acceptability of proposed treatment approaches. Advocacy organizations for specific conditions bring a broader awareness of the entire spectrum of the condition, from prevention and early detection to palliative and end-of-life care for conditions that are life-threatening. They also can bring awareness of clinical and public policy issues that raise important CER questions or affect the feasibility of new treatment approaches. Clinician organizations responsible for developing evidence-based clinical care guidelines in a specialty area and payer organizations responsible for making coverage decisions for new treatments are especially well-positioned to recognize important comparative questions. Representatives of the life-sciences industry can raise questions and provide insights about the comparative effectiveness of their products. Other funding agencies, particularly AHRQ, introduced elements of stakeholder engagement before PCORI was created, especially for the prioritization of research questions and dissemination activities.29 The FDA, the NIH, and industry research have all followed PCORI's lead and increasingly are emphasizing engagement, especially patient engagement, in setting research priorities. However, none but PCORI currently requires engagement consistently in all aspects of the research process or provides resources to facilitate the experience.
An engagement rubric has been developed by PCORI to assist applicants and other researchers in selecting and implementing suitable engagement strategies, and a taxonomy of stakeholder engagement has identified the stages of research where engagement may be employed.30,31 Many engagement strategies are described in the rubric and found among PCORI-funded projects. Although the practice of engaging patients and other stakeholders appears sensible, the evidence supporting its benefits and defining its methods is just beginning to appear. It is likely that many strategies may be interchangeable, and the best strategies have not been identified. A systematic review covering 66 papers published before 2009 found a positive impact of engagement of patients or the public on all stages of health-related research. Improvement areas included development of user-focused research objectives, user-relevant research questions, user-friendly study information, questionnaires, and interview schedules, as well as

more appropriate recruitment strategies, consumer-focused interpretation of data, and enhanced implementation and dissemination of results.32 Most of the studies reviewed were qualitative studies. A more recent systematic review of engagement in CER considered 70 studies published from 2003 to 2012.33 In this review, engagement with patients was much more frequent than with clinicians, payers, or other stakeholders. Engagement also has been reported less frequently for the later stages of the research process (conducting, interpreting, and disseminating research) than for earlier stages such as identification and prioritization of research questions.34 It is likely that gaps in the continuity of engagement within projects would decrease chances that the study's findings would be accepted and implemented by all relevant stakeholders. Further evidence is needed to establish the broad effectiveness of stakeholder engagement, to compare strategies and identify those that are most effective and efficient, and to understand the impact of engaging stakeholders other than patients, particularly clinicians, systems leaders, and payers (i.e., insurers and employers). Evaluation outcomes should include recruitment and retention rates in CER studies and, ultimately, the dissemination and implementation of study findings.

STUDY DESIGNS FOR CER STUDIES

A range of analytic study designs can be used for conducting CER (Fig. 19.3). Limitations related to validity, feasibility, utility, or cost affect each design, and no single approach can adequately address the full range of CER questions faced by patients, clinicians, payers, policy-makers, and industry. However, the possible study designs can be used complementarily to build useful information in a specific research area. Choosing the most appropriate study design for a specific question requires consideration of the question itself, the previously available evidence, and each design's known weaknesses as well as its possible advantages in terms of time, cost, validity, and acceptability to patients. Methods development in CER often seeks to reduce or offset the known deficiencies of a particular study design or to enhance its benefits. The first and most critical decision in selecting the study design is whether the CER study will require random allocation of the intervention or whether a nonrandomized (observational) study of the question could adequately adjust for the biases of confounding by indication or self-selection.35 Clinicians select treatments for individual patients based on their estimate of the patient's risks, with higher-risk patients receiving earlier or more aggressive preventive, diagnostic, or treatment

II. STUDY DESIGN AND BIOSTATISTICS
FIGURE 19.3 Experimental and observational study designs in CER. The figure poses the question "Is a randomized design necessary?" Experimental study designs (individual randomized controlled trials, cluster randomized trials, and stepped wedge designs) appear on one side, and observational study designs (prospective and retrospective cohort studies, case-control studies, area variation studies,* and instrumental variable analyses*) on the other, ordered by increasing chance of residual confounding. *Although confounding at the level of the individual patient is greatly reduced, the possibilities of confounding at the level of the small area or the instrument must be considered.

strategies while lower-risk patients get less aggressive choices. Patients also are selective. Healthier patients, those with healthy behaviors, or those already concerned about certain risks may each be more likely than others to use preventive measures, undergo more intensive or frequent screening, or choose one treatment approach over another. These tendencies are strong, and they bias observational comparisons of outcomes when treatments have been assigned by clinicians or chosen by patients (as in most nonrandomized studies). Patients in the comparator groups already differ at baseline in ways that predict their outcomes, even if the interventions are exactly equivalent. Efforts to measure and adjust outcome analyses for known differences between patient groups are a central part of observational research, but in many instances remaining concerns about patient differences undermine confidence in the findings and their impact on clinical practice. When observed differences in outcomes are "small" (e.g., …

To test H0: μx = μ0 versus HA: μx > μ0 at the α = 5% significance level, we compare the observed z test statistic value to the critical value 1.645 (−1.645 if HA: μx < μ0), which is associated with the upper (lower) 5% of the normal distribution, and if the test statistic exceeds this value we reject the null hypothesis. Many statisticians will use a tail probability of 2.5% to define the rejection region, using the associated critical value of 1.960 (or −1.960), even if the test is one-sided, so as not to make one-sided tests "easier." This approach has the advantage of keeping the hypothesis test consistent with the associated 95% CI for the statistic.

Confidence Intervals

Another way to evaluate evidence is by using a CI. When α = 0.05, we use a 95% CI. For general α, a 100 × (1 − α)% CI for a population parameter is formed around the point estimate of interest. The most basic CI is that for the mean, μ.
If the variance is known, the CI has the following formula:

( x̄ − z_{1−α/2} σ/√n , x̄ + z_{1−α/2} σ/√n ).  (24.9)

By contrast, if the variance is unknown, then the sample standard deviation s_x is used in place of σ and the t critical value is used instead of the corresponding z value. In most cases, the variance is not known and must be estimated. Hence, the t-statistic is commonly reported for continuous data and the Student t-distribution is used to determine the critical value. For large sample sizes the critical values for the Student t and the normal distribution will be nearly identical, and so sometimes in practice the latter is used even for a t-statistic. There is an important parallelism between hypothesis testing and CI construction. Specifically, if the hypothesized population parameter falls within the CI, we do not reject the null hypothesis. For a 95% CI this is similar to performing a two-sided test at the α = 5% significance level.

z Tests or t Tests

The choice between the t and z tests can be important. Although some people will switch to the normal z critical value as soon as the sample size looks slightly large (e.g., n > 30), doing so can be problematic. If we look at the z and t values at the 0.975 percentile, the upper end of a 95% CI, at df = 30 the t value is 4% larger than the normal value. At df = 120, there is still a 1% difference between the t distribution and the normal distribution. This may seem trivial, and many times the


difference between these two distributions and their associated tests will not matter; a test is highly significant or clearly nonsignificant. In general, however, it is best to use a Student's t distribution if that is what the test and data warrant. If the variance is unknown, which it almost always is, and we have continuous data, the t-test is recommended. Suppose an investigator wishes to determine whether pediatric anesthesiologists have unusually high serum Erp 58 protein levels. This protein is associated with industrial halothane hepatitis. Suppose that he/she collects n = 9 blood samples, and the sample mean and standard deviation of the protein levels are x̄ = 0.35 and s_x = 0.12 (optical density units), respectively. If the mean protein level is over 0.28, it will suggest that further study is needed. He/She chooses a one-sided α = 5% significance level. This hypothesis test corresponds to H0: μx ≤ 0.28 versus HA: μx > 0.28. Using formula Eq. (24.8) one can calculate

T = (0.35 − 0.28) / (0.12/√9) = 1.75,  (24.10)

which is less than the 95th percentile, t_{1−α,8} = 1.860, of Student's t distribution with n − 1 = 8 df (p-value = .06). Thus, he/she does not reject the null hypothesis, although he/she may wish to collect a larger sample to explore this question further. Note that had the normal percentile z_{0.95} = 1.645 been used, the null would have been rejected. In practice, one would collect a larger sample and use a more advanced method such as multiple regression to adjust this hypothesis test for important covariates such as age, gender, work experience, body mass, and medical history.9
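The arithmetic in this example is easy to verify in a few lines of code. The sketch below (Python standard library only; the critical values 1.860 and 1.645 are taken from standard t and normal tables, as in the text) reproduces the Erp 58 calculation and shows how the t and z critical values lead to different conclusions:

```python
import math

# Erp 58 example: n = 9, sample mean 0.35, sample SD 0.12 (optical density units),
# one-sided test of H0: mu <= 0.28 versus HA: mu > 0.28.
n, xbar, s, mu0 = 9, 0.35, 0.12, 0.28

# t statistic from Eq. (24.8): T = (xbar - mu0) / (s / sqrt(n))
t_stat = (xbar - mu0) / (s / math.sqrt(n))
print(round(t_stat, 2))  # 1.75

# Critical values quoted in the text (from standard tables).
t_crit = 1.860  # 95th percentile of t with 8 df
z_crit = 1.645  # 95th percentile of the standard normal

print(t_stat > t_crit)  # False: the (correct) t test does not reject
print(t_stat > z_crit)  # True: the normal critical value would wrongly reject
```

The contrast between the last two lines is exactly the point of this section: with small samples, substituting the z critical value for the t critical value can flip the conclusion.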

Binary Data

Just as we can perform hypothesis tests on continuous data, we can perform them on proportions estimated from binary data. Binary or dichotomous outcomes are common in medical research; such data have two possible outcomes, such as success or failure, presence or absence of disease, or survival or death. When each observation is scored as a 1 (success) or a 0 (failure), the average of the data is simply the proportion of successes. Typically for binary data, a hypothesis test will be performed for the sample proportion(s). A quick note on notation: Eq. (24.4) and most statistical textbooks use Greek letters for population parameters, such as π for the true population proportion. Several former course participants have mentioned that the mathematical constant π (≈3.14) is what jumps to mind when seeing π in this section. With apologies to statisticians, in an attempt to help these readers the



24. HYPOTHESIS TESTING

rest of this chapter uses p for population proportions and p̂ for sample estimates of a proportion.

Developing a Test

There are a variety of different tests that can be used with binary data, including the z test, continuity corrections to the z test, and exact tests. Let p1 denote a population proportion, let p̂1 denote a sample estimate of that proportion, and let p0 denote that proportion's value under the null hypothesis. To test the two-sided hypothesis H0: p1 = p0 versus HA: p1 ≠ p0 (or a corresponding one-sided hypothesis), we can consult the reference distribution for a sample size n (the binomial) and write down the test statistic following the formula in Eq. (24.1),

Z = (p̂1 − p0) / √( p0(1 − p0)/n ),  (24.11)

which statistical theory (the central limit theorem) tells us has approximately the standard normal distribution for large enough sample sizes (n > 25, np0 > 5, and n(1 − p0) > 5). If this test statistic falls in the extreme percentiles of the standard normal distribution (or beyond the appropriate lower or upper percentile for a one-sided test), we can reject the null hypothesis. For small to modest sample sizes, the normal distribution will give only an approximate p-value for the test statistic in Eq. (24.11) due to the discreteness of the data. We can improve the approximation by adding a small sample continuity correction or by performing an exact test. See Altman5 for details.

Exact Tests

If we have sufficient computing power, we can perform an exact binomial test rather than using the normal distribution to approximate the sampling distribution for the binary test statistic and the resulting p-value. In an exact binomial test, we enumerate the true binomial probabilities for each of the possible numbers of successes or events (0, 1, 2, …, n) and then reject the null hypothesis when the sum of the probabilities for values as extreme or more extreme than the observed value is less than the significance level. For example, suppose that the null hypothesis is H0: p = 0.35 and n = 6; under this hypothesis, the true binomial probabilities of observing exactly 0, 1, 2, 3, 4, 5, and 6 events out of six trials are 0.075, 0.244, 0.328, 0.235, 0.095, 0.020, and 0.002, respectively. Thus, if the alternative hypothesis is HA: p > 0.35 and we observe five events, the one-sided p-value is 0.020 + 0.002 = 0.022. There are a few ways to define the two-sided p-value even when using exact probabilities. Perhaps the simplest and most common method is to calculate the one-sided p-value for the tail in the direction of the observed data and double it. Thus, if the alternative hypothesis is HA: p ≠ 0.35, the

two-sided p-value is 0.044. In both of these examples, we would reject the null hypothesis at the 5% significance level but not at the 1% significance level.

Confidence Intervals

Similar to constructing the test statistic, binomial CI construction can make use of a normal approximation, but improvements can be made. The normal approximation methods for binomial CI construction tend to produce CIs that are too small on average and thus have lower coverage rates than the specified confidence levels. In other words, even though we calculate something called a 95% CI, in truth the interval contains the true value of interest less than 95% of the time. For binomial data we need a better interval method. One classical approach for obtaining better binomial CIs is the Clopper-Pearson method,10 which uses exact binomial probabilities to give CIs that are appropriate for all sample sizes. The Clopper-Pearson CIs consist of all proportion parameters that are consistent with the observed binomial data at a particular significance level using the exact binomial test with a two-sided hypothesis. Most statistical software can easily provide the Clopper-Pearson exact confidence bounds for proportions. Several other methods also exist, and which one is used can make a difference. For example, with n = 60 trials and x = 15 successes or events, the (1) Wald or normal approximation, (2) Clopper-Pearson, (3) Agresti-Coull,11 and (4) SAIFS12 methods give 95% CIs of (1) (0.140, 0.360), (2) (0.147, 0.379), (3) (0.157, 0.373), and (4) (0.137, 0.374), respectively. These CIs are all close, but if we are near one of the boundaries the method used may really matter. A statistician can help implement these improved methods for binomial CI construction, which essentially build on the conceptual framework presented in this chapter.
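To make the exact-test mechanics concrete, the sketch below (Python standard library only) enumerates the binomial probabilities for the H0: p = 0.35, n = 6 example and also computes the Wald interval for the n = 60, x = 15 comparison; the Clopper-Pearson and other improved intervals are left to statistical software:

```python
import math

def binom_pmf(k, n, p):
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Exact test example from the text: H0: p = 0.35, n = 6, five events observed.
n, p0, observed = 6, 0.35, 5
p_one_sided = sum(binom_pmf(k, n, p0) for k in range(observed, n + 1))
print(round(p_one_sided, 3))      # 0.022, as in the text
# Doubling gives about 0.045 (the text's 0.044 doubles the rounded 0.022).
print(round(2 * p_one_sided, 3))

# Wald (normal approximation) 95% CI for n = 60 trials, x = 15 successes.
n2, x = 60, 15
phat = x / n2
half_width = 1.96 * math.sqrt(phat * (1 - phat) / n2)
print(round(phat - half_width, 3), round(phat + half_width, 3))  # about (0.140, 0.360)
```

The Wald bounds match interval (1) in the text; comparing them against the Clopper-Pearson bounds reported there illustrates how much the choice of method can move an interval near a boundary.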

Example

Suppose that in response to complaints about allergies, a large hospital changes the standard brand of rubber gloves that it supplies to a new but more expensive brand. An administrator wishes to know what proportion of nurses in that hospital prefers the new gloves, p1, and if that proportion is at least p0 = 40%, he/she will consider the change worthwhile. He/She chooses a one-sided significance level of α = 5%. This hypothesis test corresponds to H0: p1 ≤ 0.4 versus HA: p1 > 0.4. He/She finds that out of a sample of 30 nurses, 18 prefer the new brand, and the rest are indifferent. Hence n = 30, p̂1 = 18/30 = 0.6, and using the previous formula,

Z = (0.6 − 0.4) / √( 0.4(1 − 0.4)/30 ) = 2.24,  (24.12)



which exceeds the 95th percentile, z_{1−α} = 1.645, of the standard normal distribution (p-value = .01). By comparison, using an exact binomial test the p-value is .02. Thus, he/she rejects the null hypothesis and decides to adopt the new brand of gloves. Since this is a one-sided test, using a one-sided 0.025-level test and comparing the exact binomial p-value to 0.025 (not 0.05), or comparing the result of Eq. (24.12) to z = 1.96 (as opposed to z = 1.645), would have been preferred to stay consistent with a two-sided 95% CI. Indeed, examining two-sided 95% CIs he/she finds similar results using the normal approximation or Wald (0.425, 0.775), Clopper-Pearson (0.406, 0.773), Agresti-Coull (0.423, 0.754), and SAIFS (0.404, 0.787) methods. Although these improved CI methods may seem to create a little extra work, they can be crucial when the binomial test statistic lies near the boundary of significance.
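A short script (Python standard library only; `statistics.NormalDist` supplies the normal CDF) reproduces both the approximate z test and the exact binomial p-value for the glove example:

```python
import math
from statistics import NormalDist

# Glove example: 18 of 30 nurses prefer the new brand; test H0: p <= 0.4.
n, x, p0 = 30, 18, 0.4
phat = x / n

# z statistic from Eq. (24.11), using the null variance p0(1 - p0)/n.
z = (phat - p0) / math.sqrt(p0 * (1 - p0) / n)
print(round(z, 2))  # 2.24, exceeding the one-sided critical value 1.645

# Exact binomial one-sided p-value: P(X >= 18) when X ~ Binomial(30, 0.4).
p_exact = sum(math.comb(n, k) * p0**k * (1 - p0)**(n - k)
              for k in range(x, n + 1))
print(round(p_exact, 2))  # about 0.02, as reported in the text

# Normal-approximation one-sided p-value, for comparison.
print(round(1 - NormalDist().cdf(z), 2))  # about 0.01
```

The gap between the approximate (0.01) and exact (0.02) p-values illustrates why the text recommends exact methods when a binomial test statistic lies near the boundary of significance.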

TWO-SAMPLE HYPOTHESIS TESTS WITH APPLICATIONS TO CLINICAL RESEARCH

The goal of many clinical studies is a comparative one, such as comparing the treatment response between two groups. Here, we develop hypothesis tests for comparing the means of two normal populations in both paired and unpaired analyses. We also discuss hypothesis tests for comparing two population proportions. These tests then will be used to analyze the data from the motivating examples in the next section.

Tests for Comparing the Means of Two Normal Populations

Paired Data

We first consider the hypothesis test appropriate for a pair of outcomes. This analysis corresponds to the beta-interferon/MRI trial in which measurements on each patient are observed both before and during treatment. In this situation, we have two observations on every patient from which we can compute the difference di = xi − yi. The data consist of n differences: d1, d2, …, dn, where n is the number of subjects in the study. In the beta-interferon/MRI study, n = 14. The observations xi and yi correspond to suitably transformed individual mean monthly lesion counts during the baseline period and during the active treatment period for the ith subject. As discussed in general in the Hypotheses for the Beta-Interferon/Magnetic Resonance Imaging Study section, the hypothesis we will be testing is

H0: μd = 0 vs. HA: μd ≠ 0.  (24.13)

We need to make modeling assumptions to set up a hypothesis test. This will allow us to develop a sampling distribution for the test statistic under the null hypothesis that the mean difference is 0 (i.e., there is no effect of beta-interferon). We assume that the differences for each patient are independent and normally distributed from a population with mean μd and variance σ², and we let d̄ denote the mean of the differences on all n subjects. When σ² is known, the test statistic

Z = d̄ / (σ/√n)  (24.14)

has the standard normal distribution under the null hypothesis. When σ² is unknown (as is common in most situations in medical statistics), we need to estimate the variance σ² from the data. When σ² is unknown, the test statistic is

T = d̄ / (s/√n), where s = √[ (1/(n − 1)) Σ_{i=1}^{n} (di − d̄)² ].  (24.15)

This test statistic has Student's t distribution with n − 1 df under the null hypothesis. Before we begin the study, we choose a significance level (i.e., what type I error amount or cutoff we will use at the end of the study to determine whether the data contain the amount of evidence needed to reject the null hypothesis). If the test statistic's value is in the lower or upper α/2 × 100 percentiles of the reference distribution, we reject the null hypothesis and conclude that the means in the two groups (in this case the pretreatment and treatment periods) are not equal. If the test statistic is not in the extreme tails of the distribution, we fail to reject the null hypothesis and conclude that there is insufficient evidence that the means in the two groups differ. The p-value is the probability of observing a test statistic value larger (in magnitude or absolute value) than the one observed. Suppose the observed value is Tobs, and let T denote a random Student t variable. Then the p-value is P(T < −Tobs) + P(T > Tobs) for a two-sided test. The p-value for a one-sided test with alternative hypothesis HA: μd > 0 is P(T > Tobs). Tests based on the Z and T test statistics are called paired z-tests and paired t-tests, respectively. Similar to what is discussed in the z Tests or t Tests section, a paired z-test is used when σ² is known, a paired t-test is used when σ² must be estimated from the data, and we choose the t-test when wondering which of the two to use.

Unpaired Data

We next consider tests of two normal population means for unpaired data. We discuss the cases of equal




variances and different variances separately. We begin with the equal variance case. The example that corresponds to this test is the felbamate monotherapy trial, and it is similar to many other parallel groups designs. We assume that we have observations from two groups of subjects, with sample sizes n and m. We assume that the observations x1, x2, x3, …, xn and y1, y2, y3, …, ym come from two independent normal distributions with a common variance σ² and population means μ1 and μ2, respectively. The hypothesis test for this situation is

H0: μ1 = μ2 vs. HA: μ1 ≠ μ2.  (24.16)

We calculate the difference in the two sample means and follow Eq. (24.1) again to write down the test statistic using the sampling distribution for the difference of two independent sample means. When σ is known, the test statistic of interest is

Z = (x̄ − ȳ) / ( σ √(1/n + 1/m) ),  (24.17)

which has the standard normal distribution. When σ² needs to be estimated from the data, we calculate the pooled variance estimator to put into the test statistic,

T = (x̄ − ȳ) / ( s √(1/n + 1/m) ), where s = √[ ( Σ_{i=1}^{n} (xi − x̄)² + Σ_{i=1}^{m} (yi − ȳ)² ) / (n + m − 2) ],  (24.18)

which has the Student's t distribution with n + m − 2 df under the null hypothesis. The preceding estimate of s is the pooled sample standard deviation and is based on the assumption of equal variances in the two groups. As in the previous hypothesis test, if the Z and T test statistic values are in the lower or upper α/2 × 100 percentiles of this reference distribution, we reject the null hypothesis. These tests based on the Z and T test statistics are called two-sample z-tests and two-sample t-tests, respectively. Two-sample z-tests are used when σ² is known, and two-sample t-tests are used when σ² needs to be estimated from the data. In many situations, the assumption of equal variance in the two treatment groups is not a good one. Since treatments may be effective in only a fraction of subjects, the variability of the outcome in the treatment group is often larger than that of the placebo group. The test statistic to use in this situation is

Z = (x̄ − ȳ) / √( σ²x/n + σ²y/m ),  (24.19)

which applies when both sample sizes are large or when the variances are known. The Z test statistic has the standard normal distribution when the null hypothesis is true. When the variances are unknown and need to be estimated from the data, the test statistic is

T = (x̄ − ȳ) / √( s²x/n + s²y/m ).  (24.20)

Under the null hypothesis that the means in the two groups are equal, the preceding test statistic has a distribution that is approximately the Student's t distribution with u df, determined by Satterthwaite's formula:

u = ( s²x/n + s²y/m )² / [ (s²x/n)²/(n − 1) + (s²y/m)²/(m − 1) ].  (24.21)

When this result is not an integer, u should be conservatively rounded downward. Luckily, there is software to compute these quantities for us; generally, the applied statistician today only needs to choose the right test and let the computer do the rest of the work. As with the other hypothesis tests we discussed, if the test statistic value is in the lower or upper α/2 × 100 percentiles of the reference distribution, we reject the null hypothesis and conclude that the means in the two groups are unequal. The t-test with unequal variances often is called Welch's t-test. Because this test is valid whether the variances are equal or unequal, it is the commonly preferred test for the two-sample setting.

Tests for Comparing Two Population Proportions

In the ISIS-4 study, the primary outcome was a binary variable signifying whether a randomized patient was alive or dead at 35 days after randomization. Our interest focuses on comparing participants who were randomized to receive magnesium with those randomized to not receive magnesium. The typical data structure for this two-sample problem involves the number of positive responses in n subjects from group 1 and the number of positive responses in m subjects from group 2. The null hypothesis is that the two groups (magnesium, no magnesium) have the same 35-day survival probability. The hypothesis for this test is

H0: p1 = p2 vs. HA: p1 ≠ p2.  (24.22)

The assumptions for the test are that (1) the data are binary, (2) observations are independent, and (3) there is a common probability of a “yes” response for each




of the two groups. For large sample sizes (typically considered n and m both greater than 25, and np1, n(1 − p1), mp2, and m(1 − p2) each greater than 5), we usually can approximate the discrete sampling distribution with the normal distribution and use a two-sample z-test for comparing the two population proportions. The test statistic is

Z = (p̂1 − p̂2) / √( p̂1(1 − p̂1)/n + p̂2(1 − p̂2)/m ),  (24.23)

which, for large sample sizes, has approximately the standard normal distribution under the null hypothesis that the population proportions are equal in the two groups. Sometimes a pooled variance is used. Other tests have been developed for small samples. For example, Fisher's exact test enumerates all possible values of the test statistic under the null hypothesis and is a valid test for any values of the proportions and sample sizes, no matter how small.13

HYPOTHESIS TESTS FOR THE MOTIVATING EXAMPLES

We now conduct hypothesis tests to analyze the data from the three motivating examples.

Hypothesis Tests for the Beta-Interferon/Magnetic Resonance Imaging Study

The beta-interferon/MRI study consisted of 14 patients followed for 13 months, 7 months on baseline and 6 months on treatment. The outcome was the difference between each patient's average number of monthly contrast-enhanced lesions during baseline and on treatment. Table 24.1 summarizes the data from the trial. A total of 13 of 14 patients had decreased lesion frequency on treatment compared with their baseline frequency. This result suggests that beta-interferon lowers disease activity in early RRMS. The inferential question is this: do the data provide enough evidence to make a statement about the population of all RRMS patients? A hypothesis test is used to address this question. We conducted a two-tailed test of whether there is a difference between lesion frequency during baseline and lesion frequency after treatment. We chose a significance level of 0.05 before the study began. First, note that the structure of the data suggests that a paired t-test is appropriate: the two measurements on each patient are correlated, while observations on different patients are independent. The variance of the difference in lesion activity for each subject is unknown. In addition,

TABLE 24.1 Beta-Interferon and Magnetic Resonance Imaging Study

Patient Number    Baseline (Mean Lesions/Month)    6-Month Treatment (Mean Lesions/Month)
 1                 2.43                             0
 2                 1.71                             0.67
 3                 3.14                             1.00
 4                 1.29                             0.33
 5                 0.57                             1.67
 6                 2.00                             0
 7                 6.00                             0.33
 8                 0.43                             0
 9                12.86                             0.17
10                 6.42                             0.67
11                 0.57                             0
12                 0.71                             0
13                 1.57                             0.17
14                 3.17                             1.67

the data transformed to the log scale appeared to be approximately normally distributed. The data were transformed so that di = log[(7-month baseline mean) + 0.5] − log[(6-month treatment mean) + 0.5]. The constant 0.5 was added to all values, a common practice, since the log of 0 is undefined. We use a paired t-test with a test statistic computed as

T = d̄ / (s/√n) = 4.8.  (24.24)

The test statistic has a t distribution with 14 − 1 = 13 df when the null hypothesis is true. The α/2 × 100 (2.5%) lower and upper percentiles of the reference distribution are −2.16 and 2.16, respectively. Since 4.8 is greater than 2.16, we reject H0 and conclude that there is a difference between lesion frequency during baseline and lesion frequency on beta-interferon. The p-value for the two-sided test can be computed as P(t13 < −4.8) + P(t13 > 4.8) = 0.0004, where t13 denotes a random variable with the t distribution on 13 df. This means that if the null hypothesis of no effect were true, there would be only a 1 in 2500 chance of observing a test statistic as large (in absolute value) as the one we observed. The test used was a two-sided test for several reasons. We care whether there is a decrease or an increase in lesions. Investigators should be cautious about using one-sided tests, which are appropriate only when there is interest in detecting a beneficial effect from treatment and there would be no interest in detecting a




harmful effect (or the opposite). This is very rare. One-sided tests done at the 0.05 level also would be awkward when the one-sided test was significant but the two-sided 0.05 test was not. This is because the commonly reported two-sided 95% CI is consistent with a two-sided test, so a nonsignificant two-sided test means the CI would contain values consistent with no difference. This would contradict the one-sided test at the 5% level that declared the two groups significantly different. Always performing tests, one-sided or two-sided, with tail probabilities of α/2 avoids this problem. When debates arise about the use of a one- or two-sided test, typically the use of α/2 two-sided tests wins.
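The full paired analysis can be reproduced from Table 24.1 with a short script (Python standard library only; the critical value 2.16 is taken from a t table, since the standard library has no t distribution):

```python
import math
from statistics import mean, stdev

# Table 24.1: (baseline, 6-month treatment) mean lesions/month per patient.
data = [(2.43, 0), (1.71, 0.67), (3.14, 1.00), (1.29, 0.33), (0.57, 1.67),
        (2.00, 0), (6.00, 0.33), (0.43, 0), (12.86, 0.17), (6.42, 0.67),
        (0.57, 0), (0.71, 0), (1.57, 0.17), (3.17, 1.67)]

# Log-transformed differences, adding 0.5 because log(0) is undefined.
d = [math.log(b + 0.5) - math.log(t + 0.5) for b, t in data]

# Paired t statistic from Eq. (24.15): T = dbar / (s / sqrt(n)).
n = len(d)
t_stat = mean(d) / (stdev(d) / math.sqrt(n))
print(round(t_stat, 1))  # 4.8, matching the value reported in the text

# Two-sided decision at alpha = 0.05 with 13 df (critical value from a t table).
print(abs(t_stat) > 2.16)  # True: reject H0
```

Note that the paired test reduces the two columns of Table 24.1 to a single column of differences before testing, which is exactly what makes it a one-sample t-test on the di.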

Hypothesis Tests for the Felbamate Monotherapy Trial

The felbamate monotherapy trial was designed as a parallel group design with 19 patients randomized to the felbamate arm and 21 patients randomized to the placebo arm. Seizure frequency was monitored during the 2-week follow-up period in the hospital or until a patient dropped out of the study. The outcome was the daily seizure rate over the follow-up period. The test was a two-tailed test of whether there is a difference in seizure frequency between the felbamate and placebo arms. We chose a significance level of 0.05 before the study began. The hypothesis is

H0: μtreatment = μplacebo vs. HA: μtreatment ≠ μplacebo.  (24.25)

The appropriate test is an unpaired t-test. The data are independent and approximately normally distributed on the square root scale (obtained by taking square roots of the mean daily seizure counts for all patients). On the square root scale, the mean seizure rates are x̄ = 1.42 in the placebo group and ȳ = 0.42 in the treatment group. The sample standard deviations were sx = 1.3 and sy = 1.0, suggesting that there is more variation in the placebo arm. We begin by performing a test under an assumption of equal variances in the two groups. This test is not commonly used in practice, but it does appear in some software packages and is the basis for a commonly used sample size formula discussed in the next chapter. Using Eq. (24.18), we find the pooled standard deviation s = 1.17. The test statistic assuming that both populations have a common variance is

T = (x̄ − ȳ) / ( s √((1/n) + (1/m)) ) = 2.71.  (24.26)

When the null hypothesis is true, the test statistic has a t distribution with n + m − 2 = 38 df. The α/2 × 100 (2.5%) lower and upper percentiles of the t distribution with 38 df are −2.02 and 2.02, respectively. Because 2.71 is greater than 2.02, we reject the null hypothesis and conclude that there is a difference in seizure frequency between the placebo and felbamate arms. The p-value, which equals P(t38 > 2.71) + P(t38 < −2.71) = 0.01, means that the chance is approximately 1 in 100 of getting a test statistic this large (either positive or negative) if the null hypothesis is true. Thus, we can reasonably reject the null hypothesis at this significance level. By comparison, a Welch's t-test, which does not assume equal variances for the two populations, was conducted. The test was done on the square root scale and resulted in T = 2.74, df = 37.09 (rounded down to 37), and a p-value of 0.009, which is similar to the result from the test assuming a common population variance.
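Both versions of the test can be reproduced from the summary statistics alone. The sketch below (Python standard library only; p-values are omitted because the standard library has no t distribution) computes the pooled statistic, the Welch statistic, and the Satterthwaite df:

```python
import math

# Felbamate trial summary statistics on the square-root scale.
n, xbar, sx = 21, 1.42, 1.3   # placebo arm
m, ybar, sy = 19, 0.42, 1.0   # felbamate arm

# Pooled (equal-variance) two-sample t test, Eqs. (24.18) and (24.26).
s_pooled = math.sqrt(((n - 1) * sx**2 + (m - 1) * sy**2) / (n + m - 2))
t_pooled = (xbar - ybar) / (s_pooled * math.sqrt(1/n + 1/m))
print(f"{s_pooled:.2f} {t_pooled:.2f}")  # 1.17 2.70 (2.71 in the text, from rounding)

# Welch's t test with Satterthwaite df, Eqs. (24.20) and (24.21).
se2 = sx**2 / n + sy**2 / m
t_welch = (xbar - ybar) / math.sqrt(se2)
df = se2**2 / ((sx**2 / n)**2 / (n - 1) + (sy**2 / m)**2 / (m - 1))
print(f"{t_welch:.2f} {df:.2f}")  # 2.74 37.09, as in the text
```

The near-agreement of the two statistics here reflects the fairly similar variances and sample sizes; with markedly unequal variances, the Welch version can diverge substantially from the pooled one.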

Hypothesis Tests for the ISIS-4 Trial: Comparing the Magnesium and No Magnesium Arms

The ISIS-4 study was a factorial design of three treatments. We focus on comparing participants receiving magnesium with those not receiving magnesium. A total of 58,050 MI patients were randomized: 29,011 received magnesium and 29,039 did not. The inferential question was whether the proportion of participants dying during the first 35 days after an MI differed between the two groups. The hypothesis is

H0: pMg+ = pMg− vs. HA: pMg+ ≠ pMg−.  (24.27)

The test was two-sided and conducted at the 0.05 significance level. We assume that individual binary outcomes are independent with a common probability of dying in each group, and we note that the sample sizes are large, so we can test this hypothesis with a two-sample z-test. The data from the study are presented in Table 24.2. The proportion dead at 35 days after randomization (35-day mortality) can be estimated as p̂Mg+ = 2216/29,011 = 0.0764 and p̂Mg− = 2103/29,039 = 0.0724. The mortality rate is slightly higher in the magnesium arm. We can formulate the hypothesis test with the test statistic

Z = (p̂Mg+ − p̂Mg−) / √( p̂Mg+(1 − p̂Mg+)/n + p̂Mg−(1 − p̂Mg−)/m ).  (24.28)

The test statistic, at least approximately, has the standard normal distribution when the null hypothesis is true. The 2.5% lower and upper percentiles of the

II. STUDY DESIGN AND BIOSTATISTICS


TABLE 24.2  ISIS-4 Trial 2 × 2 Table

          Mg+       Mg−
Dead      2,216     2,103
Alive     26,795    26,936
Total     29,011    29,039

normal distribution are −1.960 and 1.960, respectively. Using the data provided we find Z = −1.82, and since −1.82 falls between −1.960 and 1.960, we do not reject the null hypothesis: we do not have enough evidence to conclude that the population proportions are unequal. The p-value is P(Z < −1.82) + P(Z > 1.82) = 0.07.
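This calculation can be reproduced in a few lines. The sketch below is a generic unpooled two-sample z-test (the helper name `two_prop_ztest` is ours, not from the text), applied to the counts in Table 24.2.

```python
import math

def two_prop_ztest(x1, n1, x2, n2):
    """Unpooled two-sample z-test of H0: p1 = p2 (cf. Eq. 24.28)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / se
    # two-sided p-value, 2 * P(Z > |z|), via the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# ISIS-4 counts from Table 24.2, subtracting Mg+ from Mg- as in Eq. (24.28)
z, p = two_prop_ztest(2103, 29039, 2216, 29011)
print(round(z, 2), round(p, 2))  # -1.82 0.07
```

The sign of Z depends only on the order of subtraction; the two-sided p-value is unaffected.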

COMMON MISTAKES IN HYPOTHESIS TESTING

As the previous section shows, hypothesis testing requires one to make appropriate assumptions about the structure and distribution of a data set, especially the relationships between the sample observations. Researchers commonly make a number of mistakes in hypothesis testing by ignoring the structure of the sample data or by failing to check the assumptions of the hypothesis test, and these mistakes can lead to faulty conclusions. Some of the most common mistakes, illustrated in the context of the t-test, follow.

A mistake often committed by inexperienced researchers is to ignore the pairing between observations within subjects. Although the unpaired t-test remains valid (correct type I error rate), there can be a substantial loss of power or efficiency, making it more difficult to identify true effects. Thus, testing paired continuous data with a two-sample unpaired t-test is a common mistake that carries a price. The opposite mistake is far worse: incorrectly assuming a paired structure between two independent samples, that is, testing unpaired continuous data with a paired t-test, is more serious and can lead to the wrong inference.

Along these lines, researchers too frequently ignore the dependence that occurs when multiple observations are made on each subject. For example, if there are five subjects and 3, 2, 1, 2, and 2 measurements are made on these subjects, respectively, there are not 10 independent observations. In this case, more complicated methods, such as mixed-model regression, must be used to analyze the data. This mistake is both very common and serious because observations on the same subject tend to be more similar (positively correlated) than those on different subjects. Use of a simple test that


ignores the dependence will tend to give p-values that are too small compared to the correct values, which in turn will lead one to conclude that the data provide more evidence against the null hypothesis than they actually do. This topic is discussed further in Chapter 27.

At times our analyses ignore the apparent sample distribution of observations, especially features such as skewness, outliers or extreme values, and lower or upper limits on measurement accuracy. Performing a t-test on highly skewed data without appropriate adjustments can lead to the wrong conclusions, although the t-test is generally robust to moderate amounts of skewness for sufficiently large samples.

Lastly, when we look at the sample distribution we should not forget the variance. A frequent, simple mistake is to assume equal variances in the two groups and perform a pooled t-test without either knowing from external sources that the variances are equal or examining the data graphically or numerically. We should instead at least perform a Welch's two-sample t-test. In most cases this is not too serious a mistake; indeed, the felbamate monotherapy example showed that the two-sample t-test was robust to the difference between the variances of the two samples.
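The efficiency argument for respecting true pairing can be made concrete. In the hypothetical pre/post data below (made up for illustration, not from any study in this chapter), each subject shifts by about +1 while the subjects themselves differ widely, so the paired t-statistic dwarfs the unpaired one.

```python
import math

def t_paired(x, y):
    """Paired t statistic: mean within-pair difference against zero."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    md = sum(d) / n
    sd = math.sqrt(sum((v - md) ** 2 for v in d) / (n - 1))
    return md / (sd / math.sqrt(n))

def t_unpaired(x, y):
    """Two-sample pooled t statistic, ignoring the pairing."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    sp2 = (ssx + ssy) / (nx + ny - 2)            # pooled variance
    return (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))

# Hypothetical pre/post measurements on the same 5 subjects: a consistent
# within-subject shift of about +1, but a large between-subject spread.
pre  = [10.0, 12.0, 14.0, 16.0, 18.0]
post = [11.0, 13.1, 14.9, 17.2, 18.8]
print(round(abs(t_paired(pre, post)), 1))    # 14.1
print(round(abs(t_unpaired(pre, post)), 1))  # 0.5
```

Here the unpaired analysis would miss a within-subject effect that the paired analysis detects easily, because the pooled test's denominator is dominated by between-subject variation.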

MISSTATEMENTS AND MISCONCEPTIONS

The following are some of the major misstatements and misconceptions that arise when performing hypothesis tests and reporting the results.

1. "Failing to reject the null hypothesis (H0) means that it is true." On the contrary, failing to reject the null hypothesis may merely indicate that there is not enough evidence to state that it is false at a particular significance level. Failing to reject the null is not equivalent to accepting the null hypothesis. The null hypothesis may be true or it may be false, but we do not have the evidence in the sample being studied to reject it.

2. "The p-value is small, so the two sample means (x̄ and ȳ) are significantly different from each other." This statement is incorrect because the p-value is a statistical tool for making inferences about the true population means. The goal of hypothesis testing is to make statements about population parameters, not about the samples themselves. People misstate this frequently, sometimes by accident and sometimes not.

3. "The impact is huge, just look at the tiny p-value!" Focusing on the statistical significance of an effect (its



24. HYPOTHESIS TESTING

p-value) while ignoring the effect's magnitude or size is a common misstatement. In a study with multiple explanatory variables, there often will be several variables that appear to be related to the outcome of interest. While a small p-value may demonstrate significant evidence that the effect of a variable on the outcome is nonzero, the point estimate and CI for the magnitude of the effect demonstrate how much of an impact that variable has on the response.

4. One of the most commonly discussed misconceptions is confusing statistical significance with clinical significance; this relates to misstatement 3. In the ISIS-4 trial, the participants who received intravenous magnesium sulfate had a 35-day unadjusted mortality rate of 7.64%, whereas those who did not receive that treatment had a corresponding mortality rate of 7.24%. If the two-sided p-value had been equal to 0.007 (it was actually p = 0.07), we would need to ask ourselves this: even though the p-value would be quite significant at 0.007, is an increase in mortality of 0.40% on the treatment clinically troubling? Possibly it is troubling, if 0.4% translates into many lives per year. Possibly such a small difference is not troubling in some studies, or it is weighed against other issues such as side effects. Just because a finding is statistically significant does not make it clinically significant. Both types of significance are needed.
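A quick back-of-envelope calculation (ours, not from the text) shows why even a 0.40% absolute difference can matter at scale: it corresponds to roughly one extra death for every 250 patients treated.

```python
# Absolute risk difference and "number needed to harm" for the ISIS-4 rates
p_mg, p_ctrl = 0.0764, 0.0724   # 35-day mortality, Mg+ vs Mg-
ard = p_mg - p_ctrl             # absolute risk difference: 0.40%
nnh = 1 / ard                   # about 1 extra death per 250 treated
print(round(ard, 4), round(nnh))  # 0.004 250
```

This is why the magnitude of the effect, not just its p-value, must enter the clinical judgment.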

SPECIAL CONSIDERATIONS

The following topics extend the concepts presented in the previous sections. Most of this chapter was devoted to establishing a conceptual framework for statistical hypothesis testing. We focused primarily on tests for comparing two populations because these are the most common types of tests used in clinical research. Here, we briefly describe other methodology that is commonly used in analyzing data from medical studies. More details on all these subjects can be found in the references.

Comparing More Than Two Groups: One-Way Analysis of Variance

The analysis of variance (ANOVA) framework extends the methodology for comparing the means of two populations to more than two populations. This method may be applicable in multiarm clinical trials in which interest focuses on detecting any differences among the various treatments. The hypotheses for comparing k population means with ANOVA can be written as

H0: μ1 = μ2 = ⋯ = μk vs. HA: some μi ≠ μj    (24.29)

Here, the null hypothesis is that the means of the k groups are all equal. The assumptions for this test are that the data are normally distributed with a constant population variance across the k groups. In addition, it is assumed that the data for the individual subjects are statistically independent. The test statistic used is the ratio of the between-group variance to the within-group variance. Under the null hypothesis of equal population means, the test statistic has an F distribution, and one can obtain a p-value to assess the significance of this test (see Altman5 for more details).
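The F statistic just described is straightforward to compute by hand. The sketch below uses hypothetical outcomes for a three-arm trial; the helper name `one_way_anova_F` is ours, not from the text.

```python
def one_way_anova_F(groups):
    """F statistic for one-way ANOVA: between-group to within-group variance ratio."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    # between-group sum of squares: group means around the grand mean
    ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # within-group sum of squares: observations around their group mean
    ssw = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups)
    msb = ssb / (k - 1)   # between-group mean square
    msw = ssw / (N - k)   # within-group mean square
    return msb / msw

# Hypothetical outcomes in a three-arm trial
F = one_way_anova_F([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]])
print(round(F, 2))  # 21.0
```

Under H0 this statistic would be compared with the F distribution on (k − 1, N − k) degrees of freedom.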

Simple and Multiple Linear Regression

Simple linear regression is a technique used to examine the strength of a linear relationship in a set of bivariate or paired data, where one variable acts as the predictor and the other as the response. For example, one may be interested in examining whether there is a linear increase in blood pressure with age for a certain range of ages. The simple linear regression model for blood pressure (y) as a function of age (x) is

yi = b0 + b1 xi + εi    (24.30)

where b0 and b1 are the intercept and slope of the regression line, respectively, and the index i denotes the value for the ith individual. In addition, εi is an error term (assumed to be normally distributed with mean 0 and variance σ²) that characterizes the scatter around the regression line. The intercept (b0) and slope (b1) parameters are estimated by least squares fitting, which chooses the line that minimizes the sum of the squared vertical differences between the observed responses and the values predicted by the fitted line at the corresponding values of the predictor variable.

Hypothesis testing also plays an important role in regression. We often wish to test whether there is a significant change in one variable with each unit increase in a second variable, not only in the data we observed in the sample but also in the population from which the sample data were drawn. In other words, we wish to test whether a linear trend is present in the data. The null hypothesis is that there is no linear trend. The hypotheses for linear regression can be stated as

H0: b1 = 0 vs. HA: b1 ≠ 0    (24.31)

The assumptions for this test are that the response observations are independent and normally distributed (with constant variance) around the regression line. The test statistic for a significant linear relationship is the ratio of the variance of the data points around the average response value (ȳ) relative to the variance around the regression



line. A large test statistic of this type reflects either a steep slope or small variability around the slope. This test statistic has an F distribution under the null hypothesis that the slope is zero (i.e., a horizontal line), and one can obtain a p-value to assess the significance of this test.

Multiple regression (often called multivariable regression) is an extension of simple linear regression that allows for more than one predictor variable or covariate. We may be interested in examining whether there is a linear increase in blood pressure with age (xi) after adjusting for weight (zi). The multiple regression model can be written as

yi = b0 + b1 xi + b2 zi + εi    (24.32)

The hypotheses, one for each b coefficient, for multiple regression are formulated in the same way as for simple linear regression. In multiple linear regression, each variable whose slope (b) coefficient is found to be significantly different from 0 is commonly interpreted as a variable that has an independent effect on y.
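The least-squares estimates for simple linear regression have closed forms: b1 = Sxy/Sxx and b0 = ȳ − b1·x̄. A minimal sketch, using hypothetical age and blood-pressure values (not data from the text):

```python
def least_squares(x, y):
    """Least-squares estimates of intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # cross-products
    sxx = sum((xi - mx) ** 2 for xi in x)                     # sum of squares of x
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

# Hypothetical ages and systolic blood pressures
age = [40.0, 50.0, 60.0, 70.0]
sbp = [120.0, 126.0, 131.0, 139.0]
b0, b1 = least_squares(age, sbp)
print(round(b0, 1), round(b1, 2))  # 94.9 0.62
```

Here the fitted slope of 0.62 would be interpreted as an estimated 0.62 mm Hg increase in blood pressure per year of age, and the test of H0: b1 = 0 asks whether such a trend exists in the population.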

Multiple Comparisons

When making many statistical comparisons, i.e., performing multiple hypothesis tests, a certain fraction of the test statistics will be statistically significant even when the null hypothesis is true. In general, when a series of tests is performed at the α significance level, approximately α × 100% of the tests will be significant at that level even when the null hypothesis for each test is true. For example, even if the null hypotheses are true for all tests, when conducting many independent hypothesis tests at the 0.05 significance level, on average (in the long term) 5 of 100 tests will be significant by chance alone.

Issues of multiple comparisons arise in various situations, such as clinical trials with multiple end points and multiple looks at the data. By doing multiple tests, you naturally increase your chances of making a type I error if no adjustment is made to the usual testing framework for a single test statistic. Pairwise comparison among the sample means of several groups is another area of concern: for k groups, there are k(k − 1)/2 pairwise comparisons, and just by chance some may reach significance. Another example is multiple regression analysis in which many candidate predictor variables are tested and entered into the model; some of these variables may give a significant result just by chance. Finally, in an ongoing study with many interim analyses or inspections of the data, with no adjustment for performing multiple comparisons we have a high chance of rejecting the null hypothesis at some time point even when the null hypothesis is true.

There are various approaches to the multiple comparisons problem. First, consider whether multiple comparisons is actually a problem. If we ask multiple questions we


expect multiple answers; if we ask related questions we expect related answers. Looking at the totality of the evidence when interpreting results is far more useful than overzealous correction for multiple comparisons (or ignoring all but the single significant p-value out of 50). One rather informal approach to multiple comparisons is to choose a significance level lower than the traditional 0.05 level (e.g., 0.01) to prevent many false-positive conclusions or to "control the false discovery rate." The number of comparisons should be made explicit in the article.

More formal approaches that control the "experiment-wise" type I error using corrections for multiple comparisons have been proposed. An example is the Bonferroni correction, in which each test is performed at level α/n, where n is the number of comparisons made. Another class of methods has been developed to correct for the multiple comparisons that result from monitoring trial results during the trial. Interim monitoring methods that control the type I error rate are available for various study designs and are discussed further in Chapter 27.14 The classic reference by Hochberg and Tamhane provides a broader discussion of methodology to adjust for multiple comparisons.15

It is best to address the issue of multiple comparisons during the design stage of a study. One should determine how many comparisons will be made and then explicitly state these comparisons. Studies should generally be designed to minimize the number of statistical tests at the end of the study. Ad hoc solutions to the multiple comparisons problem may suffice for exploratory or epidemiologic studies, but multiple comparison adjustments should be made for the primary analyses of definitive studies (such as phase III confirmatory studies) to rigorously maintain the type I error rate, i.e., the probability of falsely rejecting any null hypothesis among those tested, at the chosen α level. Studies that focus on a single primary outcome, with the data analyzed once at the end of the study, avoid the issue of multiple comparisons. The topic of multiple comparisons is expanded in Chapter 27.
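As a concrete sketch of the Bonferroni correction described above, one may equivalently compare each p-value with α/n or compare the adjusted value min(1, n·p) with α. The p-values below are hypothetical, not from any study in this chapter.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: compare min(1, n*p) with alpha for each of n tests."""
    n = len(p_values)
    adjusted = [min(1.0, p * n) for p in p_values]
    reject = [p_adj < alpha for p_adj in adjusted]
    return adjusted, reject

# Hypothetical p-values from 5 pairwise comparisons
p_vals = [0.003, 0.02, 0.04, 0.30, 0.80]
adj, rej = bonferroni(p_vals)
print(rej)  # only the first comparison survives the correction
```

Note that 0.02 and 0.04, "significant" at the unadjusted 0.05 level, no longer reach significance once the five comparisons are accounted for.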

Nonparametric Versus Parametric Tests

Inferential methods that rely on assumptions about the underlying distributions from which the data originate are called parametric methods, whereas those that make no distributional assumptions are called nonparametric methods. Nonparametric methods often are used when the data do not meet the distributional assumptions of parametric methods, for example, because of asymmetric distributions or unusual numbers of extreme values. Nonparametric methods are usually based on the ranks of the observations rather than their actual values, which lessens the impact of skewness and extreme outliers in the raw data. Hypotheses are usually stated in terms of distributions instead




of means. Corresponding to the two-sample hypothesis tests of means discussed in this chapter are the following parametric:nonparametric pairs:

• Paired t-test : Wilcoxon signed rank test or the sign test
• Two-sample t-test : Wilcoxon rank sum test
• Analysis of variance : Kruskal–Wallis test
• One-proportion z-test : exact binomial test

In general, nonparametric tests have somewhat lower power than their parametric counterparts when the parametric method makes the right assumption about the underlying distribution of the data. This is the price one pays for making fewer assumptions about the data. Fewer assumptions, however, does not necessarily mean no distributional assumptions. For large sample sizes, parametric and nonparametric tests generally lead to the same inferences. More information about nonparametric approaches can be found in van Belle et al.16
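As one concrete example from the pairs listed above, the sign test reduces to an exact binomial calculation: under H0, each nonzero paired difference is equally likely to be positive or negative, so the number of positive signs is Binomial(n, 1/2). A minimal sketch with hypothetical counts:

```python
from math import comb

def sign_test(n_pos, n_neg):
    """Exact two-sided sign test for paired data.

    Under H0, the signs of the n nonzero differences are Binomial(n, 1/2).
    """
    n = n_pos + n_neg
    k = max(n_pos, n_neg)
    # one-sided tail probability P(X >= k), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical data: 8 of 10 paired differences are positive
print(sign_test(8, 2))  # 0.109375
```

Even an 8-to-2 split of signs is not significant at the 0.05 level with only 10 pairs, illustrating the lower power of this nonparametric test in small samples.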

CONCLUSION

Study design is part science and part art. Statistical hypotheses need to match the question of interest, and many choices are available. Guidance for clinical trial protocols, such as the Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT), highlights the need for care in this area.17 It is wise to consult with a statistician at the earliest stages of planning a study to obtain help with study design and hypothesis generation. Many times scientific hypotheses can be fine-tuned as one finds a compromise between the most interesting question to answer and the question that is practical to address given the constraints of available resources.

The chosen study design will limit which statistical hypotheses can be tested and which test statistics are appropriate for the data.18 Both the test statistic and the level of significance should be specified in advance. Once a test statistic is chosen, its sampling distribution can be used to calculate the p-value for the observed data. The p-value summarizes the evidence in the data against the null hypothesis. While statistical software is widely available for data analysis and the calculation of p-values, the researcher is required to understand which test statistics are appropriate for the hypothesis of interest and the study design that generated the data.19 Timely collaboration with an eager statistician will help you and your studies succeed.

SUMMARY QUESTIONS

1. In the Women's Health Initiative (WHI) Clinical Trial of Calcium and Vitamin D Supplementation, 36,382 postmenopausal women aged 50–79 were randomized to either 1000 mg of calcium with 400 International Units (IU) of vitamin D3 daily or placebo and followed for an average of 7 years. Investigators were interested in whether supplementation with calcium and vitamin D would reduce hip fractures. State the null and alternative hypotheses.

2. Jackson et al.,20 on behalf of the WHI investigators, reported the results of the Calcium and Vitamin D trial described in question 1.
   a. The hip bone density was 1.06 percent higher in the supplementation group compared to placebo, p = 0.01. Interpret this result. Was the null hypothesis rejected?
   b. These authors also reported the ratio of hip fractures for the supplement to placebo group. The intention-to-treat (ITT) analysis reported the rate ratio (95% CI) as 0.88 (0.72, 1.08), whereas for adherent women only the rate ratio was 0.71 (0.52, 0.97). Interpret these two results. Hint: the ratio of two numbers that are equal is 1.

3. Suppose investigators were interested in comparing the efficacy of two different antibiotic regimens for the treatment of sepsis in a randomized phase II trial. The primary outcome is the binary outcome of 30-day survival (yes/no). State the null and alternative hypotheses.

4. Define the following terms:
   a. p-value
   b. type I error
   c. type II error

5. Why is hypothesis testing an important part of any analysis from a phase III clinical trial?

6. Does failing to reject H0 mean that H0 is true?
   a. Yes
   b. No; failing to reject the null hypothesis means there is insufficient evidence to reject the null hypothesis.

7. The null hypothesis for a protocol is that the mean systolic blood pressure in the control group is equal to (the same as) the mean systolic blood pressure in the experimental intervention group. Which one of the following statements is FALSE?
   a. The p-value is the probability that the null hypothesis is true.
   b. The p-value is a measure of the strength of evidence in the data that the null hypothesis is not true.
   c. The p-value is the probability that data generated under the null hypothesis will produce a test statistic as extreme or more extreme than the value we actually observed.
   d. The p-value is the probability of observing a difference between the groups' sample mean systolic blood pressures as large as or larger than that observed if the two groups have the same true mean systolic blood pressure.



Acknowledgments

The authors wish to thank Paul S. Albert for his contributions to earlier versions of this chapter in previous editions of this book. His worked examples and dedication to training live on in this text.

Disclaimers

This chapter reflects the views of the authors and should not be construed to represent FDA's views or policies. The findings and conclusions in this chapter are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

References

1. Stone LA, Frank JA, Albert PS, et al. Characterization of MRI response to treatment with interferon beta-1b: contrast enhancing MRI lesion frequency as a primary outcome measure. Neurology 1997;49:862–9.
2. Theodore WH, Albert P, Stertz B, et al. Felbamate monotherapy: implications for antiepileptic drug development. Epilepsia 1995;36:1105–10.
3. ISIS-4 Collaborative Group. ISIS-4: a randomized factorial trial assessing early oral captopril, oral mononitrate, and intravenous magnesium sulphate in 58,050 patients with suspected acute myocardial infarction. Lancet 1995;345:669–85.
4. Piantadosi S. Clinical trials: a methodologic perspective. 2nd ed. Hoboken, NJ: John Wiley & Sons; 2005.
5. Altman DG. Practical statistics for medical research. Boca Raton, FL: Chapman & Hall; 1991.
6. Moore DS, Notz WI. Statistics: concepts and controversies. 5th ed. New York: Freeman; 2005.
7. Moore DS. Introduction to the practice of statistics. 5th ed. New York: Freeman; 2005.


8. Armitage P, Berry G, Matthews JNS. Statistical methods in medical research. Oxford: Blackwell; 2001.
9. Draper NR, Smith H. Applied regression analysis. 3rd ed. New York: Wiley; 1998.
10. Clopper CJ, Pearson ES. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 1934;26:404–13.
11. Agresti A, Coull BA. Approximate is better than "exact" for interval estimation of binomial proportions. Am Stat 1998;52:119–26.
12. Borkowf CB. Constructing confidence intervals for binomial proportions with near nominal coverage by adding a single imaginary failure or success. Stat Med 2006;25:3679–95.
13. Agresti A. Categorical data analysis. 2nd ed. Hoboken, NJ: Wiley; 2002.
14. Friedman LM, Furberg CD, DeMets DL. Fundamentals of clinical trials. 4th ed. New York: Springer; 2010.
15. Hochberg Y, Tamhane AC. Multiple comparison procedures. New York: Wiley; 1987.
16. van Belle G, Fisher LD, Heagerty PJ, Lumley TS. Biostatistics: a methodology for the health sciences. 2nd ed. New York: Wiley; 2004.
17. Chan A-W, Tetzlaff JM, Gøtzsche PC, Altman DG, Mann H, Berlin J, Dickersin K, Hróbjartsson A, Schulz KF, Parulekar WR, Krleza-Jeric K, Laupacis A, Moher D. SPIRIT 2013 explanation and elaboration: guidance for protocols of clinical trials. BMJ 2013;346:e7586.
18. U.S. Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research (CDER) and Center for Biologics Evaluation and Research (CBER). Multiple endpoints in clinical trials: guidance for industry. Draft. January 2017. http://www.fda.gov/ucm/groups/fdagovpublic/@fdagov-drugs-gen/documents/document/ucm536750.pdf.
19. Wasserstein RL, Lazar NA. The ASA's statement on p-values: context, process, and purpose. Am Stat 2016;70:129–33.
20. Jackson RD, LaCroix AZ, Gass M. Calcium plus vitamin D supplementation and the risk of fractures. N Engl J Med 2006;354:669–83.


C H A P T E R

25 Power and Sample Size Calculations

Craig B. Borkowf1, Laura Lee Johnson2, Paul S. Albert3

1Centers for Disease Control and Prevention, Atlanta, GA, United States; 2U.S. Food and Drug Administration, Silver Spring, MD, United States; 3National Institutes of Health, Rockville, MD, United States

O U T L I N E

Introduction 359
Basic Concepts 360
Notational Conventions 360
Review of the Normal and t-Distributions 361
Sample Size Calculations for Precision in Confidence Interval Construction 361
Confidence Intervals for Means of Continuous Data 361
Confidence Intervals for Binomial Proportions 362
Sample Size Calculations for Hypothesis Tests: One Sample of Data 362
Calculations for Continuous Data Regarding a Single Population Mean 362
Calculations for Binary Data Regarding a Single Population Proportion 363
Two-Stage Designs for a Single Population Proportion 363
Sample Size Calculations for Hypothesis Tests: Paired Data 364
Calculations for Paired Continuous Data 364
Calculations for Paired Binary Data 365
Sample Size Calculations for Hypothesis Tests: Two Independent Samples 366
Calculations for Continuous Data With Equal Variances and Equal Sample Sizes 366
Calculations for Continuous Data With Unequal Variances or Unequal Sample Sizes 367
Calculations for Two Independent Samples of Binary Data 367
Advanced Methods and Other Topics 368
Alternative Statistics and Sample Size Calculation Methods 368
Several Advanced Study Designs 368
Retention of Subjects 369
Statistical Computing 369
Conclusion 369
Exercises 370
Acknowledgments 371
Disclaimers 371
References 371

Principles and Practice of Clinical Research. http://dx.doi.org/10.1016/B978-0-12-849905-4.00025-3
Copyright © 2018. Published by Elsevier Inc.

INTRODUCTION

This chapter introduces several fundamental concepts related to power and sample size calculations. We first review some key questions that should be considered when developing research studies. We then introduce the concept of statistical power and explain why having adequate power is essential for designing successful studies. Next, we present some basic sample size formulas for when we plan to collect a sample of either continuous or binary data and then wish to construct a confidence interval for the population mean or the population proportion with a certain degree of precision. Subsequently, we present some basic sample size formulas for when we plan to collect one sample, a paired sample, or two independent samples of either continuous or binary data and then wish to test hypotheses about specific characteristics of the populations from which the data

came. Finally, we discuss several advanced topics related to sample size calculations and the collaborative process of study design.

Basic Concepts

One of the most common yet challenging collaborations between statisticians and researchers is the sample size calculation. They must work together to gather established knowledge and to elicit beliefs about the research questions in order to construct reasonable hypotheses about what future data may show. Some typical questions are as follows: What are the study groups or treatment arms of interest (e.g., new treatment, placebo)? What is the expected distribution or type of data to be collected (e.g., bell-shaped, continuous, count, ordinal, categorical, binary)? What are the anticipated means, medians, and ranges of the future data? What is the variation in repeated measurements for each subject, and what is the variation between measurements for different subjects? How much does it cost to set up the study and then to enroll each additional subject or site? What are the maximum numbers of subjects and sites that can realistically be enrolled in the study? What are the consequences of a Type I error, namely, rejecting the null hypothesis (H0) when it is indeed true? Conversely, what are the consequences of a Type II error, namely, failing to reject the null hypothesis when it is false and instead a particular alternative hypothesis (H1) is true?

The desire to control the probability of making either a Type I or a Type II error leads to the concept of statistical power. Recall that hypothesis tests are designed so that, for a particular test statistic (e.g., z-test, t-test), the probability of making a Type I error, or the significance level, denoted α, is held at some fixed value. Sample size calculations are designed so that, for the same test, the probability of making a Type II error, denoted β, for a particular alternative hypothesis also is held at some fixed value. In turn, (1 − β) equals the statistical power of a particular test, which is the probability of rejecting the null hypothesis when that alternative hypothesis is indeed true.
When designing studies, we need to consider the statistical power because power indicates the chance of detecting a statistically significant effect (of any magnitude) when in truth an effect of a certain magnitude exists. Studies with low power are unlikely to answer key scientific questions with sufficient precision or to produce statistically significant results even when meaningful effects do exist, and thus they are an inefficient use of precious scientific resources. It is important to remember that the absence of evidence for an effect is not the same as evidence for the absence of an effect.1 Although there is no hard-and-fast rule,2 the consensus is that it is

desirable to have at least 80% power to test credible scientifically or clinically meaningful hypotheses.3,4

One can take various approaches to sample size and power calculations for a given set of null and alternative hypotheses, a chosen significance level (α), and a particular test statistic. Suppose we wish to design a parallel-groups trial in which subjects are randomly assigned to receive either a new treatment or a placebo. First, one may calculate power for a fixed sample size. For example, one may ask, "If we enroll 30 subjects, what is the power, or chance, to detect a statistically significant effect (of any magnitude) when the truth is that the new treatment produces a 20% reduction in the main outcome?" Second, one may calculate the required sample size for a fixed power. For instance, one may ask, "What is the required sample size in each of two groups to have 80% power to detect a statistically significant effect (of any magnitude) when the truth is that the new treatment produces a 20% reduction in the main outcome?" This chapter focuses on the latter approach, namely, estimating the minimum required sample size for a fixed power.

When publishing the results of clinical and scientific research, it is standard practice to report how the sample size calculations were performed. For example, the CONSORT 2010 Statement (item 7a)5,6 requires authors to explain how the sample size was determined. This information enables the reader to evaluate the credibility of the null and alternative hypotheses, the appropriateness of the test statistics, and the plausibility of the design parameters, and thus whether the sample size and power calculations adequately reflect the balance of clinical, scientific, and statistical concerns. Note that many of the sample size formulas presented in this chapter are approximate, and some tend to underestimate the required sample size.
To illustrate how well these more basic methods perform, we will periodically mention advanced methods that are beyond the scope of this chapter. It is advisable to consult a statistician about the best methods to use for a particular set of research questions. Furthermore, because sample size calculations depend heavily on the assumptions made by the investigators, it also is wise to perform a series of calculations under a variety of null and alternative hypotheses.
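To preview the kind of calculation developed later in this chapter: a widely used normal-approximation formula for comparing two independent means puts n = 2(z₁₋α/2 + z₁₋β)²σ²/δ² subjects in each arm, where δ is the difference to be detected. The sketch below assumes this textbook formula (the helper name `n_per_group` is ours) and rounds up to the next whole subject.

```python
import math
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate n per arm for a two-sample z-test of means:
    n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 * sigma^2 / delta^2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # e.g., 1.960 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)           # e.g., 0.842 for 80% power
    return math.ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2)

# 80% power to detect a half-standard-deviation difference at alpha = 0.05
print(n_per_group(sigma=1.0, delta=0.5))  # 63
```

Because this z-based formula slightly underestimates the sample size needed for a t-test, software and statisticians often add a small correction; treat such calculations as a starting point for discussion, not a final answer.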

Notational Conventions One central scientific aim is to collect a sample of data from a population or group and then calculate sample statistics from those data to estimate the corresponding parameters, or characteristics, of that population or group. In this chapter, we use Greek letters to represent population parameters and their hypothesized values, such as the population mean (μ), variance (σ²), standard

II. STUDY DESIGN AND BIOSTATISTICS


deviation (σ), or proportion (π). By contrast, we use Latin letters to represent sample statistics calculated from data, such as the sample mean (x̄), variance (s²), standard deviation (s), or proportion (p). In addition, we use subscripts to distinguish the parameters and the statistics that correspond to various hypotheses or study groups.

Review of the Normal and t-Distributions Recall that we often model continuous data (perhaps after an appropriate transformation) by the Normal (Gaussian) distribution, or a bell-shaped curve, with mean μ and variance σ². In particular, the standard Normal distribution has a mean of zero and a variance of one. The z-statistic calculated for the z-test (described in the preceding chapter) follows this distribution. We use the symbol Z_c to represent the (c × 100)th percentile of the standard Normal distribution, such that c equals the proportion of this distribution less than Z_c (or, equivalently, the fraction of area under the standard Normal curve to the left of Z_c), where c ranges from 0 to 1. Some key percentiles of the standard Normal distribution are Z_{0.5} = 0, Z_{0.8} = 0.842, Z_{0.9} = 1.282, Z_{0.95} = 1.645, Z_{0.975} = 1.960, and Z_{0.99} = 2.326. Furthermore, because this distribution is symmetric about zero, Z_c = −Z_{1−c}. Additional values can be readily obtained from tables in most introductory statistics textbooks (e.g., Altman7) and from most statistical software packages. By comparison, when the population variance is unknown, we use the sample variance to calculate the t-statistic for the t-test (described in the preceding chapter). The t-statistic follows the t-distribution, the shape of which depends on the degrees-of-freedom parameter, a function of the sample size. We use the symbol T_{f,c} to represent the (c × 100)th percentile of the t-distribution with f degrees of freedom. For example, for selected values of f, the 97.5th percentiles of the t-distribution are T_{10,0.975} = 2.228, T_{20,0.975} = 2.086, T_{30,0.975} = 2.042, T_{60,0.975} = 2.000, and T_{100,0.975} = 1.984. Note that as f increases, the (c × 100)th percentiles of the t-distribution decrease in magnitude toward the corresponding percentile of the standard Normal distribution; hence T_{f,c} > T_{f+1,c} > Z_c for c > 0.5. Furthermore, because the t-distribution is symmetric about zero, T_{f,c} = −T_{f,1−c}.
Consult an introductory statistics textbook for more details about this important distribution.
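The percentiles quoted above can be reproduced with Python's standard library; the helper name z_percentile below is ours, used only for illustration.

```python
# Reproduce standard Normal percentiles with Python's standard library.
# t-distribution percentiles (T_{f,c}) are not in the standard library;
# scipy.stats.t.ppf(c, f) is one common source for those.
from statistics import NormalDist

def z_percentile(c):
    """Return Z_c, the (c x 100)th percentile of the standard Normal."""
    return NormalDist().inv_cdf(c)

for c in (0.5, 0.8, 0.9, 0.95, 0.975, 0.99):
    print(f"Z_{c} = {z_percentile(c):.3f}")

# Symmetry about zero: Z_c = -Z_(1-c)
print(z_percentile(0.975), -z_percentile(0.025))
```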

SAMPLE SIZE CALCULATIONS FOR PRECISION IN CONFIDENCE INTERVAL CONSTRUCTION In this section, we consider sample size calculations for the construction of confidence intervals of a desired width, or precision. These calculations depend only on the chosen significance level (α), not on power.


Confidence Intervals for Means of Continuous Data First, we can calculate the sample size required to construct a confidence interval of width w for an unknown population mean. We assume that the data come from a population that is well approximated (possibly after some transformation) by a Normal distribution with unknown mean μ and known variance σ². We may plan to collect a sample of data from this population and then calculate the sample mean x̄ and the sample variance s². Recall that we may construct a (1 − α) × 100% confidence interval for the population mean as follows:

x̄ ± Z_{1−α/2} σ/√n.   (25.1)

For this confidence interval to have width w (or less), we need to solve for the smallest sample size n such that

2 Z_{1−α/2} σ/√n ≤ w,   (25.2)

or, equivalently,8

n ≥ 4 Z²_{1−α/2} σ²/w².   (25.3)

By comparison, if the variance is unknown, we may construct a (1 − α) × 100% confidence interval for the population mean using the t-distribution with (n − 1) degrees of freedom and the sample standard deviation s:

x̄ ± T_{n−1,1−α/2} s/√n.   (25.4)

Since the sample standard deviation s needs to be calculated from the future data, we substitute a hypothesized value σ_h for s and solve numerically for the smallest sample size n such that

2 T_{n−1,1−α/2} σ_h/√n ≤ w,   (25.5)

or, equivalently,

n ≥ 4 T²_{n−1,1−α/2} σ_h²/w².   (25.6)

Assuming the hypothesized variance σ_h² is close to the true variance σ², the future confidence interval will have approximately the desired width.

Example 1. Suppose a clinician wishes to estimate the mean serum albumin level in a specific population of patients with primary biliary cirrhosis of the liver. Because serum albumin is an important indicator of the synthetic function of the liver, she wants to obtain a tight confidence interval around the estimated mean. An earlier study found a mean of 35 g/L and a standard deviation of 6 g/L. If she wishes to estimate the mean in this new population of patients with a 95% confidence interval (α = 0.05) of width w = 4 g/L, how many patients should she enroll in this study?


25. POWER AND SAMPLE SIZE CALCULATIONS

To construct a 95% confidence interval, we use Z_{1−α/2} = 1.960. Using Eq. (25.3), we calculate

n = 4(1.960)²(6)²/(4)² = 34.6 ≈ 35.   (25.7)

Thus, the minimum required sample size in this study is n = 35 patients. Furthermore, because the true variance of the new population is really unknown, she may plan to construct a 95% confidence interval using the t-distribution and the future sample standard deviation s. Solving Eq. (25.6) numerically, we obtain n = 38, a slightly larger sample size. By way of comparison, using an advanced method beyond the scope of this chapter,9 we can strictly control the probability that the width of a confidence interval constructed using the t-distribution does not exceed a certain threshold, assuming that the true standard deviation does not exceed its hypothesized value, i.e., σ ≤ σ_h. In the above example, if σ = 6 g/L, the probability is only 50% that the width of a 95% confidence interval constructed using the t-distribution will be less than or equal to w = 4 g/L for a sample size of n = 38. This probability increases to 80%, 90%, and 95% for sample sizes of n = 44, 47, and 50, respectively.
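Eq. (25.3) is straightforward to script. The sketch below, a minimal illustration using only Python's standard library (the function name n_ci_mean is ours), reproduces the n = 35 of Example 1; the t-based refinement of Eq. (25.6) additionally requires t percentiles (e.g., from scipy.stats.t.ppf) and a short numeric search.

```python
# Sample size for a fixed-width confidence interval for a mean
# (Eq. 25.3), assuming the variance is known; a minimal sketch.
import math
from statistics import NormalDist

def n_ci_mean(sigma, width, alpha=0.05):
    """Smallest n so a (1 - alpha) CI for the mean has width <= width."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(4 * z**2 * sigma**2 / width**2)

print(n_ci_mean(sigma=6, width=4))  # Example 1 -> 35
```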

Confidence Intervals for Binomial Proportions Likewise, we can calculate the sample size required to construct a confidence interval of width w for an unknown binomial proportion π. Let p denote the sample proportion. Recall that we may construct a (1 − α) × 100% binomial confidence interval for a population proportion as follows:

p ± Z_{1−α/2} √(p(1 − p))/√n.   (25.8)

Because the sample proportion p must be estimated from the future data, we substitute a hypothesized value π_h for p and solve for the smallest sample size n such that

2 Z_{1−α/2} √(π_h(1 − π_h))/√n ≤ w,   (25.9)

or, equivalently,8

n ≥ 4 Z²_{1−α/2} π_h(1 − π_h)/w².   (25.10)

For example, to obtain a 95% confidence interval of width w = 0.1 for true proportions π = 0.1, 0.2, 0.3, 0.4, and 0.5, we need sample sizes of n = 139, 246, 323, 369, and 385, respectively. By symmetry, results for π and (1 − π) are the same. It is important to remember that these methods for binomial proportions are approximate and that, depending on the sample proportion p actually obtained, the interval can be wider than the desired width w. By way of comparison, a more advanced exact binomial method, beyond the scope of this chapter, shows that when the true proportions are π = 0.1, 0.2, 0.3, 0.4, and 0.5, we need somewhat larger sample sizes of n = 189, 291, 359, 395, and 402, respectively, for the probability to be 90% that the width of the 95% confidence interval will indeed equal w = 0.1 or less. More generally, we can use any confidence interval formula to solve directly or numerically for the smallest sample size n that gives an interval of the desired width. Remember that when the width of an interval depends on sample statistics that need to be estimated from the future data (e.g., the sample standard deviation), the sample size formula depends on the hypothesized values of the corresponding population parameters. Thus, it is wise to consider a range of plausible parameter values when performing sample size calculations.

SAMPLE SIZE CALCULATIONS FOR HYPOTHESIS TESTS: ONE SAMPLE OF DATA In this section, we discuss sample size calculations in the context of testing hypotheses about the population mean or proportion when we plan to sample data from a single population. Recall that for hypothesis testing, the alternative hypothesis is the general negation of the null hypothesis (e.g., H0: μ = μ0 vs. H1: μ ≠ μ0). By contrast, for sample size calculations, a specific alternative hypothesis is required (e.g., H1: μ = μ1), where the difference between the null and alternative hypotheses represents a scientifically or clinically meaningful difference (e.g., δ = μ1 − μ0). Note that the expressions on the left side of each hypothesis are the unknown parameters about which we wish to make inference, whereas the expressions on the right side are the hypothesized values that we use in the sample size calculations. In the next two sections, we extend these basic principles to sample size calculations for paired data and two independent samples of data.

Calculations for Continuous Data Regarding a Single Population Mean Consider the following null and alternative hypotheses about the mean μ of a population that has (approximately) a Normal distribution with known variance σ²:

H0: μ = μ0 vs. H1: μ = μ1.   (25.11)

Let δ = μ1 − μ0 denote the scientifically or clinically meaningful difference, and let x̄ denote the sample


mean. Suppose we plan to conduct a two-sided hypothesis test by using the z-statistic,

z = (x̄ − μ0)/(σ/√n),   (25.12)

at the α significance level. To calculate the required sample size, we specify the power (1 − β) and use the following formula10,11:

n = (Z_{1−α/2} + Z_{1−β})² σ²/δ².   (25.13)

In this and all other sample size formulas, if n is not an integer, it should be rounded up. For a one-sided test, replace Z_{1−α/2} by Z_{1−α}. Eq. (25.13) shows that the required sample size increases as the power (1 − β) and the population variance (σ²) increase and as the significance level (α) and the meaningful difference (δ) decrease. Also, for a given significance level, sample sizes are smaller for one-sided hypothesis tests than for comparable two-sided hypothesis tests.

Example 2. Patients with hypertrophic cardiomyopathy (HCM) have enlarged left ventricles (mean, 300 g) compared with the general population (mean, 120 g). A cardiologist studying a particular genetic mutation that causes HCM wishes to determine whether the mean left ventricular mass of patients with this particular mutation differs significantly from the mean for other patients with HCM. If the true difference equals or exceeds the meaningful difference of δ = 10 g in either direction, it is important to reject the null hypothesis of equality (μ = 300 g). If past laboratory measurements suggest that σ = 30 g and he chooses a 5% significance level (α = 0.05) and a power of 90% (β = 0.1), what sample size does he need? This hypothesis is two-sided, so Z_{1−α/2} = 1.960 and Z_{1−β} = 1.282. Using the previous formula, we calculate

n = (1.960 + 1.282)²(30)²/(10)² = 94.6 ≈ 95.   (25.14)

Thus, the minimum required sample size is n = 95 subjects for this study. By contrast, if the population variance is unknown, as in the case where prior studies of that population have not been conducted, we must plan to use a t-test (with the sample standard deviation s) rather than a z-test (with the population standard deviation σ). Because the t-distribution is wider than the standard Normal distribution, the above calculation based on the Normal percentiles Z_{1−α/2} and Z_{1−β} (instead of their t-distribution counterparts) underestimates the required sample size. Hence, the convention is to increase the sample size slightly to compensate for this underestimation.12 By comparison, if the cardiologist in Example 2 plans to use a t-test (with the sample standard deviation s), the minimum required sample size is n = 97.
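A minimal sketch of Eq. (25.13), using only Python's standard library (the function name n_one_mean is ours), reproduces Example 2; the t-based adjustment to n = 97 would require t percentiles (e.g., scipy.stats.t.ppf) and an iterative search.

```python
# One-sample z-test sample size (Eq. 25.13); a minimal sketch.
import math
from statistics import NormalDist

def n_one_mean(sigma, delta, alpha=0.05, power=0.90, two_sided=True):
    """Minimum n to detect a mean shift of delta with the given power."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2) if two_sided else nd.inv_cdf(1 - alpha)
    z_beta = nd.inv_cdf(power)
    return math.ceil((z_alpha + z_beta) ** 2 * sigma**2 / delta**2)

print(n_one_mean(sigma=30, delta=10))  # Example 2 -> 95
```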


Calculations for Binary Data Regarding a Single Population Proportion Next, consider the following null and alternative hypotheses about the proportion π of a population that has some binary characteristic:

H0: π = π0 vs. H1: π = π1.   (25.15)

Let δ = π1 − π0 denote the scientifically or clinically meaningful difference, and let p denote the sample proportion. Suppose we plan to conduct a two-sided hypothesis test by using the z-statistic,

z = (p − π0)/√(π0(1 − π0)/n),   (25.16)

at the α significance level. To calculate the required sample size, we specify the power (1 − β) and use the following formula13:

n = [Z_{1−α/2} √(π0(1 − π0)) + Z_{1−β} √(π1(1 − π1))]²/δ².   (25.17)

Example 3. Suppose an oncologist wishes to conduct a Phase II (safety/efficacy) clinical trial to test a new cancer drug. If only 20% of patients will benefit from this drug, she does not wish to continue to study it because drugs with comparable efficacy are already available. Conversely, if at least 40% of patients will benefit from this drug, she wishes to have an 80% chance to reject the null hypothesis and consequently to continue to study the drug. Using a one-sided z-test at the 5% significance level (α = 0.05) and 80% power (β = 0.2), how many participants should she enroll in this clinical trial? This hypothesis is one-sided, so Z_{1−α} = 1.645 and Z_{1−β} = 0.842. The null proportion is π0 = 0.2, the alternative proportion is π1 = 0.4, and the difference is δ = 0.2. Using the previous formula, we calculate

n = [1.645 √((0.2)(0.8)) + 0.842 √((0.4)(0.6))]²/(0.2)² = 28.6 ≈ 29.   (25.18)

Thus, the minimum required sample size is n = 29 patients for this single-arm clinical trial. Similarly, with 90% power (β = 0.1, Z_{1−β} = 1.282), the required sample size is n = 42. By way of comparison, an advanced exact binomial method gives required sample sizes of n = 35 and 47 for 80% and 90% power, respectively.
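Eq. (25.17) can be sketched the same way (the function name n_one_prop is ours); it reproduces the n = 29 and n = 42 of Example 3.

```python
# One-sample proportion sample size (Eq. 25.17); a minimal sketch.
import math
from statistics import NormalDist

def n_one_prop(p0, p1, alpha=0.05, power=0.80, two_sided=False):
    """Minimum n for a z-test of H0: pi = p0 vs. H1: pi = p1."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2) if two_sided else nd.inv_cdf(1 - alpha)
    z_beta = nd.inv_cdf(power)
    num = z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(p1 * (1 - p1))
    return math.ceil(num**2 / (p1 - p0) ** 2)

print(n_one_prop(0.2, 0.4, power=0.80))  # Example 3 -> 29
print(n_one_prop(0.2, 0.4, power=0.90))  # -> 42
```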

Two-Stage Designs for a Single Population Proportion For early Phase II (safety/efficacy) trials, an alternative approach for testing hypotheses about a single population proportion is to use a two-stage design.10,14 In a


two-stage design, one sequentially enrolls up to a certain number of subjects in a first stage, and then, if the evidence is sufficiently promising, one sequentially enrolls up to a certain number of subjects in a second stage. If the evidence is not sufficiently promising during either the first or the second stage, one may stop enrollment immediately without completing that stage. For selected sets of hypotheses, significance levels, and powers, tables in the above references indicate the maximum number of subjects to enroll in each stage and the corresponding stopping rules. These rules are chosen so that “optimal” two-stage designs have the smallest expected, or average, sample size under the null hypothesis.

Example 4. Consider a two-stage trial to test the same hypotheses as in the preceding example, namely H0: π = 0.2 versus H1: π = 0.4, with a 5% significance level and 80% power. In the first stage, one would enroll up to m1 = 13 participants sequentially, and if r1 = 3 or fewer study participants respond positively to the drug, one should stop enrollment and abandon the drug. In the second stage, one would enroll up to 30 additional participants sequentially, for a maximum of m2 = 43; then, if r2 = 12 or fewer study participants respond, one should abandon the drug, whereas if 13 or more participants respond, the drug should be considered for further study. With a 5% significance level and 80% power, if the null hypothesis is true (i.e., π = 0.2), one would need to enroll, on average, 21 participants in the trial (i.e., substantially fewer than n = 29 or m2 = 43) to conclude that the drug should be abandoned. By comparison, with a 5% significance level and 90% power (m1 = 19, r1 = 4, m2 = 54, r2 = 15), if the null hypothesis is true, one would need to enroll, on average, 30 participants in the trial (i.e., substantially fewer than n = 42 or m2 = 54) to reach the same conclusion.
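The average enrollments quoted in Example 4 can be reproduced under the simplifying assumption that the trial stops early only when stage 1 ends with r1 or fewer responses, so that the expected size under H0 is m1 + P(continue) × (m2 − m1); the helper names below are ours.

```python
# Expected sample size under H0 for a two-stage design; a sketch that
# assumes early stopping occurs only at the end of stage 1.
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def expected_n(m1, r1, m2, p_null):
    pet = binom_cdf(r1, m1, p_null)  # probability of early termination
    return m1 + (1 - pet) * (m2 - m1)

print(round(expected_n(13, 3, 43, 0.2)))  # 80%-power design -> 21
print(round(expected_n(19, 4, 54, 0.2)))  # 90%-power design -> 30
```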

SAMPLE SIZE CALCULATIONS FOR HYPOTHESIS TESTS: PAIRED DATA In this section, we discuss sample size calculations for hypothesis tests about characteristics of paired data. The pairing may result from measuring the same characteristics of each subject before and after treatment or from taking similar measurements of opposite paired limbs or organs of the same subject (e.g., arms, legs, eyes, ears). The pairing also may result from taking measurements of twins or two persons with similar or “matched” characteristics, one of whom is assigned to the treatment group and the other to the control group. Note that the methods appropriate for paired data more

closely resemble those for one sample of data (see the preceding section) than those for two independent samples (see the next section).

Calculations for Paired Continuous Data Suppose we plan to collect two measurements, x and y, of each subject or pair of subjects. We assume that the measurements (x, y) come from populations with means μ_x and μ_y and variances σ_x² and σ_y², respectively. Let d = y − x for each subject. We assume that the difference d has (approximately) a Normal distribution with unknown mean μ_d and known variance σ_d². Consider the following null and alternative hypotheses about the mean μ_d:

H0: μ_d = μ0 vs. H1: μ_d = μ1.   (25.19)

Let δ = μ1 − μ0 denote the scientifically or clinically meaningful difference, and let d̄ = ȳ − x̄ denote the sample difference in means. Suppose we plan to conduct a two-sided hypothesis test by using the paired z-statistic,

z = (d̄ − μ0)/(σ_d/√n),   (25.20)

at the α significance level. To calculate the required sample size, we specify the power (1 − β) and use the following formula:15

n = (Z_{1−α/2} + Z_{1−β})² σ_d²/δ².   (25.21)

Observe that Eq. (25.21) resembles Eq. (25.13) for sample size calculations for one sample of continuous data. When the sum of the variances of the two original measurements is larger than the variance of the difference (i.e., σ_x² + σ_y² > σ_d²), a design based on a paired sample

requires a smaller sample size than a design based on two independent samples. Example 5. Suppose an investigator wishes to design a pilot study to investigate the effect of a new medication on diastolic blood pressure in hypertensive patients. He plans to take two measurements of each subject, one measurement at baseline when the subject has not yet taken the medication (x), followed by a second measurement when the subject has been taking the medication for 12 weeks (y). He then plans to compute the difference between these measurements for each subject (d). Past laboratory measurements suggest that the standard deviations of the original measurements are σ_x = σ_y = 20 mm Hg. The investigator wishes to perform a two-sided paired z-test at the 5% significance level (α = 0.05) regarding whether there is a change in average diastolic blood pressure on the new medication. He wants a 90% chance to reject the


null hypothesis of equality if the true difference is δ = 3 mm Hg in either direction (90% power, β = 0.1). If past measurements suggest that the standard deviation of the difference is σ_d = 15 mm Hg, what sample size does he need? This hypothesis is two-sided, so Z_{1−α/2} = 1.960 and Z_{1−β} = 1.282. Using the previous formula, we calculate

n = (1.960 + 1.282)²(15)²/(3)² = 262.7 ≈ 263.   (25.22)

Thus, the minimum required sample size is n = 263 subjects for this paired-sample study. Remember that sample size calculations depend heavily on the particular parameters chosen, so it is wise to repeat these calculations under different assumptions. Suppose the investigator can reduce the within-subject variation by taking three repeated measurements of each subject while not taking medication and while taking the new medication. If the standard deviation of the difference in the means of the three repeated measurements is σ_d = 12 mm Hg, what sample size does he need? Using Eq. (25.21), we calculate

n = (1.960 + 1.282)²(12)²/(3)² = 168.2 ≈ 169.   (25.23)

Thus, the minimum required sample size is n = 169 subjects, about two-thirds of the previous sample size calculation. By comparison, if the meaningful difference is instead δ = 6 mm Hg, what sample size does he need? Using Eq. (25.21), we calculate

n = (1.960 + 1.282)²(12)²/(6)² = 42.0 ≈ 43.   (25.24)

Thus, the minimum required sample size is n = 43 subjects, about one-fourth of the previous sample size calculation. Note that this smaller sample size comes with an implicit price, namely, the decreased ability to detect differences smaller than this new meaningful difference.
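A minimal sketch of Eq. (25.21) (the function name n_paired is ours, standard library only) reproduces the three scenarios of Example 5.

```python
# Paired z-test sample size (Eq. 25.21); a minimal sketch.
import math
from statistics import NormalDist

def n_paired(sigma_d, delta, alpha=0.05, power=0.90):
    """Minimum number of pairs for a two-sided paired z-test."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return math.ceil(z**2 * sigma_d**2 / delta**2)

# The three scenarios of Example 5
print(n_paired(15, 3), n_paired(12, 3), n_paired(12, 6))  # 263 169 43
```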
The design used in this example is a nonrandomized baseline-versus-treatment design, which may be subject to various problems, such as regression to the mean over time (e.g., participants who were selected because they were especially sick at screening may naturally progress to a less severe state, even in the absence of treatment efficacy), bias due to external trends over time (e.g., seasonal changes may affect participants in an asthma study), and investigator bias in evaluating treatment efficacy due to knowing when each participant took the active treatment (i.e., treatment assignments were not masked). An alternative design that mitigates these problems is the parallel-groups design, with or without baseline measurements (see “Calculations for continuous data with equal variances and equal sample sizes,” below).


Calculations for Paired Binary Data Next, consider the case of paired binary data (a, b). Such data may arise, for example, in a study in which the same condition affects both eyes and each eye is given a different treatment, labeled A and B. When outcomes for each treatment are coded as 0 (i.e., no, failure) and 1 (i.e., yes, success), there are four possible joint outcomes (0, 0), (0, 1), (1, 0), and (1, 1), with corresponding population proportions π00, π01, π10, and π11, respectively (where π00 + π01 + π10 + π11 = 1). The success rate of treatment A is π10 + π11, whereas the success rate of treatment B is π01 + π11. In turn, the difference between the success rates of these treatments is π10 − π01. One possible approach for paired binary data is to test whether the difference in success rates π10 − π01 equals zero (i.e., both treatments are equally effective) or a specific meaningful difference δ (i.e., treatment A is better than treatment B if δ > 0), conditional on the sum of the two discordant joint outcomes π01 + π10 being equal to θ. This approach translates into the following null and alternative hypotheses about the discordant difference π10 − π01:

H0: π10 − π01 = 0 vs. H1: π10 − π01 = δ,   (25.25)

given π10 + π01 = θ. Let p00, p01, p10, and p11 denote the sample proportions in the four joint outcome categories. Suppose we plan to conduct a two-sided hypothesis test by using the z-statistic corresponding to McNemar’s test,

z = √n (p10 − p01)/√(p10 + p01),   (25.26)

at the α significance level. To calculate the required sample size, we specify the power (1 − β) and use the following formula:15,16

n = [Z_{1−α/2} √θ + Z_{1−β} √(θ − δ²)]²/δ².   (25.27)

Example 6. Suppose a pediatrician wishes to determine whether the right-handed children in her practice are more likely to get infections in their right ear than in their left ear over the course of a year. The four possible joint outcomes are (0, 0), no ear infections; (0, 1), at least one infection in the right ear only; (1, 0), at least one infection in the left ear only; and (1, 1), at least one infection in each ear, where infections may occur concurrently or sequentially. She observes that, in a typical year, about 50% of infants never have an ear infection, 40% have at least one infection in either their left or right ear (but not both ears), and 10% have at least one infection in each ear. The null hypothesis is that among right-handed children, 20% have at least one infection in their left ear only, and 20% have at least one


infection in their right ear only. The alternative hypothesis chosen by the pediatrician is that 18% have at least one infection in their left ear only, and 22% have at least one infection in their right ear only (θ = 0.40, δ = 0.04). She wants to perform a one-sided McNemar’s test at the 5% significance level (α = 0.05). If she wants a 90% chance to reject the null hypothesis of equality when the alternative hypothesis is true (90% power, β = 0.1), how many medical records should she review for this study? This hypothesis is one-sided, so Z_{1−α} = 1.645 and Z_{1−β} = 1.282. Using the previous formula, we calculate

n = [1.645 √0.4 + 1.282 √(0.4 − (0.04)²)]²/(0.04)² = 2138.1 ≈ 2139.   (25.28)

Thus, the minimum required sample size is n = 2139 right-handed children for this paired binary data study.
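Eq. (25.27) can be scripted directly; the sketch below (the function name n_mcnemar is ours) uses the chapter's rounded percentiles, 1.645 and 1.282, and reproduces Example 6.

```python
# McNemar-test sample size (Eq. 25.27); a minimal sketch using the
# rounded percentiles quoted in the worked example.
import math

def n_mcnemar(theta, delta, z_alpha=1.645, z_beta=1.282):
    """Minimum number of pairs, given discordant sum theta and difference delta."""
    num = z_alpha * math.sqrt(theta) + z_beta * math.sqrt(theta - delta**2)
    return math.ceil(num**2 / delta**2)

print(n_mcnemar(0.40, 0.04))  # Example 6 -> 2139
```

With unrounded percentiles (1.6449 and 1.2816) the result is n = 2138, one child fewer; the difference is immaterial in practice.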

SAMPLE SIZE CALCULATIONS FOR HYPOTHESIS TESTS: TWO INDEPENDENT SAMPLES In this section, we discuss sample size calculations for hypothesis tests based on two independent samples when we wish to estimate the difference of two population means or proportions. For continuous data, we consider populations with equal and unequal variances and groups with equal and unequal sample sizes.

Calculations for Continuous Data With Equal Variances and Equal Sample Sizes Suppose we plan to collect the same number of measurements from two independent populations or groups, labeled X and Y, with (approximately) Normal distributions with unknown means μ_x and μ_y and common known variance σ_x² = σ_y² = σ_c². Consider the following null and alternative hypotheses about the difference in the population means:

H0: μ_y − μ_x = δ0 vs. H1: μ_y − μ_x = δ1.   (25.29)

Let δ = δ1 − δ0 denote the scientifically or clinically meaningful difference. Suppose we plan to conduct a two-sided hypothesis test by using the z-statistic,

z = [(ȳ − x̄) − δ0]/(σ_c √(2/n_e)),   (25.30)

at the α significance level. To calculate the required common sample size in each group, n_e, we specify the power (1 − β) and use the following formula15:

n_e = 2(Z_{1−α/2} + Z_{1−β})² σ_c²/δ².   (25.31)

In turn, the total sample size equals twice the sample size in each group, i.e., n = 2n_e.

Example 7. Let us revisit the scenario considered in Example 5. Suppose an investigator wishes to design a pilot study to investigate the effect of a new medication on diastolic blood pressure in hypertensive patients by using a parallel-groups design. He plans to randomly assign patients to receive either the treatment or a placebo. Measurements will be collected at baseline and after 12 weeks’ follow-up. The investigator plans to use a two-sided z-test to determine whether the change in the treatment group is different from that in the placebo group at the 5% significance level (α = 0.05). He wants a 90% chance to reject the null hypothesis of equality if the true difference is δ = 3 mm Hg in either direction (90% power, β = 0.1). If past measurements suggest that the common standard deviation of the change in both groups is σ_c = 15 mm Hg, what sample size does he need for each group? This hypothesis is two-sided, so Z_{1−α/2} = 1.960 and Z_{1−β} = 1.282. Using the previous formula, we calculate

n_e = 2(1.960 + 1.282)²(15)²/(3)² = 525.6 ≈ 526.   (25.32)

Hence, the minimum required sample size is n_e = 526 subjects for each group, and thus n = 2n_e = 1052 subjects in total. As before, suppose the investigator can reduce the variation by taking three repeated measurements of each subject while not taking medication and while taking either the new medication or a placebo. If the standard deviation of the difference in the means of the three repeated measurements is σ_c = 12 mm Hg, what sample size does he need for each group? Using Eq. (25.31), we calculate

n_e = 2(1.960 + 1.282)²(12)²/(3)² = 336.3 ≈ 337.   (25.33)

Hence, the minimum required sample size is n_e = 337 subjects for each group, about two-thirds of the previous sample size calculation, and thus n = 2n_e = 674 subjects in total. By comparison, if the meaningful difference is instead δ = 6 mm Hg, what sample size does he need for each group? Using Eq. (25.31), we calculate

n_e = 2(1.960 + 1.282)²(12)²/(6)² = 84.1 ≈ 85.   (25.34)

Hence, the minimum required sample size is n_e = 85 subjects for each group, about one-fourth of the previous sample size calculation, and thus n = 2n_e = 170 subjects in total. The randomized parallel-groups design presented here requires four times as many subjects as the nonrandomized baseline-versus-treatment design with comparable design parameters (presented in “Calculations for Paired Continuous Data” above). The advantage of


randomized designs is that they avoid many of the biases inherent in nonrandomized designs, such as investigator bias due to the lack of masking of treatment assignments. One strategy, often used in cancer research, is to screen many potential treatments on a small number of patients by using nonrandomized designs and then to test the most promising treatments more definitively by using randomized designs. This strategy makes efficient use of limited resources and potentially minimizes the exposure of patients to toxic, but ineffective, treatments.
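A minimal sketch of Eq. (25.31) (the function name n_per_group is ours) reproduces the three scenarios of Example 7; the total sample size is twice the per-group value.

```python
# Per-group sample size for a two-sample z-test with equal variances
# (Eq. 25.31); a minimal sketch.
import math
from statistics import NormalDist

def n_per_group(sigma_c, delta, alpha=0.05, power=0.90):
    """Minimum n in EACH group for a two-sided two-sample z-test."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return math.ceil(2 * z**2 * sigma_c**2 / delta**2)

# The three scenarios of Example 7
print(n_per_group(15, 3), n_per_group(12, 3), n_per_group(12, 6))  # 526 337 85
```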

Calculations for Continuous Data With Unequal Variances or Unequal Sample Sizes We can also calculate the required sample size for testing hypotheses about the difference between two population means when the variances in the two groups, σ_x² and σ_y², are unequal. Suppose we plan to conduct a two-sided hypothesis test by using the z-statistic,

z = [(ȳ − x̄) − δ0]/√((σ_x² + σ_y²)/n_e),   (25.35)

at the α significance level. To calculate the required common sample size in each group, we use the following formula11:

n_e = (Z_{1−α/2} + Z_{1−β})²(σ_x² + σ_y²)/δ².   (25.36)

In some situations we may wish to design trials with different numbers of subjects in each group. For example, in placebo-controlled parallel-groups trials, we may wish to randomly assign a larger proportion of subjects to the new treatment than to the placebo. Let λ = n_y/n_x denote the desired ratio of sample sizes in the two groups. Suppose we plan to conduct a two-sided hypothesis test by using the z-statistic,

z = [(ȳ − x̄) − δ0]/√(σ_x²/n_x + σ_y²/n_y),   (25.37)

at the α significance level. To calculate the required sample size in the x-group, we use the following formula11:

n_x = (Z_{1−α/2} + Z_{1−β})²(σ_x² + σ_y²/λ)/δ².   (25.38)

In turn, n_y = λn_x. Note that the choice λ = σ_y/σ_x minimizes the total sample size, which is n = n_x + n_y. Example 8. Continuing the preceding example, suppose the standard deviation in the new medication (treatment) group is 16 mm Hg and the standard deviation in the placebo group is 8 mm Hg, and all other design parameters are the same (δ = 3 mm Hg, α = 0.05, β = 0.1). Using Eq. (25.36), we calculate

i. ne ¼ ð1:960þ 1:282Þ2 ð16Þ2 þ ð8Þ2 ð3Þ2

367

h

¼ 373:7z374:

(25.39)

Thus, the minimum required sample size is n_e = 374 subjects for each group, and thus n = 2n_e = 748 subjects in total. Next, suppose we wish to enroll twice as many subjects in the treatment group as in the placebo group. Let λ = 0.5. Using Eq. (25.38), we calculate

n_x = (1.960 + 1.282)² [(16)² + (8)²/(0.5)] / (3)² = 448.5 ≈ 449.  (25.40)

In turn, n_y = 225. Hence, the required sample sizes in the treatment and placebo groups are n_x = 449 and n_y = 225 subjects, respectively, and thus n = n_x + n_y = 674 subjects in total.
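The calculations in Example 8 can be sketched in Python from Eqs. (25.36) and (25.38); `sample_size_unequal` is an illustrative helper name of ours, not from the chapter:

```python
from math import ceil
from statistics import NormalDist

def sample_size_unequal(sx, sy, d, alpha=0.05, power=0.90, lam=1.0):
    """Per-group sizes for a two-sided z-test on a difference in means with
    unequal variances (Eq. 25.38); lam = ny/nx. Returns (nx, ny)."""
    z = NormalDist()
    za, zb = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    nx = ceil((za + zb) ** 2 * (sx ** 2 + sy ** 2 / lam) / d ** 2)
    return nx, ceil(lam * nx)

# Example 8: sx = 16 mm Hg, sy = 8 mm Hg, d = 3 mm Hg, alpha = 0.05, 90% power
print(sample_size_unequal(16, 8, 3))           # equal allocation: (374, 374)
print(sample_size_unequal(16, 8, 3, lam=0.5))  # 2:1 allocation: (449, 225)
```

With lam = 1 the expression reduces to Eq. (25.36), so one helper covers both the equal- and unequal-allocation cases.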

II. STUDY DESIGN AND BIOSTATISTICS
25. POWER AND SAMPLE SIZE CALCULATIONS

Calculations for Two Independent Samples of Binary Data

Next, we consider sample size calculations for comparing two independent binomial proportions calculated from binary data. Suppose we plan to sample equal numbers of binary observations from two populations or groups, labeled A and B, with underlying population proportions p_a and p_b, respectively. One possible approach is to test whether the difference between the proportions, p_b − p_a, equals zero (i.e., both proportions are the same) or whether the proportions have specific alternative values p_a = q_a and p_b = q_b, where d = q_b − q_a equals the scientifically or clinically meaningful difference. This approach translates into the following null and alternative hypotheses about the difference in the population proportions, p_b − p_a:

H0: p_b − p_a = 0 vs. H1: p_b − p_a = d,  (25.41)

where d = q_b − q_a. Let p̂_a and p̂_b denote the sample estimates of p_a and p_b, respectively, and let p̂ = (p̂_a + p̂_b)/2 denote their average. Also, the difference between the sample proportions is p̂_b − p̂_a. Suppose we plan to conduct a two-sided hypothesis test by using the z-statistic,

z = (p̂_b − p̂_a) / √[2p̂(1 − p̂)/n],  (25.42)

at the α significance level. For computational convenience, let q = (q_a + q_b)/2 denote the average of the two alternative proportions, and let u_0 = 2q(1 − q) and u_1 = q_a(1 − q_a) + q_b(1 − q_b) denote the variances of p̂_b − p̂_a under the null and alternative hypotheses,

respectively. To calculate the required common sample size in each group, we specify the power (1 − β) and use the following formula:13

n_e = (Z_{1−α/2}√u_0 + Z_{1−β}√u_1)² / d².  (25.43)

Example 9. Suppose an immunologist wishes to compare mumps vaccination rates in two communities, one affected by a mumps outbreak among male adolescents and one unaffected. The null hypothesis is that the vaccination rates in the two communities are equal. The particular alternative hypothesis of interest is that the vaccination rates in the affected and unaffected communities are 80% and 90%, respectively. The immunologist plans to use a two-sided z-test to determine whether the vaccination rates differ in the affected and unaffected communities at the 5% significance level (α = 0.05). He wants a 95% chance to reject the null hypothesis of equality if the true difference is d = 10% (95% power, β = 0.05). How many unrelated male adolescents should he enroll in each community for this study?

This hypothesis is two-sided, so Z_{1−α/2} = 1.960 and Z_{1−β} = 1.645. First, we calculate the average vaccination rate in both populations, q = (0.80 + 0.90)/2 = 0.85, and the variances of p̂_b − p̂_a under the null and alternative hypotheses, u_0 = 2(0.85)(0.15) = 0.255 and u_1 = (0.80)(0.20) + (0.90)(0.10) = 0.25. Using the previous formula, we calculate

n_e = [(1.960)√0.255 + (1.645)√0.25]² / (0.10)² = 328.4 ≈ 329.  (25.44)

Thus, the minimum required sample size is n_e = 329 unrelated male adolescents in each community, for a total of n = 2n_e = 658 in the entire study. By way of comparison, using an advanced method based on Fisher's exact test, the minimum required sample size is 343 unrelated male adolescents in each community, for a total of 686 in the entire study.
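Eq. (25.43) and the Example 9 numbers can be reproduced with a short script; `two_proportion_n` is our illustrative name for the helper:

```python
from math import ceil, sqrt
from statistics import NormalDist

def two_proportion_n(qa, qb, alpha=0.05, power=0.95):
    """Per-group sample size for a two-sided z-test comparing two
    independent binomial proportions (Eq. 25.43)."""
    z = NormalDist()
    za, zb = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    q = (qa + qb) / 2
    u0 = 2 * q * (1 - q)                # variance term under H0
    u1 = qa * (1 - qa) + qb * (1 - qb)  # variance term under H1
    return ceil((za * sqrt(u0) + zb * sqrt(u1)) ** 2 / (qb - qa) ** 2)

# Example 9: 80% vs. 90% vaccination rates, alpha = 0.05, 95% power
print(two_proportion_n(0.80, 0.90))  # 329 per community
```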

ADVANCED METHODS AND OTHER TOPICS

In this section we survey various advanced methods, which lie beyond the scope of this chapter, to put the basic methods presented here in perspective. We also discuss issues related to subject retention and modern statistical computing so that the reader will be familiar with these topics and understand how they may apply to a particular research study.

Alternative Statistics and Sample Size Calculation Methods

We have emphasized that many of the methods for sample size calculation presented in this chapter are

approximate and that more advanced methods may be available. For instance, for continuous data, there are numerical algorithms based on the t-distribution, and for binary data, there are exact methods based on the binomial distribution, which provide more accurate sample size calculations. Likewise, in addition to the methods based on means and proportions (and their differences) presented here, methods exist for many other parametric statistics (e.g., correlation coefficient, odds ratio) and nonparametric statistics (e.g., Fisher’s exact test, Wilcoxon signed rank test for paired data, Wilcoxon rank sum test for two independent samples), which may be better suited to answering a particular research question.8,11 Standard statistical software can easily perform many of these advanced calculations.

Several Advanced Study Designs

It is important to be familiar with several types of study designs and analytic methods for which sample size calculations are more complicated and for which special methods are needed. Some of these designs are as follows:

• Multiarm clinical trials.17 Multiarm trials are an efficient way to compare several active treatments with a common placebo. The more hypothesis tests performed, however, the greater the overall chance of making a false discovery, that is, rejecting some null hypothesis and thus concluding that a particular active treatment is better than the placebo, when in truth the new treatment is ineffective. The Bonferroni correction can be used to control the overall experiment-wise or family-wise error rate in a single trial. Specifically, the required sample size for each of k comparisons between an active treatment and the placebo can be calculated using a significance level of α/k, giving an experiment-wise or family-wise error rate not to exceed α.

• Group sequential trials.18 In this class of designs, the clinical trial is divided into several stages with specified (usually equal) numbers of subjects to be enrolled during each stage. After each stage is completed, the data are analyzed, and a decision is made to continue the trial and collect more data or to stop the trial early for either clear efficacy (i.e., the study shows that the new treatment is remarkably successful and that further data are unlikely to reverse this decision) or futility (i.e., the study, even if continued to its planned end, is unlikely to produce a decision in favor of the new treatment). The required sample size for the entire study can be obtained by calculating the sample size for a regular design without interim analyses and then inflating this number, by using one of several common methods,11,18 to reflect the multiple times at which the data are scheduled to be examined. The actual number of subjects enrolled can be significantly smaller than the maximum sample size if the study is stopped early for either efficacy or futility.

• Survival analysis.8,11,15 Survival analysis methods aim to estimate the distribution of the time to a particular event (e.g., relapse, infection, death) in each study group, possibly adjusting for baseline covariates of interest. A special challenge is that some subjects may not experience an event; instead they may be censored at a particular time, after which they are no longer observed in the study. Survival data are often analyzed using Kaplan–Meier methods, Cox proportional hazards regression, and parametric regression modeling. For survival analysis studies the required sample size depends on both the expected rate of events in each study group and the expected rate of censoring.

• Group randomized trials (GRTs)19 and cluster randomization trials.20 In this class of designs, groups or clusters of subjects, rather than individual subjects, are randomly assigned to the various treatment groups, although measurements of individual subjects are still made. Subjects in the same group tend to be more similar than subjects in different groups. Let s_b² denote the variance between the means of the different groups at baseline (the between-group variance), and let s_w² denote the variance between subjects in the same group (the within-group variance). To account for the dependence between subjects in the same group, we compute the intraclass correlation (ICC), denoted

ρ_ICC = s_b² / (s_b² + s_w²).  (25.45)

Then, to obtain the required sample size for a particular GRT (n_GRT), the calculated sample size for the regular design without groups (n) must be inflated to reflect the common, or average, group size (m) and the ICC:

n_GRT = n(1 + (m − 1)ρ_ICC).  (25.46)

• Generalized linear models (GLMs) (e.g., regression, repeated measures, and longitudinal data models).21 Suppose we wish to model the outcome, or response (whether continuous or binary), as a function of the treatment assignment or the main risk factor, plus other baseline covariates of interest. Although we may plan to use GLM methods to analyze the future data, the reality is that sample size calculations for such methods tend to be quite complicated and, in general, explicit formulas do not exist. Therefore, to obtain approximate sample sizes, we tend in practice to perform sample size calculations appropriate for a simplified method of analysis (e.g., a t-test for two independent samples instead of a multiple linear regression model).
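The design-effect inflation of Eqs. (25.45) and (25.46) can be sketched as follows; the cluster size and variance components below are hypothetical numbers of ours, not from the text (only the n = 329 echoes Example 9):

```python
from math import ceil

def grt_sample_size(n, m, sb2, sw2):
    """Inflate an individually randomized sample size n for a group
    randomized trial with average group size m (Eqs. 25.45 and 25.46)."""
    icc = sb2 / (sb2 + sw2)               # intraclass correlation (Eq. 25.45)
    return ceil(n * (1 + (m - 1) * icc))  # design effect (Eq. 25.46)

# Hypothetical: n = 329 from a standard calculation, 10 subjects per cluster,
# between-group variance 0.02 and within-group variance 0.98 (ICC = 0.02)
print(grt_sample_size(329, 10, 0.02, 0.98))  # 389
```

Even a modest ICC of 0.02 inflates the requirement by 18% here, which is why ignoring clustering is a common source of underpowered GRTs.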

Retention of Subjects

Regardless of the simplicity or complexity of the study design, a major challenge in almost any clinical trial or study is retention, especially when the follow-up period for each subject lasts for many months or even years. Some subjects drop out (e.g., because of treatment side effects) or are lost to follow-up (e.g., because they move out of the study area). Furthermore, some subjects do not adhere to their assigned treatment (e.g., persons assigned to the placebo group who take the active treatment, i.e., they "drop in" to the treatment group). The intention-to-treat principle, discussed in Chapter 23, is an analytic strategy for dealing with these likely challenges. In many studies it is prudent to plan for a 10% to 20% rate of retention loss. As a rule of thumb, Lachin22 suggested inflating the calculated sample size by a factor of 1/(1 − r)², where r is the combined rate of drop-out, loss to follow-up, and other nonadherence to the study protocol. This formula adjusts for the loss of subjects plus the bias that results when key characteristics of the lost subjects differ from those of the study group as a whole.
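Lachin's rule of thumb above amounts to a one-line adjustment; the 15% loss rate and the per-group n = 329 from Example 9 are just illustrative choices:

```python
from math import ceil

def inflate_for_attrition(n, r):
    """Lachin's rule of thumb: inflate a calculated sample size n by
    1/(1 - r)^2, where r is the combined rate of drop-out, loss to
    follow-up, and other nonadherence."""
    return ceil(n / (1 - r) ** 2)

# e.g., 329 subjects per group with an anticipated 15% combined loss rate
print(inflate_for_attrition(329, 0.15))  # 456
```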

Statistical Computing

Modern statistical computing has become a great asset for sample size calculations. Most simply, statistical software facilitates the rapid calculation of numerous sample sizes (or statistical powers) for testing various hypotheses under a variety of assumptions about the underlying population parameters. Results can be arranged in tables to show how the calculated sample size varies as a function of changes in these assumptions. Also, numerical methods can provide solutions to sample size calculations when explicit formulas do not exist. Furthermore, for complex and otherwise intractable problems, data can be simulated under the null and alternative hypotheses for various sample sizes, and the empirical rejection rates of the null hypothesis can be compared with the desired Type I and Type II error rates, respectively. In addition, to help optimize the choice of sample size for a particular study design, special software can be used to graph power as a function of sample size for various parameter choices.
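The simulation idea described above can be illustrated with a small Monte Carlo sketch, assuming the Example 9 design (n = 329 per group, 80% vs. 90%); the empirical rejection rates should land near the planned 95% power and 5% Type I error:

```python
import random
from math import sqrt

def reject(pa, pb, n, z_crit=1.96, rng=random):
    """Simulate one two-group binary trial and apply the pooled z-test."""
    pa_hat = sum(rng.random() < pa for _ in range(n)) / n
    pb_hat = sum(rng.random() < pb for _ in range(n)) / n
    p_bar = (pa_hat + pb_hat) / 2
    se = sqrt(2 * p_bar * (1 - p_bar) / n)
    return se > 0 and abs(pb_hat - pa_hat) / se > z_crit

rng = random.Random(1)  # fixed seed for reproducibility
nsim, n = 1000, 329     # subjects per group, from Example 9
est_power = sum(reject(0.80, 0.90, n, rng=rng) for _ in range(nsim)) / nsim
est_alpha = sum(reject(0.85, 0.85, n, rng=rng) for _ in range(nsim)) / nsim
print(est_power, est_alpha)  # expect values near 0.95 and 0.05
```

For designs without closed-form sample size formulas, the same loop can simply be rerun over a grid of candidate n values until the empirical power first exceeds the target.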

CONCLUSION

In this chapter we have introduced the concept of statistical power, a key element in sample size calculations. We have discussed some basic sample size formulas for precision in confidence interval construction and for hypothesis testing, for both continuous and binary data. For hypothesis testing, we have considered formulas for when we plan to collect one sample, a paired sample, and two independent samples of data. In addition, we have mentioned several advanced topics related to the choice of test statistics and hypotheses, study design, subject retention, and statistical computing.

Performing sample size calculations can be one of the most challenging aspects of study design, but when statisticians and researchers collaborate closely, this process can be highly rewarding. Collaboration provides the opportunity to review the current state of knowledge, to clarify research objectives and hypotheses, and in turn to refine study designs. We advocate early and close collaboration, because once major design decisions have been made, it may be difficult to correct problems that may be found later. The goal of collaboration is to design powerful clinical trials and studies with sufficiently large sample sizes that will have a good chance of providing strong evidence to answer scientifically and clinically meaningful questions.

EXERCISES

1. Confidence interval for a mean. Suppose an infectious disease specialist wishes to estimate the mean CD4 count among a population of HIV-infected pregnant women before starting treatment. He expects the data to have (approximately) a Normal distribution with a mean of 500 cells/mm³ and a standard deviation of 50 cells/mm³. If he wishes to obtain a 95% confidence interval with a width of 20 cells/mm³ for the true mean, show that he should enroll at least n = 97 subjects in this study.

2. Confidence interval for a binomial proportion. Suppose a hematologist wishes to estimate the prevalence of Factor V Leiden among patients treated for a deep vein thrombosis. On the basis of past studies, she expects this prevalence to be approximately 25%. If she wishes to obtain a 95% confidence interval with a width of 0.1 (on average) for this prevalence, show that she should enroll at least n = 289 subjects in this study.

3. Power and parameter choices. Use Eq. (25.13) to show that Z_{1−β}, and thus power (1 − β), increases as the sample size (n), the significance level (α), and the meaningful difference (d) increase, and as the population variance (s²) decreases. Power is also greater for one-sided hypothesis tests than for comparable two-sided hypothesis tests. (Hint: Z_{1−β} = √n d/s − Z_{1−α/2}.)

4. One sample, continuous data. Suppose a biochemist wishes to study homocysteine levels in blood specimens from men older than 50 who have cardiovascular disease. The mean serum homocysteine level among these men is 14 μmol/L before treatment, and she wants an 80% chance to reject the null hypothesis of no change if the mean serum homocysteine level drops to 12 μmol/L after these men take folate tablets for 10 weeks. She plans to use a one-sided z-test at the 5% significance level. If the standard deviation is 4 μmol/L, show that she needs to enroll at least n = 25 patients in this study.

5. One sample, binary data. Suppose an endocrinologist wishes to design a Phase II trial to test a new drug to reduce fatigue in diabetic patients. If the new drug reduces symptoms in only 30% of patients, its effect is not sufficient to merit further development. If the new drug reduces symptoms in 50% of patients, he wants a 95% chance to reject the null hypothesis of insufficient effect. He plans to use a one-sided z-test at the 5% significance level. Show that he needs to enroll at least n = 63 patients in this clinical trial.

6. Two-stage designs. Using a six-sided die, simulate data under the null and alternative hypotheses in Example 4, namely H0: p = 0.2 versus H1: p = 0.4, with a 5% significance level and 80% power. First, under the null hypothesis, treat rolls of 1 as treatment success, 2–5 as treatment failure, and reroll 6s. This procedure gives a 20% success rate per subject. Second, under the alternative hypothesis, treat rolls of 1–2 as treatment success, 3–5 as treatment failure, and reroll 6s. This procedure gives a 40% success rate per subject. Using the stopping rules in Example 4, how many subjects in total are enrolled in these simulated trials under each hypothesis? Repeat each simulation five times and compare the totals obtained to the fixed sample size, n = 29 (Example 3), and the maximum two-stage sample size, m2 = 43 (Example 4).

7. Paired continuous data. Suppose a nutritionist wishes to study the weight change among obese men (BMI ≥ 30) on a 16-week low-fat diet, complemented by daily exercise. Assume that the standard deviations of the before and after weights are both 25 kg, whereas the standard deviation of the difference is 15 kg. She plans to use a two-sided paired z-test at the 5% significance level. She wishes to have a 90% chance to reject the null hypothesis of equality when the true change in weight (in either direction) is 8 kg. Show that if she correctly uses the paired sample size formula, she obtains a minimum required sample size of n = 37 subjects.

8. Paired binary data. Suppose an obstetrician wishes to compare the rates of testing for HIV and hepatitis B among pregnant women arriving in labor at a large hospital. One blood specimen should be taken, but for various reasons (e.g., imminent labor, patient's refusal), one or both of the tests may not be performed. The obstetrician is concerned about a significant missed opportunity: some patients receive a hepatitis B test but fail to receive an HIV test at the same time. He estimates that 30% of pregnant women receive both tests but that 50% of women receive only one of the two tests. The null hypothesis is that 25% of women receive only an HIV test, whereas 25% of women receive only a hepatitis B test. The alternative hypothesis is that 20% of women receive only an HIV test, whereas 30% of women receive only a hepatitis B test (q = 0.50, d = 0.10). The obstetrician wants to perform a one-sided McNemar's test at the 5% significance level. Show that if he wants a 90% chance to reject the null hypothesis of equality when the alternative hypothesis is true, he needs to review n = 425 medical records for this study.

9. Two independent samples, continuous data. Suppose a psychologist wishes to design a randomized parallel-groups trial to compare the impact of white noise and classical music (e.g., Mozart) on the performance of college students on a 200-question problem-solving test. On the basis of her knowledge of past students, she expects the white noise group to have a bell-shaped distribution with a mean of 120 points and a standard deviation of 15 points. She plans to compare the performance of the students in each group with a two-sided z-test at the 5% significance level. She wants a 90% chance to reject the null hypothesis of equality between the two groups when the performance in the music group is better or worse by 10 points. (a) If the standard deviation for the music group is also 15 points, show that she should enroll n_e = 48 students in each group of this study (total = 96). (b) If the standard deviation for the music group is 30 points, show that she should enroll n_e = 119 students in each group (total = 238). (c) If the standard deviation for the music group is 30 points, and she wishes to enroll twice as many students in the music group as in the white noise group, show that she should enroll n_wn = 71 and n_m = 142 students in the white noise (wn) and music (m) groups, respectively (total = 213).

10. Two independent samples, binary data. Suppose a geneticist wishes to study the prevalence of sickle cell trait among two geographically separated populations in sub-Saharan Africa. He wants a 90% chance to reject the null hypothesis of equality if the true prevalence is 10% in one population and 25% in the other. He plans to use a two-sided z-test at the 5% significance level. Show that he should enroll n_e = 133 subjects in each population for this study (total = 266).
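As a quick numerical check on two of the exercises, here is a sketch in Python; the helper names are ours, and Exercise 1 assumes the usual normal-theory width formula n = (2Z_{1−α/2}s/w)²:

```python
from math import ceil, sqrt
from statistics import NormalDist

z = NormalDist()

def ci_mean_n(sigma, width, conf=0.95):
    """Exercise 1: smallest n whose two-sided CI for a mean has the given total width."""
    za = z.inv_cdf((1 + conf) / 2)
    return ceil((2 * za * sigma / width) ** 2)

def two_prop_n(qa, qb, alpha=0.05, power=0.90):
    """Exercise 10: per-group n for a two-sided z-test of two proportions (Eq. 25.43)."""
    za, zb = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    q = (qa + qb) / 2
    u0, u1 = 2 * q * (1 - q), qa * (1 - qa) + qb * (1 - qb)
    return ceil((za * sqrt(u0) + zb * sqrt(u1)) ** 2 / (qb - qa) ** 2)

print(ci_mean_n(50, 20))       # Exercise 1: 97
print(two_prop_n(0.10, 0.25))  # Exercise 10: 133
```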

Acknowledgments

The authors wish to thank Timothy A. Green, Lillian S. Lin, Marie S. Morgan, Philip J. Peters, Travis H. Sanchez, and Ryan E. Wiegand for helpful comments on this chapter.

Disclaimers

The findings and conclusions in this chapter are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention. This chapter reflects the views of the authors and should not be construed to represent FDA's views or policies.

References

1. Altman DG, Bland JM. Absence of evidence is not evidence of absence. Br Med J 1995;311(7003):485.
2. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. Guidance for industry: E9 statistical principles for clinical trials. Fed Regist 1998;63(179):49583–98.
3. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988. p. 56.
4. Lee SJ, Zelen M. Clinical trials and sample size considerations: another perspective. Stat Sci 2000;15(2):95–110.
5. Schulz KF, Altman DG, Moher D, CONSORT Group. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. Br Med J 2010;340:698–702, c332.
6. Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux PJ, Elbourne D, Egger M, Altman DG. CONSORT 2010 Explanation and Elaboration: updated guidelines for reporting parallel group randomised trials. Br Med J 2010;340:1–28, c869.
7. Altman DG. Practical statistics for medical research. Boca Raton, FL: Chapman & Hall/CRC; 1991.
8. Machin D, Campbell MJ, Tan SB, Tan SH. Sample size tables for clinical studies. 3rd ed. West Sussex, England: John Wiley & Sons Ltd.; 2009.
9. Beal SL. Sample size determination for confidence intervals on the population mean and on the difference between two population means. Biometrics 1989;45(3):969–77.
10. Piantadosi S. Clinical trials: a methodologic perspective. 2nd ed. Hoboken, NJ: John Wiley & Sons; 2005.
11. Chow S-C, Shao J, Wang H. Sample size calculations in clinical research. 2nd ed. Boca Raton, FL: Chapman & Hall/CRC; 2008.
12. Guenther WC. Sample size formulas for normal theory T-tests. Am Statistician 1981;35(4):243–4.
13. Fleiss JL, Levin B, Paik MC. Statistical methods for rates and proportions. 3rd ed. Hoboken, NJ: John Wiley & Sons; 2003.
14. Simon R. Optimal two-stage designs for phase II clinical trials. Controlled Clin Trials 1989;10(1):1–10.
15. Friedman LM, Furberg CD, DeMets DL. Fundamentals of clinical trials. 4th ed. New York: Springer; 2010.
16. Connor RJ. Sample size for testing differences in proportions for the paired-sample design. Biometrics 1987;43(1):207–11.


17. Freidlin B, Korn EL, Gray R, Martin A. Multi-arm clinical trials of new agents: some design considerations. Clin Cancer Res 2008;14(14):4368–71.
18. Jennison C, Turnbull BW. Group sequential methods with applications to clinical trials. Boca Raton, FL: Chapman & Hall/CRC; 2000.
19. Murray DM. Design and analysis of group-randomized trials. New York: Oxford University Press; 1998.
20. Donner A, Klar N. Design and analysis of cluster randomization trials in health research. London: Arnold; 2000.
21. Chow S-C, Liu J-P. Design and analysis of clinical trials: concepts and methodologies. 3rd ed. Hoboken, NJ: John Wiley & Sons, Inc.; 2013.
22. Lachin JM. Introduction to sample size determination and power analysis for clinical trials. Controlled Clin Trials 1981;2(2):93–113. Correction 2(4):337.


C H A P T E R 26

An Introduction to Survival Analysis

Laura Lee Johnson
U.S. Food and Drug Administration, Silver Spring, MD, United States

O U T L I N E

Introduction 373
Features of Survival Data 374
Survival Function 375
  Kaplan–Meier and Product-Limit Estimators 375
  Calculation and Formula for an Estimate 375
  Calculation of Variance 376
  Comparing Two Survival Functions 377
    Comparing Two Survival Functions at a Given Time Point 377
    Comparing Two Survival Functions Using the Whole Curve: Log-Rank Test 377
    Example 1: Chronic Active Hepatitis Study 378
Stratified Log-Rank Test 379
Proportional Hazards Model 379
  Calculation and Formulas 379
Common Mistakes 380
Conclusion 380
Questions 381
Acknowledgments 381
Disclaimer 381
References 381

This chapter introduces some commonly used statistical methods for the analysis of survival time data in medical research. Survival data consist of two pieces of information for each subject: the time under observation and the ultimate outcome at the end of that time. Analysis of survival time or time-to-event data is complicated because the follow-up length often is different for each participant, and the event of interest, such as myocardial infarction (MI), often is not observed in all the subjects by the end of the study. For those participants in whom the event of interest is not observed, what is known is that their survival times are longer than their time spent in the study, but their exact survival times are unknown. This chapter describes features of survival time data, defines the true or underlying survival function, and introduces the Kaplan–Meier or product-limit estimator for the survival function. It also presents several approaches for comparing two survival curves, a summary of stratified analysis methods, and Cox's proportional hazards regression analysis.

Principles and Practice of Clinical Research
http://dx.doi.org/10.1016/B978-0-12-849905-4.00026-5

INTRODUCTION

In survival analysis the main interest focuses on the time taken for some dichotomous event to occur. Although the term survival is used, the event of interest is not limited to death or failure. It can be any dichotomous event, such as nonfatal MI, adverse events, computer crashes, or bursting of a balloon filling with air; essentially it can be any definable event. Several different fields have different names for similar types of analyses. Survival, time to event, and failure time analysis are common names in the medical world; reliability theory and reliability analysis are common names in engineering, and duration modeling and

duration analysis are among the synonyms found in sociology and economics. Nevertheless the questions being answered and the statistical methods and concepts are similar. For this chapter we will take the clinical research frame of reference and use the term "survival."

FEATURES OF SURVIVAL DATA

Survival time is defined as the time from some fixed starting point (time origin) to the onset of the event of interest. In experimental animal studies often the starting point is the same time for all subjects. In controlled clinical trials the starting point frequently is the actual time a participant enters the study; thus the starting point may vary for each participant. In epidemiology, the time origin may be birth, time of first exposure, or another point in time.

There are two key features of survival data. First, the length of follow-up time varies among participants. For example, for a study with a fixed end date, participants entering the study later on would have shorter follow-up time than those entering the study earlier. Second, the event of interest is almost never observed in all subjects by the end of a study. The survival time is called censored if the event is not observed by the end of the study; this indicates the period of observation ended before the event occurred. This type of censoring is referred to as right censoring, the most common type of censoring in clinical studies. Censoring may occur for various reasons. One common reason is that the study ends before the event occurs. Such censoring is called administrative censoring. Other reasons for censoring include the withdrawal of participants from the study and loss of contact with participants who move out of the study area. Censoring for reasons unrelated to the outcome for each participant (i.e., the occurrence of the event of interest or not) is called independent censoring. In all the methods presented in this chapter, the assumption of independent censoring is required.

The diagrams in Figs. 26.1 and 26.2 and Table 26.1 are commonly used to illustrate the features of survival data. In this example, patient accrual occurs during the first 6 months of the study. After that, participants are monitored for a minimum of 12 months. The total length of the study is 18 months. This is an example of a study in which the total possible follow-up time will vary among study participants, in this case between 12 and 18 months, based on when each participant enters the study. The patients accrued earliest are observed for the longest time.

Fig. 26.1 illustrates the staggered entry of participants into the study during the 6-month accrual period. Many survival studies have this pattern of participant accrual. The standard statistical methods for survival analysis assume that those participants who enter the study at any given time are a representative random sample of those in the population still at risk at that time. Furthermore, the assumption of population homogeneity over time is made, namely that the characteristics of the population available for sampling remain essentially constant over time, at least to a reasonable approximation. These assumptions are particularly important in choosing how to estimate the hazard function, discussed later in the chapter.

Looking at Fig. 26.1 more closely, we see the first participant was recruited at the beginning of the study (time 0) and had an event at approximately month 10. The second participant also was recruited at the beginning of the study but censored at approximately month 11. The survival time for each participant is obtained by subtracting the time of entry into the study from either the time of the event or the time of last follow-up without observing the event. Fig. 26.2 provides a modified presentation of the data in Fig. 26.1, moving the lines so all the survival times start from time 0.

FIGURE 26.1 Diagram of patient accrual and follow-up from the data from Table 26.1. Solid circles, uncensored observation; open circles, censored observation.

FIGURE 26.2 Diagram of the survival times for Table 26.1.

Copyright © 2018. Published by Elsevier Inc.

TABLE 26.1 Data From First Hypothetical Example

Patient No.   Time at Entry   Time at Death or       Dead (D) or    Survival Time
              (Months)        Censoring (Months)     Censored (C)   (Months)
 1             0.0            10.6                   D              10.6
 2             0.0            11.5                   C              11.5
 3             0.4            16.0                   C              15.6
 4             1.1             6.2                   D               5.1
 5             1.3             7.1                   C               5.8
 6             3.5            10.2                   D               6.7
 7             3.9            18.0                   C              14.1
 8             4.5            16.1                   D              11.6
 9             5.2            18.0                   C              12.8
10             5.9            18.0                   C              12.1
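The last column of Table 26.1 can be reproduced directly from the entry and exit times; a minimal check in Python:

```python
# Survival time = time at death or censoring minus time at entry (Table 26.1)
records = [  # (patient, entry, end, status), status "D" = dead, "C" = censored
    (1, 0.0, 10.6, "D"), (2, 0.0, 11.5, "C"), (3, 0.4, 16.0, "C"),
    (4, 1.1, 6.2, "D"),  (5, 1.3, 7.1, "C"),  (6, 3.5, 10.2, "D"),
    (7, 3.9, 18.0, "C"), (8, 4.5, 16.1, "D"), (9, 5.2, 18.0, "C"),
    (10, 5.9, 18.0, "C"),
]
surv = {pid: round(end - entry, 1) for pid, entry, end, _ in records}
print(surv[1], surv[7], surv[10])  # 10.6 14.1 12.1
```

Note that the status flag carries the censoring information; two patients with the same survival time but different flags contribute differently to the estimators introduced below.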

Fig. 26.2 illustrates the survival time of each participant and provides a simpler picture than the first figure for comparing the survival times among the participants. This figure would not necessarily be a better way to examine the data if the assumption of time homogeneity did not apply. Also note that even though the study could follow participants for a minimum of 12 months, only 4 of the 10 participants were followed for 12 or more months; two participants were censored and four others died prior to being followed for 12 months. For additional reading, similar examples can be found in introductory textbooks.1–4

SURVIVAL FUNCTION

The survival function, denoted by S(t), is the probability of an individual surviving at least until time t, where 0 ≤ S(t) ≤ 1. If the survival function is known from theory or empirical observations, then we can use it to analyze the survival experience of a population at variou