Causal Inference in Python (5th Early Release) 9781098140250

How many buyers will an additional dollar of online marketing bring in? Which customers will only buy when given a disco

English Pages 496 Year 2023

Table of contents:
Preface
Prerequisites
Outline
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
I. Fundamentals
1. Introduction to Causal Inference
What Is Causal Inference?
Why We Do Causal Inference
Machine Learning and Causal Inference
Association and Causation
The Treatment and the Outcome
The Fundamental Problem of Causal Inference
Causal Models
Interventions
Individual Treatment Effect
Potential Outcomes
Consistency and Stable Unit Treatment Values
Causal Quantities of Interest
Causal Quantities: An Example
Bias
The Bias Equation
A Visual Guide to Bias
Identifying the Treatment Effect
The Independence Assumption
Identification with Randomization
Key Ideas
2. Randomized Experiments and Stats Review
Brute-Force Independence with Randomization
An A/B Testing Example
The Ideal Experiment
The Most Dangerous Equation
The Standard Error of Our Estimates
Confidence Intervals
Hypothesis Testing
Null Hypothesis
Test Statistic
p-values
Power
Sample Size Calculation
Key Ideas
3. Graphical Causal Models
Thinking About Causality
Visualizing Causal Relationships
Are Consultants Worth It?
Crash Course in Graphical Models
Chains
Forks
Immorality or Collider
The Flow of Association Cheat Sheet
Querying a Graph in Python
Identification Revisited
CIA and the Adjustment Formula
Positivity Assumption
An Identification Example with Data
Confounding Bias
Surrogate Confounding
Randomization Revisited
Selection Bias
Conditioning on a Collider
Adjusting for Selection Bias
Conditioning on a Mediator
Key Ideas
II. Adjusting for Bias
4. The Unreasonable Effectiveness of Linear Regression
All You Need Is Linear Regression
Why We Need Models
Regression in A/B Tests
Adjusting with Regression
Regression Theory
Single Variable Linear Regression
Multivariate Linear Regression
Frisch-Waugh-Lovell Theorem and Orthogonalization
Debiasing Step
Denoising Step
Standard Error of the Regression Estimator
Final Outcome Model
FWL Summary
Regression as an Outcome Model
Positivity and Extrapolation
Nonlinearities in Linear Regression
Linearizing the Treatment
Nonlinear FWL and Debiasing
Regression for Dummies
Conditionally Random Experiments
Dummy Variables
Saturated Regression Model
Regression as Variance Weighted Average
De-Meaning and Fixed Effects
Omitted Variable Bias: Confounding Through the Lens of Regression
Neutral Controls
Noise Inducing Control
Feature Selection: A Bias-Variance Trade-Off
Key Ideas
5. Propensity Score
The Impact of Management Training
Adjusting with Regression
Propensity Score
Propensity Score Estimation
Propensity Score and Orthogonalization
Propensity Score Matching
Inverse Propensity Weighting
Variance of IPW
Stabilized Propensity Weights
Pseudo-Populations
Selection Bias
Bias-Variance Trade-Off
Positivity
Design- Versus Model-Based Identification
Doubly Robust Estimation
Treatment Is Easy to Model
Outcome Is Easy to Model
Generalized Propensity Score for Continuous Treatment
Key Ideas
III. Effect Heterogeneity and Personalization
6. Effect Heterogeneity
From ATE to CATE
Why Prediction Is Not the Answer
CATE with Regression
Evaluating CATE Predictions
Effect by Model Quantile
Cumulative Effect
Cumulative Gain
Target Transformation
When Prediction Models Are Good for Effect Ordering
Marginal Decreasing Returns
Binary Outcomes
CATE for Decision Making
Key Ideas
7. Metalearners
Metalearners for Discrete Treatments
T-Learner
X-Learner
Metalearners for Continuous Treatments
S-Learner
Double/Debiased Machine Learning
Double-ML for CATE estimation
Visual intuition for Double-ML
Key Ideas
IV. Panel Data
8. Difference-in-Differences
Panel Data
Canonical Difference-in-Differences
Diff-in-Diff with Outcome Growth
Diff-in-Diff with OLS
Diff-in-Diff with Fixed Effects
Multiple Time Periods
Inference
Identification Assumptions
Parallel Trends
No Anticipation Assumption and SUTVA
Strict Exogeneity
No Time Varying Confounders
No Feedback
No Carryover and No Lagged Dependent Variable
Effect Dynamics over Time
Diff-in-Diff with Covariates
Doubly Robust Diff-in-Diff
Propensity Score Model
Delta Outcome Model
All Together Now
Staggered Adoption
Heterogeneous Effect over Time
Covariates
Key Ideas
9. Synthetic Control
Online Marketing Dataset
Matrix Representation
Synthetic Control as Horizontal Regression
Canonical Synthetic Control
Synthetic Control with Covariants
Debiasing Synthetic Control
Inference
Synthetic Difference-in-Differences
DID Refresher
Synthetic Controls Revisited
Estimating Time Weights
Synthetic Control and DID
Key Ideas
V. Alternative Experimental Designs
10. Geo and Switchback Experiments
Geo-Experiments
Synthetic Control Design
Trying a Random Set of Treated Units
Random Search
Switchback Experiment
Potential Outcomes of Sequences
Estimating the Order of Carryover Effect
Design-Based Estimation
Optimal Switchback Design
Robust Variance
Key Ideas
11. Noncompliance and Instruments
Noncompliance
Extending Potential Outcomes
Instrument Identification Assumptions
First Stage
Reduced Form
Two-Stage Least Squares
Standard Error
Additional Controls and Instruments
2SLS by Hand
Matrix Implementation
Discontinuity Design
Discontinuity Design Assumptions
Intention to Treat Effect
The IV Estimate
Bunching
Key Ideas
12. Next Steps
Causal Discovery
Sequential Decision Making
Causal Reinforcement Learning
Causal Forecasting
Domain Adaptation
Closing Thoughts
Index
