Causal Inference for Data Science (ISBN 9781633439658)

When you know the cause of an event, you can affect its outcome. This accessible introduction to causal inference shows …


English · 392 pages · 2024

Table of contents:
Causal Inference for Data Science
copyright
dedication
contents
preface
acknowledgments
about this book
Prerequisites
How this book is organized: A road map
The learning path and philosophy of this book
Different learning styles
Developing intuition and formal methodology
Building your intuition
Practicing the methodology
About the code
liveBook discussion forum
about the author
about the cover illustration
Part 1 Inference and the role of confounders
1 Introducing causality
1.1 How causal inference works
1.1.1 Step 1: Determine the type of data
1.1.2 Step 2: Understand your problem
1.1.3 Step 3: Create a model
1.1.4 Step 4: Share your model
1.1.5 Step 5: Apply causal inference techniques
1.2 Contrasts between causal models and the predictive models of machine learning
1.3 Experimental studies
1.3.1 Motivating example: Deploying a new website
1.3.2 A/B testing
1.3.3 Randomized controlled trials
1.3.4 Steps to perform an A/B test
1.3.5 Limitations of A/B testing and RCTs
1.4 Observational studies
1.4.1 Simulating synthetic data
1.4.2 Causal effects under confounding
1.5 Reviewing basic statistical concepts
1.5.1 Empirical distributions and data-generating distributions
1.5.2 A refresher on conditional probabilities and expectations
1.6 Further reading
1.7 Chapter quiz
Summary
2 First steps: Working with confounders
2.1 Learning the basic elements of causal inference through Simpson’s paradox
2.1.1 What’s the problem?
2.1.2 Develop your intuition: How to approach the problem
2.1.3 Solving Simpson’s paradox
2.2 Generalizing to other problems
2.2.1 Describing the problem with a graph
2.2.2 Articulating what we would like to know
2.2.3 Finding the way to calculate the causal effect
2.2.4 Articulating what we would like to know: The language of interventions
2.2.5 Finding the way to calculate the causal effect: The adjustment formula
2.2.6 How does the treatment work in each situation? The positivity assumption
2.3 Interventions and RCTs
2.4 First contact with the structural approach
2.4.1 Simulating the kidney stone example
2.4.2 Interventions in the structural approach
2.5 When to apply the adjustment formula
2.5.1 RCT or A/B test
2.5.2 Confounders
2.5.3 Unobserved confounders
2.5.4 Mediators
2.5.5 Many confounders
2.5.6 Outcome predictive variables
2.5.7 Treatment predictive variables
2.5.8 Conditional intervention
2.5.9 Combining all the previous situations
2.5.10 Summarizing the differences between intervening and applying the adjustment formula
2.6 So, what’s the plan?
2.7 Lessons learned
2.8 Chapter quiz
Summary
3 Applying causal inference
3.1 When and why to use graphs in causal inference analysis
3.2 Steps to formulate your problem using graphs
3.2.1 List all the variables
3.2.2 Create your graph
3.2.3 State your assumptions
3.2.4 State your objectives
3.2.5 Check the positivity assumption
3.3 Other examples
3.3.1 Recommender systems
3.3.2 Pricing
3.3.3 Simulations
3.4 Further reading
3.5 Chapter quiz
Summary
4 How machine learning and causal inference can help each other
4.1 What does supervised learning do?
4.1.1 When should causal inference be used vs. supervised learning?
4.1.2 The goal of data fitting
4.1.3 When the future and the past behave the same way
4.1.4 When do causal inference and supervised learning coincide?
4.1.5 Predictive error is a false friend
4.1.6 Validation of interventions
4.2 How does supervised learning participate in causal inference?
4.2.1 Empirical and generating distributions in the adjustment formula
4.2.2 The flexibility of the adjustment formula
4.2.3 The adjustment formula for continuous distributions
4.2.4 Algorithms for calculating the adjustment formula
4.2.5 Cross-fitting: Avoiding overfitting
4.3 Other applications of causal inference in machine learning
4.3.1 Reinforcement learning
4.3.2 Fairness
4.3.3 Spurious correlations
4.3.4 Natural language processing
4.3.5 Explainability
4.4 Further reading
4.5 Chapter quiz
Summary
Part 2 The adjustment formula in practice
5 Finding comparable cases with propensity scores
5.1 Developing your intuition about the propensity scores
5.1.1 Finding matches for estimating causal effects
5.1.2 But is there a match?
5.1.3 Why matching can be hard
5.1.4 How propensity scores can be used to calculate the ATE
5.2 Basic notions of propensity scores
5.2.1 Which cases are we working with?
5.2.2 What are the propensity scores?
5.2.3 The positivity assumption is … an assumption
5.3 Propensity scores in practice
5.3.1 Data preparation
5.3.2 Calculating the propensity scores
5.3.3 Assess the positivity assumption
5.3.4 Calculating ATEs drawn from the propensity scores
5.4 Calculating propensity score adjustment: An exercise
5.4.1 Exercise steps
5.5 Further reading
5.6 Chapter quiz
Summary
6 Direct and indirect effects with linear models
6.1 Estimating causal effects with linear models
6.1.1 Simulating a pricing problem: A walkthrough
6.1.2 Direct and indirect effects
6.2 Understanding causal dynamics through linear models
6.2.1 The analogy of a gas flowing through pipes
6.2.2 How correlation flows through a graph
6.2.3 Calculating causation and correlation from the arrows’ coefficients
6.2.4 Linear models and the “do” operator
6.3 Chapter quiz
Summary
7 Dealing with complex graphs
7.1 Altering the correlation between two variables conditioning on a third one
7.1.1 Arrival time example of conditional independence
7.1.2 Mathematical example of conditional independence
7.1.3 Breaking a causal model into independent modules
7.1.4 The bricks of DAGs: Factorizing probability distributions
7.1.5 What’s the d-separation about?
7.1.6 Defining d-separation
7.2 Back-door criterion
7.2.1 The importance of the back-door criterion
7.3 Good and bad controls
7.3.1 Good controls
7.3.2 Neutral controls
7.3.3 Bad controls
7.4 Revisiting previous chapters
7.4.1 Efficient controls
7.4.2 Propensity score
7.4.3 Again: Don’t include variables in your model just because they make the model more accurate
7.4.4 Should you adjust for income?
7.4.5 Linear models
7.5 An advanced tool for identifying causal effects: The do-calculus
7.6 Further reading
7.7 Chapter quiz
Summary
8 Advanced tools with the DoubleML library
8.1 Double machine learning
8.1.1 FWL theorem: The predecessor of DML
8.1.2 Nonlinear models with DML
8.1.3 DML in practice
8.1.4 Heterogeneous treatment effects
8.2 Confidence intervals
8.2.1 Simulating new datasets with bootstrapping
8.2.2 Analytical formulas for confidence intervals
8.3 Doubly robust estimators
8.3.1 AIPW in practice
8.4 Further reading
8.5 Chapter quiz
Summary
Part 3 Other strategies beyond the adjustment formula
9 Instrumental variables
9.1 Understanding IVs through an example
9.1.1 The example’s DAG
9.1.2 IV assumptions
9.1.3 IVs in RCTs
9.2 Estimating the causal effect with IVs
9.2.1 Applying IVs with linear models
9.2.2 Applying IVs for partially linear models
9.2.3 An alternative formula for the IV method
9.2.4 The lack of a general formula for the general IV graph
9.3 Instrumental variables in practice
9.3.1 Two-stage least squares (2SLS) algorithm
9.3.2 Weak instruments
9.3.3 IVs with DoubleML
9.4 References
9.5 Chapter quiz
Summary
10 Potential outcomes framework
10.1 What is a potential outcome?
10.1.1 Individual outcomes
10.1.2 Population outcomes
10.1.3 Causal effects
10.1.4 PO assumptions
10.2 How do POs relate to DAGs?
10.2.1 The first law of causal inference
10.2.2 Expressing PO assumptions with DAGs
10.2.3 Counterfactuals
10.3 Adjustment formula with potential outcomes
10.4 IVs with potential outcomes
10.5 Chapter quiz
Summary
11 The effect of a time-related event
11.1 Which types of data will we use?
11.2 Regression discontinuity design
11.2.1 Data simulation
11.2.2 RDD terminology
11.2.3 Assumptions
11.2.4 Effect estimation
11.2.5 RDD in practice
11.3 Synthetic controls
11.3.1 Data simulation
11.3.2 Synthetic controls terminology
11.3.3 Assumptions
11.3.4 Effect estimation
11.3.5 Synthetic controls in practice
11.3.6 Selecting training and predicting time periods
11.4 Differences in differences
11.4.1 Data simulation
11.4.2 DiD terminology
11.4.3 Assumptions
11.4.4 Effect estimation
11.4.5 DiD in practice
11.5 Chapter quiz
11.6 Method comparison
11.7 References
Summary
appendix A The math behind the adjustment formula
appendix B Solutions to exercises in chapter 2
B.1 Solution to Simpson’s paradox for treatment B
B.2 Observe and do are different things
B.2.1 Solution
B.3 What do we need to adjust?
B.3.1 RCT
B.3.2 Confounder
B.3.3 Unobserved confounder
B.3.4 Mediators
B.3.5 Outcome predictive variables
appendix C Technical lemma for the propensity scores
appendix D Proof for doubly robust estimator
D.1 DR property with respect to the T-learner
D.2 Doubly robust property with respect to inverse probability weighting
appendix E Technical lemma for the alternative instrumental variable estimator
appendix F Proof of the instrumental variable formula for imperfect compliance
index
