Using Data to Make Good Things Happen 9781119646044, 9781119646051, 2020939801, 1119646030

Inform your own analyses by seeing how one of the best data analysts in the world approaches analytics problems Analyti

English Pages 528 Year 2020

Table of contents:
Table of Contents
Cover
About the Author
About the Technical Editor
Acknowledgments
Introduction
What Happened?
What Will Happen?
Why Did It Happen?
How Do I Make Good Things Happen?
How to Read This Book
Reader Support for This Book
Part I: What Happened?
CHAPTER 1: Preliminaries
Basic Concepts in Data Analysis
What Is a Random Variable?
Excel Calculations
CHAPTER 2: Was the 1969 Draft Lottery Fair?
The Data
The Analysis
Excel Calculations
CHAPTER 3: Who Won the 2000 Election: Bush or Gore?
Projecting the Undervotes
What Happened with the Overvotes?
The Butterfly Did It!
Excel Calculations
CHAPTER 4: Was Liverpool Over Barcelona the Greatest Upset in Sports History?
How Should We Rank Upsets?
Leicester Wins the 2015–2016 Premier League
#16 Seed UMBC Beats #1 Seed Virginia
The Jets Win Super Bowl III
Other Big Upsets
CHAPTER 5: How Did Bernie Madoff Keep His Fund Going?
The Mathematics of Ponzi Schemes
Madoff's Purported Strategy
The Sharpe Ratio Proves Madoff Was a Fraud
Benford's Law and Madoff's Fraud
Excel Calculations
CHAPTER 6: Is the Lot of the American Worker Improving?
Is U.S. Family Income Skewed?
Median Income and Politics
Causes of Increasing U.S. Income Inequality
Money Isn't Everything: The Human Development Index
Create Your Own Ranking of Well-Being
Are Other Countries Catching Up to the U.S.?
Excel Calculations
CHAPTER 7: Measuring Income Inequality with the Gini, Palma, and Atkinson Indices
The Gini Index
The Palma Index
The Atkinson Index
Excel Calculations
CHAPTER 8: Modeling Relationships Between Two Variables
Examples of Relationships Between Two Variables
Finding the Best-Fitting (Least Squares) Line
Computing the Beta of a Stock
What Is a Good R²?
Correlation and R²
We Are Not Living in a Linear World
Excel Calculations
CHAPTER 9: Intergenerational Mobility
Absolute Intergenerational Mobility
Intergenerational Elasticity
Rank-Rank Mobility
Comparing IGE and Rank-Rank Mobility
Measuring Mobility with Quintiles
The Great Gatsby Curve
Excel Calculations
CHAPTER 10: Is Anderson Elementary School a Bad School?
How Can We Adjust for Family Income?
Estimating the Least Squares Line
Can We Compare Standardized Test Performance for Students in Different States?
Excel Calculations
CHAPTER 11: Value-Added Assessments of Teacher Effectiveness
Simple Gain Score Assessment
Covariate Adjustment Assessment
Layered Assessment Model
Cross-Classified Constant Growth Assessment
Problems with VAA
How Much Is a Good Teacher Worth?
Excel Calculations
CHAPTER 12: Berkeley, Buses, Cars, and Planes
Simpson's Paradox and College Admissions
The Waiting Time Paradox
When Is the Average of 40 and 80 Not 60?
Why Pre COVID Were There Never Empty Seats on My Flight?
Excel Calculations
CHAPTER 13: Is Carmelo Anthony a Hall of Famer?
What Metric Defines Basketball Ability?
Wins Above Replacement Player (WARP)
Manu, Melo, Dirk, and Dwayne
How Do 25,000 Points Lead to So Few Wins?
CHAPTER 14: Was Derek Jeter a Great Fielder?
Fielding Statistics: The First Hundred Years
Range Factor
The Fielding Bible: A Great Leap Forward
The Next Frontier
CHAPTER 15: “Drive for Show and Putt for Dough?”
Strokes Gained
The Myth Exposed
CHAPTER 16: What's Wrong with the NFL QB Rating?
NFL Quarterback Rating
ESPN's Total Quarterback Rating
Excel Calculations
CHAPTER 17: Some Sports Have All the Luck
Skill vs. Luck: The Key Idea
The Results
CHAPTER 18: Gerrymandering
A Stylized Example
The Mathematics of Gerrymandering
CHAPTER 19: Evidence-Based Medicine
James Lind and Scurvy: The Birth of Evidence-Based Medicine
The Randomized Streptomycin Tuberculosis Trial
Excel Calculations
Hormone Replacement: Good or Bad?
CHAPTER 20: How Do We Compare Hospitals?
Ratings Criteria
Conclusion
Excel Calculations
CHAPTER 21: What Is the Worst Health Care Problem in My Country?
Disability-Adjusted Life Years
Determination of Disability Weights
To Age Weight or Discount, That Is the Question
Key Facts About World Health
Part II: What Will Happen?
CHAPTER 22: Does a Mutual Fund's Past Performance Predict Future Performance?
Mutual Fund Basics
Morningstar Ratings
Risk-Adjusting Fund Returns
How Well Do Morningstar Star Ratings Predict a Fund's Future Performance?
The Effect of Expense Ratio on Long-Term Performance
Excel Calculations
CHAPTER 23: Is Vegas Good at Picking NFL Games?
How NFL Betting Works
Bias and Accuracy
Vegas Forecasts Are Unbiased
Totals Predictions and Money Line Predictions Are Unbiased
NFL Accuracy: The Line vs. the Computers
A System Works Until It Doesn't
CHAPTER 24: Will My New Hires Be Good Employees?
What Data Do We Need to Determine Attributes That Best Predict Employee Performance?
Besides GMA, Not Much Affects Job Performance
Excel Calculations
CHAPTER 25: Should I Go to State U or Princeton?
Analyzing Princeton vs. Penn State
Excel Calculations
CHAPTER 26: Will My Favorite Sports Team Be Great Next Year?
Francis Galton and Regression to the Mean
Regression to the Mean in the NFL and the NBA
Excel Calculations
CHAPTER 27: How Did Central Bankers Fail to Predict the 2008 Recession?
The Inverted Yield Curve
The Sahm Rule: Early Warning Signal for Recession
Control Charts and the Housing Price/Rent Ratio
Excel Calculations
CHAPTER 28: How Does Target Know If You're Pregnant?
What Available Data Can Be Used to Identify Pregnant Women?
Problems Arise
An Example of a Pregnancy Prediction Score
CHAPTER 29: How Does Netflix Recommend Movies and TV Shows?
User-Based Collaborative Filtering
Item-Based Filtering
CHAPTER 30: Can We Predict Heart Attacks in Real Time?
Posterior Probability
Sensitivity and Specificity
ROC Curve
Back to the Apple Heart Study
AliveCor and KardiaBand
CHAPTER 31: Is Proactive Policing Effective?
Hot Spots Policing
Predictive Policing
CCTV
Stop and Frisk
Broken Windows
Excel Calculations
CHAPTER 32: Guess How Many Are Coming to Dinner?
Which Parameters Must Be Estimated?
The Data
The Results
Which Factor Really Matters?
Excel Calculations
CHAPTER 33: Can Prediction Markets Predict the Future?
Examples of Trade Contracts
Prediction Market Trading Mechanisms
Accuracy of Prediction Markets and Wisdom of Crowds
CHAPTER 34: The ABCs of Polling
Why Are 1,112 People Enough to Represent U.S. Voters?
Why Doesn't a Larger Population Require a Larger Sample Size?
So, What Can Go Wrong?
Rating Polls
CHAPTER 35: How Did Buzzfeed Make the Dress Go Viral?
Measuring Instagram Engagement
Tweets Do Not Always Go Viral Immediately
Do the First Few Days Predict the Future of a Meme?
CHAPTER 36: Predicting Game of Thrones TV Ratings
What Does Google Trends Tell Us?
Predicting the Present with Google Trends
Using Google Trends to Forecast GOT Ratings
Excel Calculations
Part III: Why Did It Happen?
CHAPTER 37: Does Smoking Cause Lung Cancer?
Correlation and Causation Redux
The Key Evidence
Could Air Pollution Have Caused Lung Cancer?
The Cigarette Companies Hit Back
Excel Calculations
CHAPTER 38: Why Are the Houston Rockets a Good Basketball Team?
NBA Shooting Math 101
Zach LaVine Battles the Bulls' Analytics Department
Conclusion
Excel Calculations
CHAPTER 39: Why Have Sacrifice Bunts and Intentional Walks Nearly Disappeared?
The Case Against Bunting
Bunting Against the Shift
Why Are Intentional Walks on the Decline?
CHAPTER 40: Do NFL Teams Pass Too Much and Go for It Often Enough on Fourth Down?
The Ascent of Passing
Fourth Down Strategy
New Data Partially Vindicates the Coaches
Teams Should Go for Two More Often
CHAPTER 41: What Caused the 1854 London Cholera Outbreak?
Cholera
Snow and the Broad Street Pump
Snow's Randomized Controlled Trial
Conclusion
Excel Calculations
CHAPTER 42: What Affects the Sales of a Retail Product?
Painter's Tape
Estimating the Model Parameters
Excel Calculations
CHAPTER 43: Why Does the Pareto Principle Explain So Many Things?
Power Laws
Why Do Incomes Follow the Pareto Principle?
Why Do a Few Websites Get Most of the Hits?
Excel Calculations
CHAPTER 44: Does Where You Grow Up Matter?
Quasi-Experimental Design vs. Randomized Controlled Trials
What Drives Neighborhood Differences in Upward Mobility?
How Can We Make Things Better?
CHAPTER 45: The Waiting Is the Hardest Part
Which Factors Influence the Performance of a Queueing System?
Operating Characteristics of a Queueing System
How Does Variability Degrade the Performance of a Queueing System?
Calculating the Operating Characteristics of a Queueing System
Excel Calculations
CHAPTER 46: Are Roundabouts a Good Idea?
What Is a Roundabout?
History of Roundabouts
Benefits of Roundabouts
Disadvantages of Roundabouts
Roundabout Capacity
Roundabouts and Revolutions
CHAPTER 47: Red Light, Green Light, or No Light?
What Causes Traffic Jams?
How Should We Set the Lights?
Ramp Meters and Equity
Measuring the Impact of Ramp Meters
The Twin Cities Metering Holiday
Part IV: How Do I Make Good Things Happen?
CHAPTER 48: How Can We Improve K–12 Education?
Tennessee's STAR Study on K–2 Class Size
Cost–Benefit Analysis
Can Predictive Analytics Increase Enrollment and Performance in Eighth-Grade Algebra I?
Excel Calculations
CHAPTER 49: Can A/B Testing Improve My Website's Performance?
Improving Obama's Fundraising in 2008
The Mechanics of Resampling
Excel Calculations
CHAPTER 50: How Should I Allocate My Retirement Portfolio?
The Basic Portfolio Optimization Model
The Efficient Frontier
Difficulties in Implementing the Markowitz Model
Excel Calculations
CHAPTER 51: How Do Hedge Funds Work?
Growth in Hedge Funds and Hedge Fund Fee Structure
Shorting a Stock
Long/Short and Market-Neutral Strategies
Convertible Arbitrage
Merger Arbitrage
Global Macro Strategy
Hedge Fund Performance
The George Costanza Portfolio
Excel Calculations
CHAPTER 52: How Much Should We Order and When Should We Order?
The Economic Order Quantity Model
Reorder Points, Service Levels, and Safety Stock
Excel Calculations
CHAPTER 53: How Does the UPS Driver Know the Order to Deliver Packages?
Why Is the Traveling Salesperson Problem So Hard?
Solving the Traveling Salesperson Problem
The Traveling Salesperson Problem in the Real World
Excel Calculations
CHAPTER 54: Can Data Win a Presidential Election?
Democratic Presidential Analytics
The GOP Strikes Back
Cambridge Analytica and the 2016 Election
Excel Calculations
CHAPTER 55: Can Analytics Save Our Republic?
Arrow's Impossibility Theorem
It's Not Easy to Pick a Winner!
Ranked-Choice Voting
Approval Voting
Quadratic Voting
Excel Calculations
CHAPTER 56: Why Do I Pay Too Much on eBay?
How Many Pennies in the Jar?
The Importance of Asymmetric Information
The Winner's Curse and Offshore Oil Leases
Sports Free Agents and the Winner's Curse
Can You Avoid the Winner's Curse?
Excel Calculations
CHAPTER 57: Can Analytics Recognize, Predict, or Write a Hit Song?
How Does Shazam Know What Song You Are Listening To?
How Did Hit Song Science Know Norah Jones's Album Would Be a Smash?
Can Artificial Intelligence Write a Good Song?
CHAPTER 58: Can an Algorithm Improve Parole Decisions?
An Example of Risk Scores
ProPublica Criticizes Risk Scores
Skeem and Lowenkamp and PCRA
Machine Learning and Parole Decisions
CHAPTER 59: How Do Baseball Teams Decide Where to Shift Fielders?
The Debut of the Shift
The Return of the Shift
Empirical Evidence on the Shift
Why Not Just Beat the Shift?
Excel Calculations
CHAPTER 60: Did Analytics Help the Mavericks Win the 2011 NBA Title?
How Can You Evaluate a Basketball Player?
From Player Ratings to Lineup Ratings
CHAPTER 61: Who Gets the House in the Hamptons?
The Basic Idea
What Asset Division Is Best?
Excel Calculations
Index
End User License Agreement
List of Tables
Chapter 6
Table 6.1: Real household median income 1977–1980
Table 6.2: Median income 2007–2012
Chapter 8
Table 8.1: Examples of relationships between two variables
Table 8.2: Galton height correlations
Table 8.3: Product elasticities
Chapter 11
Table 11.1: Teacher adjusted gains
Table 11.2: Final teacher ratings
Chapter 13
Table 13.1: WARP based on minutes and RPM
Table 13.2: WARP during peak years
Table 13.3: Shooting stats for 2018–2019 NBA, Manu and Melo
Chapter 14
Table 14.1: Jeter and league average fielding percentages
Table 14.2: Jeter's range factors
Table 14.3: Jeter's runs above or below average
Chapter 15
Table 15.1: Example of strokes gained
Chapter 16
Table 16.1: Examples of points gained
Chapter 17
Table 17.1: Skill vs. luck
Chapter 18
Table 18.1: Fraction of incumbent House members winning reelection
Chapter 20
Table 20.1: Medicare weightings for star ratings
Table 20.2: Distribution of Medicare.gov star ratings
Chapter 21
Table 21.1: 2010 Disability Weights
Table 21.2: Percentage of 2010 DALYs by Major Cause
Chapter 22
Table 22.1: Ten-year average Morningstar ratings
Chapter 24
Table 24.1: Correlation between selection procedures and job performance
Chapter 27
Table 27.1: Inverted yield curve signaling a recession
Table 27.2: Comparison of Sahm predictions to actual start date of recessions
Table 27.3: Tests for out-of-control process
Chapter 28
Table 28.1: Examples of logistic regression
Chapter 30
Table 30.1: 2 × 2 Contingency table for hypothetical disease test
Table 30.2: 2 × 2 Contingency table for Apple Heart Study
Table 30.3: Examples of sensitivity and specificity
Chapter 31
Table 31.1: Disparate impact of SQF on blacks based on different population defin...
Chapter 32
Table 32.1: Parameters needed to predict Super Bowl Sunday customer count
Table 32.2: Factors that most affect our forecasts
Chapter 33
Table 33.1: Examples of prediction market contracts
Chapter 40
Table 40.1: Expected points added on fourth and 1
Chapter 44
Table 44.1: Effect on Anna's income based on age she moves
Table 44.2: Results of housing voucher study
Chapter 46
Table 46.1: Improvement in Traffic Congestion Due to Roundabouts
Table 46.2: Arrivals to a Roundabout
Chapter 49
Table 49.1: Examples of A/B tests
Chapter 51
Table 51.1: Stock market scenarios for long/short strategy
Table 51.2: Returns from long/short strategy
Table 51.3: Scenario returns on convertible bonds and stocks
Table 51.4: George's maxmin portfolio
Chapter 52
Table 52.1: Examples of service level and reorder point
Chapter 53
Table 53.1: All tours for 4-city TSP
Table 53.2: Two tours breed a “child”
Chapter 55
Table 55.1: Five candidates, six results
Table 55.2: Ranked voting example
Table 55.3: No Condorcet winner
Chapter 57
Table 57.1: Probability Mass Function for Next Chord After an F
Chapter 58
Table 58.1: Average prediction for chance of crime vs. actual crime percentage
Table 58.2: Logistic regression confusion matrix
Table 58.3: Random forest confusion matrix
Chapter 60
Table 60.1: Top 5 NBA Players 2000–2009
Table 60.2: Top 5 NBA Players 2010–2019
List of Illustrations
Chapter 1
Figure 1.1: U.S. state populations
Figure 1.2: Heights of 200 adult U.S. women
Figure 1.3: Histogram of state populations
Figure 1.4: Histogram of women's heights
Figure 1.5: Histogram of annual investment returns
Figure 1.6: PDF for height of American woman
Figure 1.7: Chance of winning the Super Bowl
Figure 1.8: Statistical chart icon
Figure 1.9: Histogram chart icon
Figure 1.10: Settings for histogram bin ranges
Figure 1.11: Computing descriptive statistics
Chapter 2
Figure 2.1: Average draft lottery number by month
Figure 2.2: Results of two-sample Z-test
Figure 2.3: Computing average lottery number by month
Figure 2.4: Settings for two-sample t-test
Chapter 3
Figure 3.1: The butterfly ballot
Figure 3.2: Predicting Buchanan vote from Perot vote using all counties
Figure 3.3: Predicting Buchanan vote from Perot vote omitting Palm Beach Cou...
Chapter 4
Figure 4.1: 2015–2016 Premier League Odds
Chapter 5
Figure 5.1: Ponzi scheme model
Figure 5.2: Benford's law for country populations
Figure 5.3: Benford's law and Madoff returns
Chapter 6
Figure 6.1: Trends in U.S. income inequality
Figure 6.2: HDI indices 1980–2017
Figure 6.3: Attribute ranks for ranking countries
Figure 6.4: Ratio of national mean incomes to U.S.
Chapter 7
Figure 7.1: Data for computing the Gini index
Figure 7.2: Gini index chart when income is proportional to n
Figure 7.3: Gini index chart with complete inequality
Figure 7.4: 2020 Gini indices
Figure 7.5: The Palma index 2014–2015
Figure 7.6: Computation of the Atkinson index
Chapter 8
Figure 8.1: Least squares line for demand
Figure 8.2: Betas July 2019
Figure 8.3: Beta Madoff returns
Figure 8.4: Galton height data
Figure 8.5: Estimating a power curve
Figure 8.6: Using array formulas to compute Galton correlations
Chapter 9
Figure 9.1: Median income for parent and child, birth year 1940–1980
Figure 9.2: IGE and rank-rank mobility for an unequal immobile society
Figure 9.3: IGE and rank-rank mobility for an equalized immobile society
Figure 9.4: IGE and rank-rank mobility for an unequal mobile society
Figure 9.5: U.S. intergenerational mobility quintile chart
Figure 9.6: The Great Gatsby Curve
Figure 9.7: Mankiw's rebuttal to the Gatsby Curve
Chapter 11
Figure 11.1: Example of simple gain score model
Figure 11.2: Cross-classified growth model
Chapter 12
Figure 12.1: Overall admission rate by gender
Figure 12.2: Overall admission rate by major and gender
Figure 12.3: Overall fraction by gender, applying to each major
Chapter 16
Figure 16.1: 2018 NFL QB statistics
Figure 16.2: NFL QB rating regression output
Figure 16.3: Expected points per passing attempt
Figure 16.4: 2018 ESPN QBR ratings
Chapter 17
Figure 17.1: A lot of luck
Figure 17.2: Not much luck
Chapter 18
Figure 18.1: North Carolina's 12th district
Figure 18.2: Fifteen voter distribution in Gerryberry
Figure 18.3: Democrats win six seats in Gerryberry
Figure 18.4: Actual North Carolina congressional districts
Figure 18.5: Math alternative to North Carolina congressional district bound...
Figure 18.6: Generating random maps
Figure 18.7: Three measures of compactness
Figure 18.8: Population equality
Chapter 19
Figure 19.1: Shuffling the 19 deaths among the streptomycin and bed rest gro...
Figure 19.2: Counting times
