
Jonathon D. Brown

Advanced Statistics for the Behavioral Sciences A Computational Approach with R


Jonathon D. Brown
Department of Psychology
University of Washington
Seattle, WA, USA

ISBN 978-3-319-93547-8
ISBN 978-3-319-93549-2 (eBook)
https://doi.org/10.1007/978-3-319-93549-2

Library of Congress Control Number: 2018950841

© Springer Nature Switzerland AG 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

“My thinking is first and last and always for the sake of my doing.” —William James

As insightful as he was, William James was not referring to the twenty-first-century relation between computer-generated statistical analyses and scientific research. Nevertheless, his insistence that thinking is always for doing speaks to that association. In bygone days, statisticians were researchers—pursuing their own line of inquiry or employed by companies to identify productive practices—and the statistical analyses they developed were tools to help them understand the phenomena they were studying. Today, statistical analyses are increasingly developed and refined by individuals who have received training in computer science, and their expertise lies in writing efficient and elegant computer code. As a result, ordinary researchers who lack a background in computer programming are asked to accept on faith the black-box output that emerges from the sophisticated statistical models they increasingly use.

This book is designed to bridge the gap between computer science and research application. Many of the analyses are advanced (e.g., regularization and the lasso, numerical optimization with the Nelder-Mead simplex, and mixed modeling with penalized least squares), but the presentation is relaxed, with an emphasis on understanding where the numbers come from and how they can be interpreted. In short, the focus is on "thinking for the sake of doing."


Organization

The book is divided into three sections.

Linear algebra
1. Linear equations
2. Least squares estimation
3. Linear regression
4. Eigen decomposition
5. Singular value decomposition

Bias and efficiency
6. Generalized least squares
7. Robust regression
8. Model selection and shrinkage estimators
9. Cubic splines and additive models

Nonlinear models
10. Optimization and nonlinear least squares
11. Generalized linear models
12. Survival analysis
13. Time-series analysis
14. Mixed-effects models

I begin with linear algebra for two reasons. First, and most obviously, linear algebra underlies most statistical analyses; second, understanding the mathematical operations involved in Gaussian elimination and backward substitution provides a basis for understanding how modern statistical software packages approach statistical analyses (e.g., why the QR decomposition is used to solve linear regression problems). An emphasis on numerical analysis, which occurs throughout the text, represents one of the book’s most distinctive features.

Using ℛ

All of the analyses in this book were performed using ℛ, a free programming language and software environment for statistical computing and graphics that can be downloaded at http://www.r-project.org. However, instead of relying on canned functions or user-created packages that must be downloaded and installed, I have provided my own code so that readers can see for themselves how the analyses are performed. Moreover, each analysis uses a small (n = 12) data set to encourage readers to track the operations "in real time," with each data set telling a coherent story with interpretable results.

The code I have included is not intended to supplant the packaged functions in ℛ. Instead, it is offered as a pedagogical tool, designed to demystify the operations that underlie each analysis. Toward that end, the functions are written with an eye toward simplicity, occupying no more than one manuscript page of text. Few of them contain checks for anomalous cases, so they should be used only for the particular analyses for which they are intended. At the end of each section, the relevant functions available in ℛ are identified, so that readers can both see how each analysis is performed and have access to the state-of-the-art code that is properly used for each statistical model.

Most of the code is contained within each chapter, allowing readers to copy and paste it into ℛ while working through the problems in the book. Occasionally a function is called from a previous chapter, in which case I have specified a folder location, 'C:\\ASBS\\code.R' (Advanced Statistics for the Behavioral Sciences), as a placeholder. I have not, however, created an ℛ package for the code, as it is meant to be used only for the problems within the book.
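Where that happens, the saved file can be loaded with a single call to source(); for example, using the placeholder path above (substitute the location on your own machine):

    source("C:\\ASBS\\code.R")   # load functions saved from an earlier chapter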

Intended Audience

This book is intended for graduate students in the behavioral sciences who have taken an introductory graduate-level course. It consists of 14 chapters, making it suitable for a 15-week semester or a 10-week quarter. This book should also be of interest to intellectually curious researchers who have been using a particular statistical method in their research (e.g., mixed-effects models) without fully understanding the mathematics behind the approach. My hope is that researchers will more readily embrace advanced statistical analyses once the underlying operations have been illuminated.

Seattle, WA, USA

Jonathon D. Brown

Contents

Part I: Linear Algebra

1 Linear Equations
  1.1 Row Reduction Methods: Gaussian Elimination · Pivoting · R Code: Gaussian Elimination and Backward Substitution · Gauss-Jordan Elimination · LU Decomposition · R Code: LU Decomposition · Cholesky Decomposition · R Code: Cholesky Decomposition of a Symmetric Matrix
  1.2 Matrix Methods: Determinant · R Code: Determinant · Determinants and Linear Dependencies · R Code: Reduced Row Echelon Form and Linear Dependencies · Using the Determinant to Solve Linear Equations · R Code: Cramer's Rule · Matrix Inverse · R Code: Calculate Inverse Using Reduced Row Echelon Form · Norms, Errors, and the Condition Number of a Matrix · R Code: Condition Number and Norm Ratio
  1.3 Iterative Methods: Jacobi's Method · Gauss-Seidel Method · Convergence · R Code: Gauss-Seidel
  1.4 Chapter Summary
  References

2 Least Squares Estimation
  2.1 Line of Best Fit: Deriving a Line of Best Fit · Minimizing the Sum of Squared Differences · Normal Equations · Analytic Solution
  2.2 Solving the Normal Equations: The QR Decomposition · Advantages of an Orthonormal System · Hat Matrix · Coefficients · Summary · R Code: QR Solver
  2.3 Performing the QR Decomposition: Gram-Schmidt Orthogonalization · R Code: QR Decomposition; Gram-Schmidt Orthogonalization · Givens Rotations · R Code: QR Decomposition; Givens Rotations · Householder Reflections · R Code: QR Decomposition; Householder Reflectors · Comparing the Decompositions · R Code: QR Decomposition Comparison
  2.4 Linear Regression and its Assumptions: Linearity · Nature of the Variables · Errors and their Distribution · Regression Coefficients
  2.5 OLS Estimation and the Gauss-Markov Theorem: Proving the OLS Estimates are Unbiased · Proving the OLS Estimates are Efficient
  2.6 Maximum Likelihood Estimation: Log Likelihood Function · R Code: Maximum Likelihood Estimation
  2.7 Beyond OLS Estimation
  2.8 Chapter Summary
  References

3 Linear Regression
  3.1 Simple Linear Regression: Inspecting the Residuals · Describing the Model's Fit to the Data · Testing the Model's Fit to the Data · Variance Estimates · Tests of Significance · Confidence Intervals · R Code: Confidence Interval Simulation · Confidence Regions · Forecasting · R Code: Simple Linear Regression · R Code: Simple Linear Regression: Graphs
  3.2 Multiple Regression: Regression Model · Regression Coefficients · Variance Estimates, Significance Tests, and Confidence Intervals · Model Comparisons and Changes in R2 · Comparing Predictors · Forecasting · R Code: Multiple Regression
  3.3 Polynomials, Cross-Products, and Categorical Predictors: Polynomial Regression · R Code: Polynomial Regression · Cross-Product Terms · R Code: Cross-Product Terms and Simple Slopes · Johnson-Neyman Procedure · R Code: Johnson-Neyman Procedure · Categorical Predictors · R Code: Contrast Codes for Categorical Predictors · Summary
  3.4 Chapter Summary
  References

4 Eigen Decomposition
  4.1 Diagonalization: Eigenvector Multiplication · The Characteristic Equation · R Code: Eigen Decomposition of a 2 × 2 Matrix with Real Eigenvalues · Properties of a Diagonalized Matrix
  4.2 Eigenvalue Calculation: Basic QR Algorithm · R Code: QR Algorithm Using Gram-Schmidt Orthogonalization · Improving the QR Algorithm · R Code: Hessenberg Form · R Code: Shifted QR Algorithm · Francis (Implicitly-Shifted QR) Algorithm · R Code: Francis Bulge Chasing Algorithm (Single-Shift)
  4.3 Eigenvector Calculation: R Code: Eigenvector Calculation Using LU Decomposition
  4.4 Dynamical Systems: Matrix Power · R Code: Matrix Power Using Eigen Decomposition · Power Method for Dominant Eigen Pair · Population Ecology · Predator-Prey Model · Markov Chains · R Code: Power Method and Applications
  4.5 Schur Decomposition: Compute an Initial Eigenvector · Create an Orthonormal Basis · Rotate · Deflate and Continue Iterating · R Code: Schur Decomposition
  4.6 Chapter Summary
  Reference

5 Singular Value Decomposition
  5.1 Introduction: Illustration · Geometric Interpretation · R Code: Singular Value Decomposition · Matrix Properties · Pseudoinverse · Solving Linear Equations · R Code: Matrix Rank and Pseudoinverse
  5.2 Calculating the SVD: Description · Calculations · R Code: One-Sided Jacobi Algorithm
  5.3 Data Reduction and Image Compression: R Code: Image Compression Using Singular Value Decomposition
  5.4 Principal Components Analysis: Mechanics · R Code: Principal Components Analysis · Total Least Squares · R Code: Total Least Squares · Dimension Reduction · R Code: Principal Components Analysis of Cereal Data · Data Construction · R Code: Data Construction
  5.5 Collinearity: Using the SVD to Detect Collinearity · R Code: Collinearity Detection · Principal Components Regression · R Code: Principal Components Regression of (Fictitious) NFL Data
  5.6 Chapter Summary
  References

Part II: Bias and Efficiency

6 Generalized Least Squares Estimation
  6.1 Gauss-Markov Violations: R Code: Simulations for Fig. 6.1
  6.2 Generalized Least Squares: R Code: OLS Estimation as GLS Estimation · OLS and GLS · R Code: Generalized Least Squares Estimation
  6.3 Heteroscedasticity and Feasible Weighted Least Squares: Assessing Heteroscedasticity · R Code: Breusch-Pagan Test of Heteroscedasticity · Feasible Weighted Least Squares · R Code: Feasible Weighted Least Squares · Heteroscedasticity Consistent Covariance Matrix · Summary · R Code: Heteroscedasticity Consistent Covariance Matrix · Confidence Interval Simulation · R Code: Heteroscedasticity Confidence Interval Simulation
  6.4 Autocorrelated Errors: Mathematical Representation · Covariance Matrix · Detecting Autocorrelations · R Code: Detecting Autocorrelations · Accommodating Autocorrelated Errors · Feasible Generalized Least Squares · R Code: Feasible Generalized Least Squares · Autocorrelation Consistent Covariance Matrix · Summary · R Code: Autocorrelation Consistent Covariance Matrix
  6.5 Chapter Summary
  References

7 Robust Regression
  7.1 Assessing Normality: Tests of Normality · R Code: Assessing the Normality of the Residuals · Influence and Normality · Leverage and Influence · Cook's D · Illustration · Handling Influential Observations · R Code: Cook's D
  7.2 Robust Estimators and Influential Observations: Breakdown Point · Efficiency · R Code: Robust Regression Simulation
  7.3 Resistant Estimation: Least Absolute Regression · R Code: Least Absolute Regression · Least Median of Squares · R Code: Least Median of Squares
  7.4 M Estimation: Weighting Methods · Algorithm · R Code: Robust Regression with M Estimation
  7.5 Bootstrapped Confidence Intervals: Case Resampling vs. Residual Resampling · Confidence Intervals · R Code: Bootstrapping with Robust Regression (M Estimation)
  7.6 MM Estimation: S Estimation · R Code: S Estimation (Part 1) · R Code: MM Estimation (Compact Form with Sub Functions) · Application · R Code: Robust Regression of Star Data
  7.7 Concluding Remarks
  7.8 Chapter Summary
  References

8 Model Selection and Biased Estimation
  8.1 Prediction Error and Model Complexity: Prediction Errors and the Bias-Variance Tradeoff · Cross-Validation · Information Criteria Measures and Model Selection · R Code: Cross Validation and Information Criteria Measures
  8.2 Subset Selection Methods: Stepwise Regression · R Code: Fictitious Data Predicting College Performance · R Code: Stepwise Regression · Best Subset Regression · R Code: Sweep Operator · R Code: Sweep Operator for Best Subset Regression · Branch and Bound Algorithm · R Code: Branch and Bound (Compact Form) · Comparing the Models · R Code: Model Comparison
  8.3 Shrinkage Estimators and Regularized Regression: Ridge Regression · R Code: Ridge Regression: Augmented Matrix Method · R Code: Ridge Regression · Lasso · R Code: LASSO
  8.4 Comparing the Methods
  8.5 Chapter Summary
  References

9 Cubic Splines and Additive Models
  9.1 Piecewise Polynomials and Regression Splines: Truncated Power Basis · Natural Cubic Spline · R Code: Truncated Power Series and Natural Cubic Spline Bases · B Spline Basis · R Code: B-Spline Basis · Bias-Variance Trade-Off
  9.2 Penalized Smoothing Splines: Reinsch Form · R Code: Penalized Smoothing Spline: Reinsch Form · P-Splines · Statistical Inference and Confidence Intervals · Comparing Penalized Smoothing Splines and Regression Splines · R Code: P-Spline
  9.3 Additive Models: Fitting an Additive Model · Backfitting · Partial Slopes · Model Selection and Inference · R Code: Additive Model: Backfitting · Penalized Least Squares · R Code: Additive Model: Penalized Least Squares
  9.4 Chapter Summary
  References

Part III: Nonlinear Models

10 Nonlinear Regression and Optimization
  10.1 Comparing Linear and Nonlinear Models: Model Representation · Partial Derivatives · Parameter Estimation · Standard Errors, Parameter Interpretation, and Degrees of Freedom · Variety
  10.2 Root Finding Algorithms: Newton's Method · Secant Method · R Code: Root-Finding Algorithms
  10.3 Optimization: Exponential Growth Model · Newton-Raphson · R Code: Newton-Raphson · Fisher's Method of Scoring · R Code: Fisher's Method of Scoring · Gauss-Newton · R Code: Gauss-Newton · BFGS Algorithm · R Code: BFGS · Nelder-Mead · R Code: Nelder-Mead (Compact Form) · Summary
  10.4 Missing Observations: Classifying Missing Data · Maximum Likelihood Estimation and the Expectation-Maximization Algorithm · Bivariate Example · Multivariate Illustration · R Code: EM Algorithm for Multivariate Normal with Missing Data · Multiple Regression with Missing Observations · R Code: EM Regression with Bootstrapped Standard Errors
  10.5 Chapter Summary
  References

11 Generalized Linear Models
  11.1 Generalized Linear Models: Log Likelihood Functions · Components of a Generalized Linear Model · Iteratively Reweighted Least Squares Estimation · Canonical Link · R Code: IRLS Estimation for GLM with Canonical Links
  11.2 Poisson Distribution: Estimation · Deviance and Goodness of Fit · Goodness of Fit · Regression Coefficients and Fitted Values · Standard Errors, Tests of Significance, and Confidence Intervals · R Code: GLM Fit · R Code: GLM: Profile Likelihood · Diagnostics · Overdispersion and Quasi-Likelihood Estimation · R Code: GLM Residuals
  11.3 Binomial Distribution: Overview · GLM with a Binomial Distribution · Goodness of Fit · Interpreting the Fitted Values and Regression Coefficients · Standard Errors and Confidence Intervals · Extensions · R Code: GLM: Binomial Distribution with Logit Link
  11.4 Gamma Distribution: Properties of a Gamma Distribution · R Code: Gamma Distribution Maximum Likelihood Estimation · Gamma GLM with Canonical Link · R Code: GLM: Gamma Distribution · Gamma GLM with Non-Canonical Links · R Code: GLM: Gamma Distribution
  11.5 Chapter Summary
  References

12 Survival Analysis
  12.1 Overview: Censoring · Statistical Functions · Statistical Models
  12.2 Nonparametric Model: Kaplan-Meier Estimator · Standard Errors · Confidence Intervals · Median Survival · Hazard Rate · R Code: Kaplan-Meier Estimator (Log-Log Confidence Intervals) · Log-Rank Test · R Code: Log Rank Test · R Code: Log Rank Test (cont.)
  12.3 Semiparametric Model: Hazard Function · Preliminary Example Without Ties · Interpreting the Hazard Ratio · Partial Likelihood Function · Goodness of Fit · Diagnostics · R Code: Cox Regression (No Ties/Single Predictor) · Handling Ties · R Code: Cox Regression (Compact Form) · Residuals When Ties are Present · Interpretation · R Code: Cox Regression Residuals (Compact Form)
  12.4 Parametric Model: Properties of a Weibull Distribution · Assessing the Appropriateness of a Weibull Distribution · R Code: Weibull MLE · Weibull Regression · Interpreting the AFT Coefficients · Model Fit and Standard Errors · R Code: Weibull Regression Single Predictor · Conversions to Proportional Hazard · R Code: Weibull AFT and PH Conversion · Diagnostics · Diagnostic Plots · R Code: Weibull AFT Diagnostics
  12.5 For Further Reading
  12.6 Chapter Summary
  References

13 Time Series Analysis
  13.1 Overview: Dynamics of a Time Series · Stationarity and Differencing · From ARIMA to ARMA · R Code: Stationarity and Differencing
  13.2 Autocorrelations: Autocorrelation Function · Partial Autocorrelations · R Code: Autocorrelation and Partial Autocorrelation Function
  13.3 Moving Averages and Autoregressive Processes: Moving Averages · Autoregressive Processes · Wold Representation · ARMA(p,q) · Simulations and Filters · R Code: ARMA Simulations and Wold Representation
  13.4 Model Identification: Moving Averages · Autoregressive Processes · Mixed Models · R Code: Stationarity, Invertibility, and Redundancy
  13.5 Model Estimation: State-Space Representation · Kalman Filter · R Code: State Space Representation and Kalman Filter · R Code: ARMA Estimation
  13.6 Model Adequacy: Test Residuals for Dependencies · R Code: Box-Pierce & Ljung-Box Tests of Model Fit · Model Comparisons · R Code: Tests of Model Fit
  13.7 Forecasting: Generating Predicted Values · R Code: Forecasting (Mean-Centered) Observations
  13.8 Integration: Model Identification · Model Estimation · R Code: Dyads
  13.9 Chapter Summary
  References

14 Mixed-Effects Models
  14.1 Overview: Mixed-Effects Models in Matrix Form · Estimation and Prediction · Mixed Model Equations · Variance Components Estimation · Modeling the Random Terms · R Code: Preliminary Functions for Mixed Modeling
  14.2 Understanding Mixed-Effects Models: Creating a Simulation · Plotting the Data · Testing Hierarchical Models · Intraclass Correlation · Fixed Effects and Random Effects · Interpreting the Random Coefficients · Standard Errors · Adding a Random Slope · Diagnostics · Summary · R Code: Mixed Model Simulations
  14.3 Estimation: R Code: Mixed Model Fit · Illustration · R Code: Mixed Model Estimation
  14.4 Repeated Measures and Growth Curve Models: Growth Curve Model · Reshaping the Data · Slopes and Intercepts · Model Comparisons · R Code: Growth Curve Analysis
  14.5 Chapter Summary
  References

Part I

Linear Algebra

Chapter 1

Linear Equations

This is a book about statistical analyses: how to conduct them and how to interpret them. Not all of the analyses involve linear functions and few of them have an exact solution, so it might seem odd to begin by learning to solve a consistent system of linear equations. Yet there are good reasons to start this way. For one thing, linear systems form the backbone of most statistical analyses and many nonlinear functions have a linear component. Moreover, many of the methods that are used to solve linear equations were developed long ago by some of the world’s most famous mathematicians. A familiarity with these techniques is therefore of historical interest. Another reason to learn how to solve simultaneous linear equations is more practical. Virtually all statistical analyses are now done on computers and the output is usually approximate, not exact. This is largely (though not exclusively) due to the way computers represent numbers, yielding values that are accurate to 16 decimal digits.1 Clearly, 16 digits is a lot of accuracy, but the fact remains that even equations that have an exact solution often yield an approximate answer. So what does that have to do with solving linear equations? When statistical packages were being developed, a method’s ability to efficiently and accurately solve systems of linear equations was used to judge its utility. This standard is still used today, and some of the most inventive and decorated computer scientists in modern times have devoted their careers to developing algorithms that effectively solve linear equations (Higham, 2002). Learning these methods will help you understand how advanced statistical problems are approached.

1 One way to frame the issue is to ask "what is the smallest quantity that can be added to 1 that will be judged by a computer to be greater than 1?" The answer on most computers is 2^-52 = 2.220446e-16. You can prove this with ℛ: the statement 1 + 2^-52 == 1 returns FALSE, but the statement 1 + 2^-53 == 1 returns TRUE.
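Typed directly into ℛ, the comparisons behave as the footnote describes, and .Machine$double.eps in base ℛ stores the same quantity (a quick check, nothing more):

    .Machine$double.eps   # 2.220446e-16, i.e., 2^-52
    1 + 2^-52 == 1        # FALSE: the sum is still distinguishable from 1
    1 + 2^-53 == 1        # TRUE: the sum rounds back down to 1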


1.1 Row Reduction Methods

Two or more equations that have common terms and are solved simultaneously form a system of equations. If all terms combine additively and each unknown term is of the first order and is not multiplied or divided by any other unknown term, the equations form a set of simultaneous linear equations. In matrix form, a linear system is expressed as follows,

    Ax = b                                                      (1.1)

with A designating a matrix of known values, x designating a vector of unknown values for which we seek a solution, and b designating a vector of known constants that are the product of Ax.

Table 1.1 shows two examples that share an important similarity and an important difference. The similarity is that they share the same solution,

    x1 = 1
    x2 = 2
    x3 = 3
    x4 = 4

the difference is that the second set of equations is much easier to solve than the first. Why? Notice that all of the values below the diagonal entries in the second set of equations are zero. Consequently, the equations can readily be solved using a technique known as backward substitution. To illustrate, we begin at the bottom and solve for x4.

    x4 = 16/4 = 4

Afterward, we substitute its value into the third equation and solve for x3.

    x3 = (-11.2 - (-1.6 * 4)) / -1.6 = 3

We continue in this fashion until we have solved all of the equations.

    x2 = (-25.5 - (-4 * 4) - (-1.5 * 3)) / -2.5 = 2
    x1 = (33 - (4 * 4) - (3 * 3) - (3 * 2)) / 2 = 1

Table 1.1 Simultaneous linear equations in ordinary and row echelon form

    Ordinary form:
      2x1 + 3x2 + 3x3 + 4x4 = 33
      3x1 + 2x2 + 3x3 + 2x4 = 24
      2x1 + 4x2 + 2x3 + 4x4 = 32
      8x1 + 2x2 + 5x3 + 3x4 = 39

    Row echelon form:
      2x1 + 3x2   + 3x3   + 4x4   = 33
      0   - 2.5x2 - 1.5x3 - 4x4   = -25.5
      0   + 0     - 1.6x3 - 1.6x4 = -11.2
      0   + 0     + 0     + 4x4   = 16


In sum, it is easy to solve a set of equations when all of the subdiagonal values are zero. This structure is known as row echelon (or upper triangular) form, and the key question is whether there is a way to convert an ordinary set of equations to row echelon form. Fortunately, there is. In fact, several methods can be used. In the following section, we will learn some of them, beginning with a technique named after one of its developers, the German mathematician, Carl Friedrich Gauss.
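To make the idea concrete, backward substitution on the row echelon system of Table 1.1 takes only a few lines of ℛ. This is a sketch for this example alone (the book's own listings appear in the R Code sections); the function name backSolve is ours.

    # Backward substitution for an upper-triangular system Ux = b
    # (a minimal sketch; assumes U is square with nonzero diagonal entries)
    backSolve <- function(U, b) {
      n <- length(b)
      x <- numeric(n)
      for (i in n:1) {
        # subtract the already-solved terms, then divide by the diagonal entry
        x[i] <- (b[i] - sum(U[i, -(1:i)] * x[-(1:i)])) / U[i, i]
      }
      x
    }

    U <- matrix(c(2,  3,    3,    4,
                  0, -2.5, -1.5, -4,
                  0,  0,   -1.6, -1.6,
                  0,  0,    0,    4), nrow = 4, byrow = TRUE)
    b <- c(33, -25.5, -11.2, 16)
    backSolve(U, b)   # 1 2 3 4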

1.1.1 Gaussian Elimination

Gaussian elimination reduces a matrix to row echelon form by successively transforming each subdiagonal row of an augmented matrix G,

    G = [A | b]                                                 (1.2)

where A designates the known values and b designates the known constants. The reduction assumes the following form:

    G_ij^(k+1) = G_ij^(k) - m_ik * g_kj,    i, j = k + 1 : n    (1.3)

where

    m_ik = a_ik^(k) / a_kk^(k),    i = k + 1 : n                (1.4)

is called the multiplier and a_kk^(k) is termed the pivot element. Fortunately, the procedure is less complicated than it seems, so let's illustrate it using the data from Table 1.1.

1. Create an augmented matrix G by appending b to A.

       G = [ 2  3  3  4 | 33 ]
           [ 3  2  3  2 | 24 ]
           [ 2  4  2  4 | 32 ]
           [ 8  2  5  3 | 39 ]

2. Beginning with the first column, form the multipliers by dividing all values below the diagonal by the diagonal value (aka the pivot element).

       3/2 = 1.5
       2/2 = 1
       8/2 = 4

3. Use the multipliers to annihilate the subdiagonal entries in column 1.

       [ 3  2  3  2 | 24 ]   [ 1.5 ]                      [ 0   -2.5  -1.5   -4 | -25.5 ]
       [ 2  4  2  4 | 32 ] - [ 1   ] [2 3 3 4 | 33]   =   [ 0    1    -1      0 | -1    ]
       [ 8  2  5  3 | 39 ]   [ 4   ]                      [ 0  -10    -7    -13 | -93   ]

4. Reform the augmented matrix, substituting the new rows for their original values.

       G = [ 2    3     3     4 | 33    ]
           [ 0   -2.5  -1.5  -4 | -25.5 ]
           [ 0    1    -1     0 | -1    ]
           [ 0  -10    -7   -13 | -93   ]

5. Return to Step 2 and repeat, using the values in the 2nd column to annihilate the corresponding values in rows 3:4.

       1/-2.5 = -.4
       -10/-2.5 = 4

       [ 0   1   -1    0 | -1  ]   [ -.4 ]                             [ 0  0  -1.6  -1.6 | -11.2 ]
       [ 0  -10  -7  -13 | -93 ] - [  4  ] [0 -2.5 -1.5 -4 | -25.5] =  [ 0  0  -1     3   |  9    ]

       G = [ 2    3     3     4   | 33    ]
           [ 0   -2.5  -1.5  -4   | -25.5 ]
           [ 0    0    -1.6  -1.6 | -11.2 ]
           [ 0    0    -1     3   | 9     ]

6. Return again to Step 2, using the values in the 3rd column to annihilate the corresponding value in row 4.

       -1/-1.6 = .625

       [0 0 -1 3 | 9] - .625 [0 0 -1.6 -1.6 | -11.2] = [0 0 0 4 | 16]

       G = [ 2    3     3     4   | 33    ]
           [ 0   -2.5  -1.5  -4   | -25.5 ]
           [ 0    0    -1.6  -1.6 | -11.2 ]
           [ 0    0     0     4   | 16    ]
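The first elimination step can be mirrored directly in ℛ. The snippet below is a sketch built from the Table 1.1 matrix (it is not the book's gaussElim listing); it reproduces Steps 1-4 for column 1.

    # Augment A with b, then annihilate the subdiagonal entries of column 1
    G <- cbind(matrix(c(2, 3, 3, 4,
                        3, 2, 3, 2,
                        2, 4, 2, 4,
                        8, 2, 5, 3), nrow = 4, byrow = TRUE),
               c(33, 24, 32, 39))
    m <- G[2:4, 1] / G[1, 1]             # multipliers: 1.5, 1, 4
    G[2:4, ] <- G[2:4, ] - m %o% G[1, ]  # subtract multiplier * pivot row from each row
    G                                    # column 1 below the pivot is now zero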

When we are through, we have a set of equations in row echelon form that can easily be solved using backward substitution.

1.1.2 Pivoting

Because division by 0 is undefined, Gaussian elimination fails when a pivot value is 0. Moreover, even pivot elements that are close to zero can introduce sizable rounding errors. To illustrate, imagine we wish to solve the following equations using Gaussian elimination.

    1e-12 x1 + x2 = 1
          x1 + x2 = 2

The solution to 12 decimal digits is,

    x1 = 1
    x2 = 1

but if we use Gaussian elimination to solve the problem we get a slightly different answer:

    x1 = .99986685598
    x2 = 1.00000000000

The problem occurs because dividing by a value as small as our first entry leads to round-off error. Fortunately, we can resolve the problem by exchanging the rows, a procedure known as pivoting (or sometimes partial pivoting):

    x1 + x2 = 2
    1e-12 x1 + x2 = 1

Now, when calculating our multipliers, we are dividing by 1 and the solution is accurate to 12 decimals.

    x1 = 1.00000000000
    x2 = 1.00000000000

Because it usually improves accuracy, it is customary to choose the row with the largest (absolute) diagonal value as the pivot row when performing Gaussian elimination. Applying this rule to our example, we rearrange the rows before beginning the reduction.

    [ 2  3  3  4 | 33 ]       [ 8  2  5  3 | 39 ]
    [ 3  2  3  2 | 24 ]  -->  [ 3  2  3  2 | 24 ]
    [ 2  4  2  4 | 32 ]       [ 2  4  2  4 | 32 ]
    [ 8  2  5  3 | 39 ]       [ 2  3  3  4 | 33 ]

    G = [ 8  2  5  3 | 39 ]
        [ 3  2  3  2 | 24 ]
        [ 2  4  2  4 | 32 ]
        [ 2  3  3  4 | 33 ]

1.1.3 R Code: Gaussian Elimination and Backward Substitution

gaussElim
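A compact sketch of how elimination with partial pivoting and backward substitution can be combined into one routine is shown below. It is offered only as an illustration of the steps just described; the function name gaussSketch and its layout are ours, not the gaussElim listing supplied with the book.

    # Gaussian elimination with partial pivoting, then backward substitution
    # (an illustrative sketch only -- not the book's gaussElim function)
    gaussSketch <- function(A, b) {
      G <- cbind(A, b)                            # augmented matrix [A | b]
      n <- nrow(G)
      for (k in 1:(n - 1)) {
        p <- which.max(abs(G[k:n, k])) + k - 1    # row with the largest pivot
        if (p != k) G[c(k, p), ] <- G[c(p, k), ]  # partial pivoting: exchange rows
        rows <- (k + 1):n
        m <- G[rows, k] / G[k, k]                 # multipliers
        G[rows, ] <- G[rows, , drop = FALSE] - m %o% G[k, ]
      }
      x <- numeric(n)                             # backward substitution
      x[n] <- G[n, n + 1] / G[n, n]
      for (i in (n - 1):1) {
        x[i] <- (G[i, n + 1] - sum(G[i, (i + 1):n] * x[(i + 1):n])) / G[i, i]
      }
      x
    }

    A <- matrix(c(2, 3, 3, 4,
                  3, 2, 3, 2,
                  2, 4, 2, 4,
                  8, 2, 5, 3), nrow = 4, byrow = TRUE)
    b <- c(33, 24, 32, 39)
    gaussSketch(A, b)   # 1 2 3 4

    # The example from Sect. 1.1.2: the tiny leading entry is handled by the row exchange
    gaussSketch(matrix(c(1e-12, 1, 1, 1), nrow = 2, byrow = TRUE), c(1, 2))   # ~1 1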