

Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences Second Edition

Jacob Cohen    Patricia Cohen

[Cover art: path diagram relating Sex, Time, Pub., and Salary]

Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences

Second Edition


Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences
Second Edition

Jacob Cohen
New York University

Patricia Cohen
New York State Psychiatric Institute
and Columbia University School of Public Health

Psychology Press
Taylor & Francis Group
New York    London

First published by Lawrence Erlbaum Associates, Inc., Publishers
10 Industrial Avenue
Mahwah, New Jersey 07430

Transferred to Digital Printing 2009 by Psychology Press
270 Madison Ave, New York NY 10016
27 Church Road, Hove, East Sussex BN3 2FA

Copyright © 1983 by Lawrence Erlbaum Associates, Inc.
All rights reserved. No part of this book may be reproduced in any form, by photostat, microform, retrieval system, or any other means, without the prior written permission of the publisher.

Library of Congress Cataloging in Publication Data

Cohen, Jacob, 1923-
  Applied multiple regression/correlation analysis for the behavioral sciences.
  Bibliography: p.
  Includes index.
  1. Regression analysis. 2. Correlation (Statistics)
  I. Cohen, Patricia. II. Title. III. Title: Multiple regression/correlation analysis for the behavioral sciences.
  [DNLM: 1. Behavioral sciences. 2. Regression analysis.]
  HA31.3.C63 1983    519.5'36    83-11487
  ISBN 0-89859-268-2

Publisher's Note
The publisher has gone to great lengths to ensure the quality of this reprint but points out that some imperfections in the original may be apparent.

to Gideon Moses Cohen
(another collaborative product, but in no need of revision)


Contents

Preface to the Second Edition
Preface to the First Edition

PART I: BASICS

Chapter 1: Introduction
  1.1 Multiple Regression/Correlation as a General Data-Analytic System
    1.1.1 Overview
    1.1.2 Multiple Regression/Correlation and the Complexity of Behavioral Science
      Multiplicity of Influences
      Correlations among Research Factors and Partialling
      Form of Information
      Shape of Relationship
      General and Conditional Relationships
    1.1.3 Causal Analysis and Multiple Regression/Correlation
  1.2 Orientation
    1.2.1 Approach
      Nonmathematical
      Applied
      Data-Analytic
    1.2.2 Computation, the Computer, and Numerical Results
      Computation
      Numerical Results: Reporting and Rounding
      Significance Test Results and the Appendix Tables
    1.2.3 The Spectrum of Behavioral Science
  1.3 Plan
    1.3.1 Content
    1.3.2 Structure: Numbering of Sections, Tables, and Equations
  1.4 Summary

Chapter 2: Bivariate Correlation and Regression
  2.1 Tabular and Graphic Representation of Relationships
  2.2 The Index of Linear Correlation between Two Variables: The Pearson Product Moment r
    2.2.1 Standard Scores: Making Units Comparable
    2.2.2 The Product Moment r as a Function of Differences between z Scores
  2.3 Alternative Formulas for the Product Moment r
    2.3.1 r as the Average Product of z Scores
    2.3.2 Raw Score Formulas for r
    2.3.3 Point Biserial r
    2.3.4 Phi (φ) Coefficient
    2.3.5 Rank Correlation
  2.4 Regression Coefficients: Estimating Y from X
  2.5 Regression toward the Mean
  2.6 Error of Estimate and Measures of the Strength of Association
  2.7 Summary of Definitions and Interpretations
  2.8 Significance Testing of Correlation and Regression Coefficients
    2.8.1 Assumptions Underlying the Significance Tests
    2.8.2 t Test for Significance of r and B
    2.8.3 Fisher's z' Transformation and Comparisons between Independent rs
    2.8.4 The Significance of the Difference between Independent Bs
    2.8.5 The Significance of the Difference between Dependent rs
    2.8.6 The Omnibus Null: All r's Are Zero
  2.9 Statistical Power
    2.9.1 Introduction
    2.9.2 Power of Tests of the Significance of r and B
    2.9.3 Power Analysis for Other Statistical Tests Involving r
  2.10 Determining Standard Errors and Confidence Intervals
    2.10.1 SE and Confidence Intervals for r and B
    2.10.2 Confidence Limits on a Single Y0 Value
  2.11 Factors Affecting the Size of r
    2.11.1 The Distributions of X and Y
      The Biserial r
      Tetrachoric r
    2.11.2 The Reliability of the Variables
    2.11.3 Restriction of Range
    2.11.4 Part-Whole Correlations
      Change Scores
      Other Ratio Scores
    2.11.5 Ratio or Index Variables
    2.11.6 Curvilinear Relationships
  2.12 Summary

Chapter 3: Multiple Regression/Correlation: Two or More Independent Variables
  3.1 Introduction: Regression and Causal Models
    3.1.1 What Is a Cause?
    3.1.2 Diagrammatic Representations of Causal Models
  3.2 Regression with Two Independent Variables
  3.3 Measures of Association with Two Independent Variables
    3.3.1 Multiple R and R²
    3.3.2 Semipartial Correlation Coefficients
    3.3.3 Partial Correlation Coefficients
  3.4 Patterns of Association between Y and Two Independent Variables
    3.4.1 Direct and Indirect Effects
    3.4.2 Spurious Effects and Entirely Indirect Effects
  3.5 Multiple Regression/Correlation with k Independent Variables
    3.5.1 Introduction
    3.5.2 Partial Regression Coefficients
    3.5.3 R and R²
    3.5.4 sr and sr²
    3.5.5 pr and pr²
    3.5.6 Illustrative Example
  3.6 Tests of Statistical Significance with k Independent Variables
    3.6.1 Significance of the Multiple R
    3.6.2 Shrunken R²
    3.6.3 Significance Tests of Partial Coefficients
    3.6.4 Standard Errors and Confidence Intervals for B and β and Hypothesis Tests
    3.6.5 Use of Multiple Regression Equations in Prediction
      Cross-Validation and Unit Weighting
      Some Recent Weighting Methods for Prediction
    3.6.6 Multicollinearity
      Interpretation
      Sampling Stability
      Computation
  3.7 Power Analysis
    3.7.1 Introduction
    3.7.2 Power Analysis for R²
    3.7.3 Power Analysis for Partial Correlation and Regression Coefficients
  3.8 Analytic Strategies
    3.8.1 Hierarchical Analysis
      Causal Priority and the Removal of Confounding Variables
      Research Relevance
      Hierarchical Analysis Required by Structural Properties
    3.8.2 Stepwise Regression
  3.9 Adequacy of the Regression Model and Analysis of Residuals
    3.9.1 The Analysis of Residuals
      Curvilinearity
      Outliers
      Heteroscedasticity
      Omission of an Important Independent Variable
  3.10 Summary

Chapter 4: Sets of Independent Variables
  4.1 Introduction
    4.1.1 Structural Sets
    4.1.2 Functional Sets
  4.2 Simultaneous and Hierarchical Analyses of Sets
    4.2.1 The Simultaneous Analysis of Sets
    4.2.2 Hierarchical Analysis of Sets
  4.3 Variance Proportions for Sets
    4.3.1 The Ballantine Again
    4.3.2 The Semipartial R²
    4.3.3 The Partial R²
    4.3.4 Area c
      Semipartial R and Partial R
  4.4 Significance Testing for Sets
    4.4.1 A General F Test for an Increment (Model I Error)
      Significance of pR²
      Application in Hierarchical Analysis
      Intermediate Analyses
      Partialling an Empty Set
      Application in Simultaneous Analysis
    4.4.2 An Alternative F Test (Model II Error)
  4.5 Power Analysis for Sets
    4.5.1 Introduction
    4.5.2 Determining n* for an F Test of sR²B with Model I Error
    4.5.3 Determining n* for an F Test of sR²B with Model II Error
    4.5.4 Setting Power
    4.5.5 Setting Power for n*
    4.5.6 Reconciling Different n*s
    4.5.7 Power as a Function of n
    4.5.8 Power as a Function of n: The Special Cases of R² and sr²
    4.5.9 Tactics of Power Analysis
  4.6 Statistical Inference Strategy in Multiple Regression/Correlation
    4.6.1 Controlling and Balancing Type I and Type II Errors in Inference
    4.6.2 Less Is More
    4.6.3 Least Is Last
    4.6.4 A Multiple Regression/Correlation Adaptation of Fisher's Protected t Test
  4.7 Summary

PART II: THE REPRESENTATION OF INFORMATION IN INDEPENDENT VARIABLES

Chapter 5: Nominal or Qualitative Scales
  5.1 Introduction: The Uses of Multiplicity
  5.2 The Representation of Nominal Scales
  5.3 Dummy Variable Coding
    5.3.1 Relationships of Dummy Variables to Y
    5.3.2 Correlations among Dummy Variables
    5.3.3 Multiple Regression/Correlation and Partial Relationships for Dummy Variables
      The Partial Correlation of a Dummy Variable (pri)
      The Semipartial Correlation of a Dummy Variable (sri)
      The Regression Coefficients and the Regression Equation
      The Statistical Significance of Partial Coefficients
    5.3.4 Dummy Variable Multiple Regression/Correlation and Analysis of Variance
  5.4 Effects Coding
    5.4.1 Introduction
    5.4.2 The R² and rs
    5.4.3 The Partial Coefficients in Effects Coding
      The Regression Coefficients and the Regression Equation
      The Semipartial and Partial Correlations of an Effects-Coded Variable
      The Statistical Significance of the Partial Coefficients
      Tests of Significance of Differences between Means
  5.5 Contrast Coding
    5.5.1 Introduction
    5.5.2 The R² and rs
    5.5.3 The Partial Coefficients in Contrast Coding
      The Regression Coefficients and the Regression Equation
      The Semipartial and Partial Correlations of a Contrast-Coded Variable
    5.5.4 Contrast Set II: A 2 × 2 Factorial Design
  5.6 Nonsense Coding
  5.7 General Importance
  5.8 Summary

Chapter 6: Quantitative Scales
  6.1 Introduction
  6.2 Power Polynomials
    6.2.1 Method
    6.2.2 An Example: A Quadratic Fit
    6.2.3 Another Example: A Cubic Fit
    6.2.4 Interpretation, Strategy, and Limitations
      Interpretation of Polynomial Regression Results
      How Many Terms?
      Centering
      Scaling
      The Polynomial as a Fitting Function
  6.3 Orthogonal Polynomials
    6.3.1 Method
    6.3.2 The Cubic Example Revisited
    6.3.3 Unequal n and Unequal Intervals
    6.3.4 Applications and Discussion
      Experiments
      Developmental Studies and Continuous Variables
      Serial Data without Replication
      Computational Advantages
      Orthogonal Polynomials and Power Polynomials
  6.4 Nominalization
  6.5 Nonlinear Transformations
    6.5.1 Introduction
    6.5.2 Strong Theoretical Models
    6.5.3 Weak Theoretical Models
      Logarithms and Proportional Change
      The Square Root Transformation
      The Reciprocal Transformation
    6.5.4 No Theoretical Model
    6.5.5 Transformations of Proportions
      The Arcsine Transformation
      The Probit Transformation
      The Logit Transformation
    6.5.6 Normalization of Scores and Ranks
    6.5.7 The Fisher z' Transformation of r
  6.6 Level of Quantitative Scale and Alternative Methods of Representation
    6.6.1 Ratio Scales
    6.6.2 Interval Scales
    6.6.3 Ordinal Scales
  6.7 Summary

Chapter 7: Missing Data
  7.1 Introduction
    7.1.1 Types of Missing Data
    7.1.2 Some Alternatives for Handling Missing Data
  7.2 Missing Data in Nominal Scales
  7.3 Missing Data in Quantitative Scales
    7.3.1 Missing Data and Linear Aspects of V
    7.3.2 Missing Data and Nonlinear Aspects of V
      Power Polynomials
      Orthogonal Polynomials
      Nominalization
      Nonlinear Transformations
  7.4 Some Further Considerations
    7.4.1 Plugging with Means
      Quantitative Scales
      Nominal Scales
    7.4.2 When Not to Use X̄
    7.4.3 Missing Data and Multiple Independent Variables
    7.4.4 A Special Application: Conditional Missing Data
    7.4.5 General Outlook on Missing Data
  7.5 Summary

Chapter 8: Interactions
  8.1 Introduction
  8.2 The 2 × 2 Revisited
    8.2.1 The Special Case of Orthogonal Coding
    8.2.2 When Not to Use the 2 × 2 Factorial Design
  8.3 A Dichotomy and a Quantitative Research Factor
    8.3.1 Introduction
    8.3.2 A Problem in Differential Validity of a Personnel Selection Test
      The Interaction and the Issue of Differential Validity
    8.3.3 A Covariance Analysis in a Learning Experiment
      The Interaction as a Violation of an Analysis of Covariance Assumption
  8.4 Two Quantitative Variables
  8.5 Interaction, Scaling, and Regression Coefficients
  8.6 A Nominal Scale and a Quantitative Variable
  8.7 Set Interactions: Quantitative × Nominal
  8.8 Set Interactions: Nominal × Nominal; Factorial Design Analysis of Variance
  8.9 Interactions among More Than Two Sets
  8.10 Summary

PART III: APPLICATIONS

Chapter 9: Causal Models
  9.1 Introduction
    9.1.1 Limits on the Current Discussion
    9.1.2 Basic Assumptions and Residuals
  9.2 Models without Reciprocal Causation
    9.2.1 Direct and Indirect Effects
    9.2.2 Hierarchical Analysis and Reduced Form Equations
    9.2.3 Partial Causal Models and the Hierarchical Analysis of Sets
    9.2.4 Path Analysis and Path Coefficients
    9.2.5 Model Testing and Identification
    9.2.6 Curvilinearity and Interactions in Causal Models
  9.3 Nonrecursive Models; Reciprocal Causation
  9.4 Correlation Among Residuals and Unmeasured Causes
  9.5 Latent Variable and Measurement Models
  9.6 Longitudinal Data
  9.7 Glossary
  9.8 Summary

Chapter 10: The Analysis of Covariance and Its Multiple Regression/Correlation Generalization
  10.1 Introduction
  10.2 Causal Models and the Analysis of Covariance via Multiple Regression/Correlation
    10.2.1 Causal Analysis and ACV
    10.2.2 ACV via MRC
  10.3 Multiple, Nonlinear, Missing-Data, and Nominal Scale Covariates
    10.3.1 Introduction
    10.3.2 The Hierarchical R² Analysis by Sets
    10.3.3 Multiple Linear Covariates
    10.3.4 A Nonlinear Covariate Set
    10.3.5 A Covariate with Missing Data
      Analysis of Covariance for More Complex Factorial and Other Experimental Designs
    10.3.6 Nominal Scales as Covariates
    10.3.7 Mixtures of Covariates
  10.4 The Analysis of Partial Variance: A Generalization of Analysis of Covariance
    10.4.1 Introduction
    10.4.2 The Analysis of Partial Variance
    10.4.3 Analysis of Partial Variance with Set B Quantitative
  10.5 The Problem of Partialling Unreliable Variables
    10.5.1 Introduction
    10.5.2 The Effect of a Fallible Partialled Variable
    10.5.3 Some Perspectives on Covariates
  10.6 The Study of Change and APV
    10.6.1 Introduction
    10.6.2 Change Scores
    10.6.3 The APV in the Study of Change
      A Simple Case
      Multiple Groups
      Nonlinear Change
      Multiple Time Points; Other Covariates
  10.7 Some Perspectives on the Role and Limitations of Partialling
  10.8 Summary

Chapter 11: Repeated Measurement and Matched Subjects Designs
  11.1 Introduction
  11.2 The Basic Design: Subjects by Conditions
    11.2.1 Between-Subjects Variance
    11.2.2 Within-Subjects Variance
    11.2.3 Significance Tests
    11.2.4 sr and pr for a Single Independent Variable
    11.2.5 Power Analysis
  11.3 Subjects within Groups by Conditions
    11.3.1 The Analysis of the Between-Subjects Variance
    11.3.2 The Analysis of the Within-Subjects Variance
    11.3.3 Significance Tests
    11.3.4 sr and pr for a Single Independent Variable
    11.3.5 Illustrative Example
      The Between-Subjects Analysis
      The Within-Subjects Analysis
    11.3.6 Power Analysis
  11.4 Summary

Chapter 12: Multiple Regression/Correlation and Multivariate Methods
  12.1 Introduction
  12.2 The Canonical Generalization
  12.3 Specializing and Expanding Canonical Analysis
    12.3.1 Multiple Regression/Correlation
    12.3.2 Discriminant Analysis with g Groups
    12.3.3 Multivariate AV (MANOVA)
    12.3.4 Multivariate ACV (MANACOVA)
    12.3.5 Expanding Canonical Analysis
  12.4 Summary

APPENDICES

Appendix 1: The Mathematical Basis for Multiple Regression/Correlation and Identification of the Inverse Matrix Elements
  A1.1 Alternative Matrix Methods
  A1.2 Determinants

Appendix 2: Desk Calculator Solution of the Multiple Regression/Correlation Problem: Determination of the Inverse Matrix and Applications Thereof
  A2.1 Testing the Difference Between Partial Coefficients from the Same Sample

Appendix 3: Computer Analysis of Multiple Regression/Correlation
  A3.1 Introduction: Data Preparation and Univariate and Bivariate Displays
  A3.2 The Multiple Regression/Correlation Output
    A3.2.1 Simultaneous Analysis
    A3.2.2 Hierarchical Analysis
  A3.3 Analysis of Residuals
  A3.4 Special Regression Program Options
  A3.5 TSLS and GLS Options for the Analysis of Nonrecursive Causal Models
  A3.6 Program Capacity and Accuracy

Appendix 4: Set Correlation as a General Multivariate Data-Analytic Method
  Abstract
  A4.1 Introduction
    A4.1.1 Background
    A4.1.2 Inadequacy of Canonical Analysis
  A4.2 Elements of the Set Correlation System
    A4.2.1 Sets
    A4.2.2 Measures of Multivariate Association Between Sets
      Proportion of Additive Variance and Trace Correlation
    A4.2.3 Partialling
    A4.2.4 R²Y,X and T²Y,X for the Five Types of Association
    A4.2.5 Testing the Null Hypothesis
    A4.2.6 Guarding Against Experimentwise Type I Error Inflation
    A4.2.7 Conventional Multivariate Methods Applications by Set Correlation
  A4.3 New Analytic Possibilities: Illustrative Examples
    A4.3.1 Common and Unique Aspects of a Battery
    A4.3.2 A Nominal Y Set and Its Contrasts
    A4.3.3 A Multivariate Analysis of Partial Variance
    A4.3.4 Contingency Tables

Appendix Tables
  Table A: t Values for α = .01, .05 (Two-Tailed)
  Table B: z' Transformation of r
  Table C: Normal Distribution
  Table D: F Values for α = .01, .05
  Table E: L Values for α = .01, .05
  Table F: Power of Significance Test of r at α = .01, .05 (Two-Tailed)
  Table G: n* to Detect r by t Test at α = .01, .05 (Two-Tailed)

References
Author Index
Subject Index

Preface to the Second Edition

The seven years since the publication of the first edition have been fat ones for multiple regression/correlation as a general data-analytic system ("new-look" MRC, in short). The behavioral and social science journals have carried hundreds of pages on methodological issues in MRC (much of it on how-when-whether it can replace other methods), and, increasingly, research reports that employ it. Several MRC textbooks and chapter-length treatments have appeared, and MRC interest groups have formed as part of specialized scientific organizations. "New-look" MRC has been applied across a spectrum that reaches from the molecular end of psychology through the molar end of sociology, and has been particularly popular in education, the evaluation of intervention programs, drug research, and psychiatric epidemiology. Its obvious relevance to "meta-analysis" has not been overlooked.

While much of the "nuts and bolts" in the original edition remains intact, there has been a fundamental change in outlook. The relevance of MRC to the study of causality, dimly hinted at here and there in the first edition, now emerges as the central principle. From the very beginning in the presentation of the two-variable regression equation, through the interpretation of patterns of association with two independent variables, alternative analytic strategies, and setwise hierarchical analysis, the intimate relationship of MRC to the formal analysis of causal models is described and illustrated. After the methods of representing information as data for MRC are presented, a chapter is devoted to the analysis of causal models, and simple working methods employing systems of regression equations are provided. The detailed exposition of the analysis of covariance is preceded by a causal models analysis of the types of research design that employ this method. Throughout, the exposition reflects our conviction that the valid analysis of nonexperimental data can proceed only when it is in keeping with the principles and insights of the analysis of causal models.


Another change we believe to be important does not occur in the body of the text, but appears as Appendix 4. It is a reprinting, with minor revisions, of a newly derived multivariate generalization of MRC, one that handles sets and partialled sets of variables as dependent variables. Although we have found this method most useful in our work and believe that it is the latest word in quantitative methodology, its novelty and our academic caution (as well as our unfailing modesty) dictate that it not be part of the central expository framework (although it has led us to modify the material on canonical analysis, which it seeks to largely replace).

Other new material includes a test of the omnibus null hypothesis that all of the correlations in a matrix are zero in the population, the analysis of residuals, the analysis of conditional missing data (such as result from such survey items as "If not presently married, skip to item 16"), and a detailed analysis of the role of scaling in the interpretation of interaction regression coefficients.

The goals, style, tone, and emphasis remain the same. We continue to intend the book to serve both as a textbook for students and a handbook for researchers. The nonmathematical, intuitive, applied, data-analytic features remain. The emphasis on the use of MRC in the service of scientific explanation rather than that of forecasting continues. While the latter is not neglected, its limitations are noted in a new section on unit weighting and other alternatives to regression weights in prediction. As before, the exposition is heavily laced with worked examples; one, on factors associated with academic salaries, is carried through the book and culminates in its use to exemplify the complete analysis of a causal model.

As always, we first acknowledge the many students, colleagues, and researchers whose response to the first edition and other interaction with us taught us at least as much as we taught them. Again, our colleagues at meetings of the Society of Multivariate Experimental Psychology served as a constructively critical forum for some of the new material and our general approach. Among them, John Loehlin provided helpful comments on Chapters 3 and 9, as did the anonymous referees of our journal, Multivariate Behavioral Research, on the set correlation paper (Appendix 4). We thank Gregory Muhlin for help with the computer program information in Appendix 3, and E. L. Struening for his general support. Larry Erlbaum has earned a special debt of gratitude for making it all so easy. Annette Priedner and Detra Allen did most of the typing, and beautifully.

Jacob Cohen
Patricia Cohen

Preface to the First Edition

This book had its origin about a dozen years ago, when it began to become apparent to the senior author that there were relationships between regression and correlation on the one hand and the analysis of variance on the other which were undreamed of (or at least did not appear) in the standard textbooks with which he was familiar. On the contrary, the texts, then as now, treated these as wholly distinct systems of data analysis intended for types of research which differed fundamentally in design, goals, and types of variables. Some research, both statistical and bibliographic, confirmed the relationships noted, and revealed yet others. These relationships served to enrich both systems in many ways, but it also became clear that multiple regression/correlation was potentially a very general system for analyzing data in the behavioral sciences, one that could incorporate the analysis of variance and covariance as special cases. An article outlining these possibilities was published in the Psychological Bulletin (Cohen, 1968), and the volume and sources of reprint requests and several reprintings suggested that a responsive chord had been struck among behavioral scientists in diverse areas. It was also obvious that for adequacy of both systematic coverage and expository detail, a book-length treatment was needed.

In 1969, the authors were married and began a happy collaboration, one of whose chief products is this book. (Another is saluted on the dedication page.) During the preparation of the book, the ideas of the 1968 paper were expanded, further systematized, tried out on data, and hardened in the crucible of our teaching and consulting. We find the system which has evolved surprisingly easy to teach and learn, and this book is an effort to so embody it. We omit from this preface, except incidentally, a consideration of this book's scope, orientation, and organization, since Chapter 1 is largely devoted to these issues.

To describe the primary audience for whom this book is intended requires two dimensions. Substantively, this book is addressed to behavioral and social scientists. These terms have no sharply defined reference, but we intend them in the most inclusive sense to include the academic sciences of psychology, sociology, economics, branches of biology, political science, and anthropology, and also various applied research fields: education, clinical psychology and psychiatry, epidemiology, industrial psychology, business administration, social work, and political/social survey, market and consumer research. Although the methods described in this book are applicable in other fields (for example, industrial engineering, agronomy), our examples and atmospherics come from behavioral-social science. The other dimension of our intended audience, amount of background in statistics and research, covers an equally broad span. This book is intended to be both a textbook for students and a manual for research workers, and thus requires a somewhat different approach by these two readerships. However, one feature of this book will be appreciated by a large majority of both groups of readers: its orientation is nonmathematical, applied, and "data-analytic." This orientation is discussed and justified in the introductory chapter (Section 1.2.1) and will not be belabored here. Our experience has been that with few exceptions, both students and research practitioners in the behavioral and social sciences approach statistics with considerable wariness (to say the least), and require a verbal-intuitive exposition, rich in redundancy and concrete examples. This we have sought to supply.

As a textbook, whether used in a course at the graduate or advanced undergraduate level, it is assumed that the students have already had a semester's introductory statistics course. Although Chapter 2 begins "from scratch" with bivariate correlation and regression, and reviews elementary statistical concepts and terminology, it is not really intended to be a thorough, basic exposition, but rather to refresh the reader's memory. Students without a nodding acquaintance with the analysis of variance will find some portions of Chapter 1 difficult; returning to this material later in the course should clear matters up. Increasingly, statistical offerings in graduate programs are so organized as to include a course in correlation/regression methods. This book is intended to serve as a text for such courses. It may also be used in courses in multivariate methods; although largely devoted to multiple regression/correlation analysis, the final chapter links it to and reviews the other multivariate methods.

As a manual, this book provides an integrated conceptual system and practical working methods for the research worker. The last five years have seen a rapidly growing interest in multiple regression/correlation methods, reflected in journal articles and books addressed to psychologists and sociologists. Much of this material is valuable, while some of it is misguided or simply incorrect. Some of the more valuable contributions are presented mathematically, thus limiting their access. Taken as a whole, the recent literature is lacking in the combination of an integrated conceptual system with easily understood practical working methods which is necessary for the method to realize its potential as a general data-analytic system. We have tried to provide this. Chapter 1 begins with an outline of this system, and was written primarily with the experienced research worker or advanced graduate student in mind.


He or she will find much of Chapters 2 and 3 elementary, but they are worth skimming, since some of the topics are treated from a fresh perspective which may be found insight provoking. Chapter 4 considers sets of independent variables as units of analysis, and is basic for much of what follows. Beyond that, he may follow his specific interests in the chapters and appendices by reference to the table of contents and a carefully prepared index. To the stat buff or teacher, we recommend reading the chapters in order, and the appendices at the point in the text where they are referenced.

We acknowledge, first of all, the many students, colleagues, and researchers seeking counsel whose stimulation so importantly shaped this book. A small subset of these are Elmer L. Struening, Mendl Hoffman, Joan Welkowitz, Claudia Riche, and Harry Reiss, but many more could be named. Special thanks are due to the members of the Society of Multivariate Experimental Psychology for the useful feedback they supplied when portions of the book were presented at their annual meetings during the last few years. We are very grateful to Joseph L. Fleiss for a painstaking technical critique from which the book greatly profited. Since we remained in disagreement on some points, whatever faults remain are our sole responsibility. Gerhard Raabe provided valuable advice with regard to the material on computer programs in Appendix 3. Patra Lindstrom did a most competent job in typing the manuscript.

Jacob Cohen
Patricia Cohen


PART I BASICS


1

Introduction

1.1 MULTIPLE REGRESSION/CORRELATION AS A GENERAL DATA-ANALYTIC SYSTEM

1.1.1 Overview

Multiple regression/correlation analysis (MRC) is a highly general and therefore very flexible data-analytic system that may be used whenever a quantitative variable (the dependent variable) is to be studied as a function of, or in relationship to, any factors of interest (expressed as independent variables). The sweep of this statement is quite intentional:

1. The form of the relationship is not constrained; it may be simple or complex, for example, straight line or curvilinear, general or conditional, or combinations of these possibilities.

2. The nature of the research factors expressed as independent variables is also not constrained: they may be quantitative or qualitative, main effects or interactions in the analysis of variance (AV) sense, or covariates as in the analysis of covariance (ACV). They may be characterized by missing data. They may be correlated with each other, or uncorrelated (as in balanced factorial design AV). They may be naturally occurring ("organismic") properties like sex or diagnosis or IQ, or they may be consequences of planned experimental manipulation ("treatments"). They may be single variables or groups of variables. In short, virtually any information whose bearing on the dependent variable is of interest may be expressed as research factors.

The MRC system presented in this book has other properties that make of it a powerful analytic tool: it yields measures of the magnitude of the "whole" relationship of a factor to the dependent variable, as well as of its partial (unique, net) relationship, that is, its relationship over and above that of other research factors (proportions of variance and coefficients of correlation and regression).


It also comes fully equipped with the necessary apparatus for statistical hypothesis testing, estimation, and power analysis. Last, but certainly not least, it is a major tool in the methods of causal analysis. In short, and at the risk of sounding like a television commercial, it is a versatile, all-purpose system for analyzing the data of the behavioral, social, and biological sciences and technologies.

We are, of course, describing the "new-look" MRC that has developed over the past 2 decades, not the stereotyped traditional application that was largely limited to the psychotechnological task of forecasting outcomes in educational or personnel selection and vocational guidance. The very terminology betrays this preoccupation: Criterion (dependent) variables are predicted by "predictor" (independent) variables, correlations between predictors and criterion are called validity coefficients, and accuracy is assessed by means of the "standard error of prediction."

Even when put to other uses, stereotypy is induced either implicitly or explicitly by limiting MRC to straight-line relationships among equal-interval scales on which the observations are assumed to be normally distributed. In this narrow view, MRC takes its place as one of a group of specialized statistical tools, its use limited to those occasional circumstances when its unique function is required and its working conditions are met.

Viewed from this traditional perspective, it is hard to see why anyone would want a whole textbook devoted to MRC, a monograph for specialists perhaps, but why a textbook? No such question arises with regard to textbooks entirely devoted to the analysis of variance and covariance, because of its presumed generality. This is ironic, since, as we will show, AV/ACV is in fact a special case of MRC!

Technically, AV/ACV and conventional multiple regression analysis are special cases of the "general linear model" in mathematical statistics.¹ The MRC system of this book generalizes conventional multiple regression analysis to the point where it is essentially equivalent to the general linear model. It thus follows that any data analyzable by AV/ACV may be analyzed by MRC, while the reverse is not the case. This is illustrated, for example, by the fact that when one seeks in an AV/ACV framework to analyze a factorial design with unequal cell frequencies, the nonindependence of the factors necessitates moving up to the more general multiple regression analysis to achieve an exact solution.

Historically, MRC arose in the biological and behavioral sciences around the turn of the century in the study of the natural covariation of observed characteristics of samples of subjects (Galton, Pearson, Yule). Somewhat later, AV/ACV grew out of the analysis of agronomic data produced by controlled variation of treatment conditions in manipulative experiments (Fisher). The systems developed in parallel, and from the perspective of the research workers who used them, largely independently of each other.

¹For the technically minded, we point out that it is the "fixed" version of these models to which we address ourselves, which is the way they are most often used.


Indeed, MRC, because of its association with nonexperimental, observational, survey-type research, came to be looked upon as less scientifically respectable than AV/ACV, which was associated with experiments. The recent development of causal analysis, formal systems of inference based on nonexperimental data, with its heavy dependence on regression analysis, has tended to offset this onus.

Close examination suggests that this guilt (or virtue) by association is unwarranted, the result of the confusion of data-analytic method with the logical considerations which govern the inference of causality. Experiments in which different treatments are applied to randomly assigned groups of subjects permit unambiguous inference of causality, while the observation of associations among variables in a group of randomly selected subjects does not. Thus, the finding of significantly more cell pathology in the lungs of rats reared in cigarette smoke-filled environments than for normally reared control animals is in a logically superior position to draw the causal inference than is the finding that, for a random sample of postmortem cases, the lung cell pathology is significantly higher for divorced men than for married men. But each of these researches may be analyzed by either AV (a simple pathology mean difference and its t test) or MRC (a simple correlation between group membership and pathology and its identical t test). The logical status of causal inference is a function of how the data were produced, not how they are analyzed. Yet it is not surprising that the greater versatility of MRC has made it the vehicle of formal causal analysis.
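A minimal numerical sketch of this equivalence, in Python with simulated data (the group sizes, means, and variable names below are invented for illustration):

import numpy as np

rng = np.random.default_rng(0)

# Simulated outcome scores for two groups; all numbers here are invented.
y_control = rng.normal(10.0, 2.0, size=40)
y_exposed = rng.normal(12.0, 2.0, size=40)

# (1) The AV route: a pooled-variance two-sample t test on the mean difference.
n1, n2 = len(y_control), len(y_exposed)
pooled_var = (((n1 - 1) * y_control.var(ddof=1)
               + (n2 - 1) * y_exposed.var(ddof=1)) / (n1 + n2 - 2))
t_av = (y_exposed.mean() - y_control.mean()) / np.sqrt(pooled_var * (1/n1 + 1/n2))

# (2) The MRC route: correlate a 0-1 group-membership variable with the outcome
#     and test that correlation with t = r * sqrt(df / (1 - r**2)).
y = np.concatenate([y_control, y_exposed])
group = np.concatenate([np.zeros(n1), np.ones(n2)])  # group membership as a variable
r = np.corrcoef(group, y)[0, 1]                      # the point-biserial correlation
df = n1 + n2 - 2
t_mrc = r * np.sqrt(df / (1 - r**2))

print(t_av, t_mrc)   # the two t values agree, up to floating-point rounding

The same data, analyzed either way, yield the same test statistic; what differs is only the framework in which the question is posed.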


Authors who make sharp theoretical distinctions between correlation and fixed-model AV (or fixed-model regression) are prone to claim that correlation and proportion of variance (squared correlation) measures lack meaning for the latter because these measures depend on the specific levels of the research factor chosen (fixed) by the investigator and the (fixed) number of cases at each level. Concretely, consider an experiment where random samples of subjects are exposed to several levels of magnitude of a sensory stimulus, each sample to a different level, and their responses recorded. Assume that, over the purists' objections, we compute a correlation between stimulus condition and response, and find it to be .70, that is, about half (.70² = .49) of the response variance is accounted for by the stimulus conditions. They would argue, quite correctly, that the selection of a different set of stimulus values (more or less varying, occurring elsewhere in the range), or a different distribution of relative sample sizes, would result in a larger or smaller proportion of the response variance being accounted for. Therefore, they would argue, the .49 (or .70) figure can not be taken as an estimate of "the relationship between stimulus and response" for this sensory modality and form of response. Again, we must agree. Therefore, they would finally argue, these measures are meaningless. Here, we beg to disagree. We find such measures to be quite useful, provided that their dependence on the levels and relative sample sizes of the research factor is understood. When necessary, one simply attaches to them, as a condition or qualification, the distribution of the research factor. We find such qualifications no more objectionable, in principle, than the potentially many others (apparatus, tests, time of day, subjects, experimenters) on which research results may depend. Such measures, qualified as necessary, may mean more or less, depending on substantive considerations, but they are hardly meaningless. (For an example where the research factor is religion, and further discussion of this issue, see Section 5.3.1.)

On the contrary, one of the most attractive features of MRC is its automatic provision of regression coefficients, proportion of variance, and correlation measures of various kinds. These are measures of "effect size," of the magnitude of the phenomena being studied. We venture the assertion that, despite the preoccupation (some critics would substitute "obsession") of the behavioral and social sciences with quantitative methods, the level of consciousness in many areas of just how big things are is at a surprisingly low level. This is because concern about the statistical significance of effects (whether they exist at all) has tended to preempt attention to their magnitude. That significant effects may be small, and nonsignificant ones large, is a truism. Although not unrelated, the size and statistical significance of effects are logically independent features of data from samples. Yet many research reports, at least implicitly, confuse the issues of size and statistical significance, using the latter as if it meant the former. At least part of the reason for this is that traditional AV/ACV yields readily interpretable F and t ratios for significance testing, but offers differences between group means as measures of effect size.²

Now, a difference between means is a reasonably informative measure of effect size when the dependent variable is bushels of wheat per acre, or dollars of annual income, or age at time of marriage. It is, however, less informative when the dependent variable is a psychological test score, a sociological index, or the number of trials to learn a maze. Many of the variables in the behavioral and social sciences are expressed in units that are arbitrary, ad hoc, or unfamiliar. To report, for example, that law students show a 9.2-point higher mean than medical students on a scale measuring attitude toward the United Nations is to convey very little about whether this constitutes a large or trivial difference. However, to report that the law student-medical student distinction accounts for 4% of the attitude score variance conveys much more. Further, to report that the law students' mean on another scale, attitude toward public service, is 6.4 points higher than the medical students' mean not only fails to convey a useful sense of the size of this difference (as before), but is not even informative as to whether this is a smaller or larger difference than the other, since the units of the two scales are not directly comparable. But reporting that this distinction accounts for 10% of the public service attitude score variance not only expresses the effect size usefully, but is comparable to the 4% found for the other variable. Since various types of proportion of variance, that is, the squares of simple, multiple, partial, and semipartial correlation coefficients, may be routinely determined in the MRC system, the latter has "built-in" effect size measures that are unit-free and easily understood and communicated. Each of these comes with its significance test value for the null hypothesis (F or t), and no confusion between the two issues of whether and how much need arise.

²This is not to say that the more useful measures of proportion of variance have not been proposed in AV contexts; see, for example, Hays (1981), Cohen (1965, 1977), and Section 5.3.4. But they are neither integral to the AV tradition nor routinely presented in research reports where the data are analyzed by AV/ACV.
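A minimal sketch of how such a unit-free measure is obtained, in Python with simulated data (the scale units, group sizes, and score values below are invented for illustration); the squared point-biserial correlation converts a raw mean difference on an arbitrary scale into a proportion of variance that can be compared across scales:

import numpy as np

def prop_variance(scores_a, scores_b):
    """Proportion of Y variance accounted for by a two-group distinction:
    the squared (point-biserial) correlation between group membership and Y."""
    y = np.concatenate([scores_a, scores_b])
    g = np.concatenate([np.zeros(len(scores_a)), np.ones(len(scores_b))])
    r = np.corrcoef(g, y)[0, 1]
    return r ** 2

rng = np.random.default_rng(1)
# Invented scores on two attitude scales whose units are arbitrary and different.
law_un, med_un = rng.normal(60, 22, 120), rng.normal(51, 22, 110)   # scale 1
law_ps, med_ps = rng.normal(30, 10, 120), rng.normal(24, 10, 110)   # scale 2

print(law_un.mean() - med_un.mean(), prop_variance(law_un, med_un))
print(law_ps.mean() - med_ps.mean(), prop_variance(law_ps, med_ps))
# The two raw mean differences are in different, arbitrary units and cannot be
# compared; the two proportions of variance are unit-free and can.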


1.1.2 Multiple Regression/Correlation and the Complexity of Behavioral Science

The greatest virtue of the MRC system is its capacity to mirror, with high fidelity, the complexity of the relationships that characterize the behavioral sciences. The word complexity is itself used here in a complex sense to cover several issues.

Multiplicity of Influences

The behavioral sciences inherited from older branches of empirical inquiry the simple experimental paradigm: vary a single presumed causal factor (C) and observe its effects on the dependent variable (Y), while holding constant other potential factors. Thus, Y = f(C); variation in Y is a function of controlled variation in C. This model has been, and continues to be, an effective tool of inquiry in the physical sciences and engineering, and in some areas of the behavioral sciences. Probably because of their much higher degree of evolution, functional areas within the physical sciences and engineering typically deal with a few distinct causal factors, each measured in a clear-cut way, and each in principle independent of others.

However, as one moves from the physical sciences through biology and across the broad spectrum of the behavioral sciences ranging from physiological psychology to cultural anthropology, the number of potential causal factors increases, their representation in measures becomes increasingly uncertain, and weak theories abound and compete. Consider a representative set of dependent variables: epinephrine secreted, distant word associations, verbal learning, school achievement, psychosis, anxiety, aggression, attitude toward busing, income, social mobility, birth rate, kinship system. A few moments' reflection about the causal nexus in which each of these is embedded suggests a multiplicity of factors, and possibly further multiplicity in how any given factor is represented. Given several research factors C, D, E, etc., to be studied, one might use the single-factor paradigm repeatedly in multiple researches, that is, Y = f(C), then Y = f(D), then Y = f(E), etc. But MRC makes possible the use of paradigms of the form Y = f(C, D, E, etc.), which are far more efficient than the strategy of studying multiple factors one at a time. Moreover, causal analysis, utilizing interlocking regression equations as formal models, achieves an even greater degree of appositeness to complex theories.


Correlations among Research Factors and Partialling

A far more important type of complexity than the sheer multiplicity of research factors lies in the effect of relationships among them. The simpler condition is that in which the factors C, D, E, . . . are statistically unrelated (orthogonal) to each other, as is the case in true experiments where they are under the experimenter's manipulative control. The overall importance of each factor (for example, the proportion of Y variance it accounts for) can be unambiguously determined, since its orthogonality with the other factors assures that its effects on Y can not overlap with the effects of the others. Thus, concretely, consider a simple experimental inquiry into the proposition "don't trust anyone over 30" in which the persuasibility (Y) of male college students is studied as a function of the apparent age (C: C1 = under 30, C2 = over 30) and sex (D: D1 = male, D2 = female) of the communicator of a persuasive message. The orthogonality of the research factors C and D is assured by having equal numbers of subjects in the four "cells" (C1D1, C1D2, C2D1, C2D2); no part of the difference in overall Y means for the two communicator ages can be attributed to their sexes, and conversely, since the effect of each factor is balanced out in the determination of the other. If it is found that C accounts for 10% of the Y variance, and D for 5%, no portion of either of these amounts can be due to the other factor. It thus follows that these amounts are additive: C and D together account for 15% of the Y variance.³

Complexity arises when one departs from manipulative experiments and the orthogonality of factors which they make possible. Many issues in behavioral sciences are simply inaccessible to true experiments, and can only be addressed by the systematic observation of phenomena as they occur in their natural flux. In nature, factors which impinge on Y are generally correlated with each other as well. Thus, if persuasibility (Y) is studied as a function of authoritarianism (C), intelligence (D), and socioeconomic status (E) by surveying a sample with regard to these characteristics, it will likely be found that C, D, and E are to some degree correlated with each other. If, taken singly, C accounts for 8%, D 12%, and E 6% of the Y variance, because of the correlations among these factors, it will not be the case that together, in an MRC analysis, they account for 8 + 12 + 6 = 26% of the Y variance. It will almost certainly be less (in this case, but may, in general, be more; see Section 3.4). This is the familiar phenomenon of redundancy among correlated explanatory variables with regard to what they explain. The Y variance accounted for by a factor is overlapped to some degree with others.

³The reader familiar with AV will recognize this as a balanced 2 × 2 factorial design. To avoid possible confusion, it must be pointed out that the orthogonality of C and D is a fact which is wholly independent of the possibility of a C × D interaction. Interactions are research factors in their own right and in balanced designs are also orthogonal to all other research factors. If the C × D interaction were to be included as a third factor and found to account for 7% of the variance, this amount is wholly its own, and all three factors combined would account for 22% of the Y variance. See the section "General and Conditional Relationships," and Chapter 8, which is devoted entirely to interactions.

Y variance, and D for 5 % , no portion of either of these amounts can be due to the other factor. It thus follows that these amounts are additive: C and D together account for 15% o f the Y variance.3 Complexity arises when one departs from manipulative experiments and the orthogonality of factors w'hich they make possible. M any issues in behavioral sciences are simply inaccessible to true experiments, and can only be addressed by the systematic observation o f phenomena as they occur in their natural flux. In nature, factors which impinge on Y are generally correlated with each other as well. Thus, if persuasibility (V ) is studied as a function of authoritarianism (C ), intelligence (D ), and socioeconomic status (E) by surveying a sample with regard to these characteristics, it w ill likely be found that C, D, and E are to some degree correlated with each other. If, taken singly, C accounts tor 8 % , D 12%, and E 6% of the Y variance, because of the correlations among these factors, it w ill not be the case that together, in an M R C analysis, they account for 8 I 12 + 6 = 26% o f the Y variance. It w ill almost certainly be less (in this case, but may, in general, be more— see Section 3.4). This is the familiar phenomenon of redundancy among correlated explanatory variables w'ith regard to what they explain. The Y variance accounted for by a factor is overlapped to some degree ’ The reader familiar with A V will recognize this as a balanced 2 x 2 factorial design. To avoid possible confusion, it must be pointed out that the orthogonality of C and 1) is a fact which is wholly independent of the possibility of a C * D interaction. Interactions are research factors in their own right and in balanced designs are also orthogonal to all other research factors. If the C ■< D interaction were to be included as a third factor and fonnd to account for 7% of (he variance, this amount is wholly its own, and all three factors combined would account for i 7'r of the Y variance. See the section “ General and Conditional Relationships," and Chapter 8, which is devoted entirely to interactions.

1.1 MRC AS A GENERAL DATA-ANALYTIC SY ST EM

9

with others. This in turn implies the concept of the variance accounted for by a factor uniquely, relative to what is accounted for by the other factors. In the above example, these unique proportions of Y variance may turn out to be: C 4%, D 10%, and E 0%. This is a rather different picture than that provided by looking at each factor singly. For example, it might be argued that F ’ s apparent influence on Y when appraised by itself is “ spurious,” being entirely attributable to its relationship to C and/or D . Detailed attention to the relationships among the causal variables and how these bear on Y is the hallmark of causal analysis, and may be accomplished by M R C . M R C ’s capability for assessing unique variance, and the closcly related mea­ sures of p a rtia l correlation and regression cocfficicnts it provides, is perhaps its most important feature, particularly for observational (nonexperimental) studies. Even a small number of research factors define many alternative possible causal systems or theories. Selection among them is greatly facilitated by the ability, using M R C , of partialling from the effects of any research factor those of any desired set of other factors. It is a copybook maxim that no correlational method can establish causal relations, but certain causal alternatives may be invalidated by the skillful use of this feature of M R C . It can show whether a set of observa­ tional data for Y and the correlated research factors C and D arc consistent with any of the following possibilities, which are also expressed, parenthetically, in the language of causa! analysis: 1. C and D each bears causally on Y (cach has a direct cffcct on y.) 2. C is a surrogate for D in relationship with Y, that is, when D is partialled from C, the latter retains no variance in Y. (C has no direct cffcct on >'.) 3. D suppresses the cffcct of C on that is, when D is partialled from C, the unique variance of C in Y is greater than the proportion it accounts for when D is ignored (see the discussion of “ suppression” in Section 3.4). (D represents a cause that is correlated with C but acts on Y in a direction opposite from C.) The possibility for complexity in causal structures is further increased when the number of research factors increases beyond two, yet the partialling inherent in M R C is a powerful adjunct to good theory (i.e., causal models) for disentangling them. Further, partialling is at the base of a series of data-analytic procedures of increasing generality, which are realizable through M R C : general A C V , the Analysis of Partial Variance (see Chapter 10). Most generally, it is the partialling mcchanism more than any other feature which makes it possible for the M R C system to mirror the complexity of causal relationships encountered in the behav­ ioral sciences.

Form o f inform ation The behavioral and social sciences utilize information in various forms. One form which research factors may take is quantitative, and of any of the following levels of measurement (Stevens, 1951):

10

1.

INTRODUCTION

1. Ratio Scales.

These are equal interval scales with a true zero point, mak­

ing such ratio statements as “ J has twice as m u ch X as K ” sensible. X may here be, for example, inches, pounds, seconds, foot-candles, voltage, size o f group, dollars, distance from hospital, years in prison, or literacy rate. 2. Interval Scales,

These have equal intervals but are measured from an

arbitrary zero point, that is, the value of X that denotes absence of the property is not defined. Most psychological measures and sociological indiccs are at this level, for example, the scores o f tests of intelligence, special abilities, achieve­ ment. personality, temperament, vocational interest, and social attitude. A phys­ ical example is temperature measured in Fahrenheit or Centigrade units. 3. Ordinal Scales,

O nly the relalive position within a collection are signified

by the values of ordinal scales, neither conditions o f equal intervals nor a true zero obtaining. Whether simple rank order values are used, or they arc expressed as percentiles, deciles, or quartiles, these properties o f ordinal scales are the same. The above schcme is not exhaustive o f quantitative scales, and others have been proposed. For example, psychological test scores are unlikely to measure with exactly equal intervals and it may be argued that they fall along a continuum between interval and ordinal scales. Also, some rating scales frequently used in applied psychological research are not covered by the Stevens scheme since they have a defined zero point but intervals of dubious equality, for example, 0-nevcr, 1-seldom, 2-somelimes, 3-oftcn, 4-always.

Nominal Scales. Traditional MRC analysis was generally restricted to quantitative scales with (more or less) equal intervals. But much information in the behavioral sciences is not quantitative at all, but qualitative or categorical, or, using Stevens' (1951) formulation, measured on "nominal" scales. Whether they are to be considered a form of measurement at all is subject to debate, but they undoubtedly constitute information. Some examples are: ethnic group, place of birth, religion, experimental group, marital status, psychiatric diagnosis, type of family structure, choice of political candidate, sex. Each of these represents a set of mutually exclusive categories which accounts for all the cases. The categories of true nominal scales represent distinguishable qualities, without natural order or other quantitative properties. Thus, nominal scales are sets of groups which differ on some qualitative attribute. When research factors expressed as nominal scales are to be related to a dependent variable Y, past practice has been to put them through the mill of AV, whose grist is Y values organized into groups. But the qualitative information which constitutes nominal scales may be expressed quantitatively, and used as independent variables in MRC. (Chapter 5 is devoted to this topic, and answers such questions as, "How do you score religion?")

The above does not exhaust the forms in which information is expressed, since mixtures of scale types and other irregularities occur in practice. For example, interviews and questionnaires often require for some items the provision of


categories for "does not apply" and/or "no response"; some questions are asked only if a prior question has had some specified response. As uninformative as such categories may seem at first glance, they nevertheless contain information and are capable of expression in research factors (see Chapter 7).

The above has been presented as evidence for that aspect of the complexity of the behavioral sciences which resides in the great variety of forms in which their information comes. Beginning with Chapter 5, we shall show how information in any of these forms may be used as research factors in new-look MRC. The traditional restriction of MRC to equal interval scales will be shown to be quite unnecessary. The capacity of MRC to use information in almost any form, and to mix forms as necessary, is an important part of its adaptive flexibility. Were it finicky about the type of input information it could use, it could hardly function as a general data-analytic system.
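As a first hint of how a nominal scale can enter a regression (Chapter 5 gives the full treatment), here is a minimal sketch of our own in Python; the category labels are invented, and the 0-1 "dummy" variables shown are only one of the coding methods described there.

```python
# Dummy (0-1) coding: a nominal scale with g categories is carried by g - 1 variables.
religion = ["Protestant", "Catholic", "Jewish", "Other", "Catholic", "Protestant"]

categories = sorted(set(religion))
reference = categories[0]   # one category serves as the reference group

for case in religion:
    dummies = {c: int(case == c) for c in categories if c != reference}
    print(f"{case:10s} -> {dummies}")
```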

Shape of Relationship

When we come to scrutinize a given relationship expressed by Y = f(C), it may be well described by a straight line on the usual graph, for example, if Y and C are psychological tests of abilities. Or, adequate description may require that the line be curved, for example, if C is age, or number of children, such may be the case. Or, the shape may not be definable, as when C is a nominal scale, for example, diagnosis, or college major. When there are multiple research factors being studied simultaneously, each may relate to Y (and each other) in any of these ways. Thus, when we write Y = f(C, D, E, . . .), f ("as a function of") potentially covers very complex functions, indeed. Yet such complex functions are readily brought under the sway of MRC. How so?

Most readers will know that MRC is often (and properly) referred to as linear MRC and may well be under the impression that correlation and regression are restricted to the study of straight-line relationships. This mistaken impression is abetted by the common usage of "linear" to mean "rectilinear," and "nonlinear" to mean "curvilinear." We are thus confounded by what is virtually a pun. What is literally meant by "linear" is any relationship of the form

(1.1.1)     Y = a + bU + cV + dW + eX + ... ,

where the lower-case letters are constants (either positive or negative) and the capital letters are variables. Y is said to be "linear in the variables U, V, etc." because it is confected by taking certain constant amounts (b, c, etc.) of each variable, and the constant a, and simply adding them together. Were we to proceed in any other way, the resulting function would not be linear in the variables, by definition. But in the fixed-model framework in which we operate, there is no constraint on the nature of the variables. That being the case, they may be chosen so as to define relationships of any shape, rectilinear or curvilinear, or of no shape at all (as for unordered nominal scales), and all the complex combinations of these which multiple factors can produce.
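A tiny sketch of our own (in Python, with arbitrary constants) makes the point: an equation that is linear in its variables traces a curved relationship to X once one of those variables is simply X squared.

```python
# Y = a + b*U + c*V is linear in the variables U and V,
# yet with U = X and V = X**2 the plot of Y against X is a curve.
a, b, c = 2.0, 1.5, -0.25   # arbitrary constants for illustration

for X in range(9):
    U, V = X, X ** 2
    print(f"X = {X}:  Y = {a + b * U + c * V:6.2f}")
```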


Multiple regression equations are, indeed, linear; they are exactly of the form of Eq. (1.1.1). Yet they can be used to describe such complex relationships as the length of psychiatric hospital stay as a function of symptom ratings on admission, diagnosis, age, sex, and average length of prior hospitalizations (if any). This relationship is patently not rectilinear, yet readily described by a linear multiple regression equation. To be sure, not all or even most relationships studied in the behavioral sciences are of this order of complexity, but the obvious point is that the capacity of MRC to take any degree or type of shape-complexity in its stride is yet another of the important features which make it truly a general data-analytic system.

General and Conditional Relationships

Some relationships between Y and some factor C remain the same in regard to degree and form over variation in other factors D, E, F. We will call such relationships general or unconditional. The definition of a general relationship holds quite apart from how or whether these other factors relate to Y or to C. This might be the case, for example, if Y is a measure of perceptual acuity and C is age. Whatever the form and degree of relationship, if it remains the same under varying conditions of educational level (D), ethnic group (E), and sex (F), then the relationship can be said to be general (insofar as these other factors are concerned). Note that this generality obtains whatever the relationship between acuity and D, E, and F, between age (C) and D, E, F, or among D, E, and F. The Y-C relationship can thus be considered unconditional with regard to, or independent of, D, E, and F.

Now consider the same research factors but with Y as a measure of attitude towards racial integration. The form and/or degree of relationship of age to Y is now almost certain to vary as a function of one or more of the other factors: it may be stronger or shaped differently at lower educational levels than higher (D), and/or in one ethnic group than another (E), and/or for men compared to women (F). The relationship of Y to C is now said to be conditional on D and/or E and/or F. In AV contexts, such relationships are called interactions; for example, if the C-Y relationship is not constant over different values of D, there is said to be a C x D ("age by educational level") interaction. Greater complexity is possible: the C-Y relationship may be constant over levels of D taken by themselves, and over levels of E taken by themselves, yet may be conditional on the combination of D and E levels. Such a circumstance would define a "second-order" interaction, represented as C x D x E (with, in this case, neither C x D nor C x E present). Interactions of even higher order, and thus even more complex forms of conditionality, are also theoretically possible.

To forestall a frequent source of confusion, we emphasize the fact that the existence of a C x D interaction is an issue quite separate from the relationship of C to D, or D to Y. However age may relate to education, or education to attitude, the existence of C x D means that the relationship of age to attitude is conditioned by (depends on, varies with) education. Since such interactions are


symmetrical, this would also mean that the relationship of education to attitude is conditioned by age. One facet of the complexity of the behavioral sciences is the frequency with which conditional relationships are encountered. Relationships among variables often change with changes in experimental conditions (treatments, instructions, experimental assistants, etc.), age, sex, social class, ethnicity, diagnosis, re­ ligion, geographic area, etc. Causal interpretation of such conditional relation­ ships is even more difficult than it is for general relationships, but it is patently important that conditionality be detected when it exists. Conditional relationships may be studied directly, but crudely, by partitioning the data into subgroups on the conditioning variable, determining the relationship in each subgroup, and comparing them. However, problems of small sample size and difficulties in the statistical comparison of measures of relationship from subgroup to subgroup are likely to arise. Factorial design A V provides for assessing conditional relationships (interactions), but is constrained to research factors in nominal form and becomes awkward when the research factors are not orthogonal. The versatility of the M R C system obtains here— conditional rela­ tionships of any order of complexity, involving research factors with information in any form, and either correlated or uncorrelatcd, can be routinely handled without difficulty. (See Chapter 8.) In summary, the generality of the M R C system of data analysis appropriately complements the complexity of the behavioral scicnces, where “ complexity” is intended to convey simultaneously the ideas of multiplicity and correlation among potential causal influences, the variety of forms in which information is couched, and in the shape and conditionality of relationships. Multiple regres­ sion/correlation also provides a full yield of measures of "effect size" with which to quantify various aspects of relationships (proportions of variance, cor­ relation and regression coefficients). Finally, these measures arc subject to statis­ tical hypothesis testing, estimation, and power-analytic procedures. 1.1.3 Causal Analysis and Multiple Regression/Correlation The rapid progress of the natural scicnces and their technologies during the last three centuries has largely been due to the evolution of that remarkable in­ vention, the controlled experiment. Control has been achieved by the care and precision of the manipulation of treatments, by isolation of the experiment from extraneous influences, and most recently and particularly in the life scicnces, by randomization of units to treatments, thus assuring that "a ll other things are equal.” The virtue of the experiment lies in the simplicity of its causal model: with manipulative control of the treatment, randomization assures that the output is a direct causal consequence of the treatment and not of other causcs residing in initial differences between treatment groups. The experimental paradigm has served the behavioral and social sciences well in those areas wherein its demands can be met, for example, in the traditional area of experimental psychology (learning, memory, perception), physiological


psychology, comparative psychology, some aspccts of social psychology, and the technologies related to these fields. Practical considerations have limited its utility in some other fields (e.g., education, clinical psychology, program eval­ uation, psychiatric epidemiology, industrial organization theory), and its ap­ plication is effectively impossible in sociology, economics, political science, and anthropology. Either the putative causes or effects can not be produced by investigators (e.g., schizophrenia, low income, fascism, high gross national product, egalitarian management structures), or randomization is precluded, or both. Where experimentation is not possible, scientists have been forced to develop theories from their passive observation of phenomena. In the human scicnces, the phenomena are highly variable, putative causes are many and obscure, ef­ fects often subtle and their manifestations delayed, and measurement is difficult. Small wonder that progress in these fields has been slow. Hnter the analysis of causal models (path analysts, structural equation sys­ tems). Originating in genctics (Wright, 1921) and econometrics about a half century ago, and proceeding independently, there has developed a coherent scheme for the quantitative analysis and testing of theories based on the observa­ tion of phenomena as they occur naturally. Beginning in the 1960s, these meth­ ods grew in sociology and related fields (Blalock, 1971; Goidberger & Duncan, 1973), and in the 1970s in education and psychology (Kenny, 1979). The basic strategy of the analysis of causal models is first to state a theory in terms of the variables that are involved and, quite explicitly, of what causes what and what does not, usually aided by causal diagrams. The observational data arc then employed to determine whether the causal model is consistent with them, and estimate the strength of the causai parameters. Failure of the model to fit the data results in its falsification, while a good fit allows the model to survive, but not be proven, since other models might provide equal or better fits. Causal model analysis provides a formal calculus of inference which prom­ ises to be as important to the systematically observing scientist as is the paradigm of the controlled experiment to the systematically experimenting scientist. A l­ though still new and in the process of rapid development, it seems clear to us that none xperi mental inference that is not consistent with its fundamental principles is simply invalid. iNow, the major analytic tool of causal models analysis is M RC , and particu­ larly regression analysis. Even the simplest regression equation that states that Y is a linear function of X carries, in its asymmetry, the implication that X causes Y. and not the other way around. As the number of variables causing Y increases, we enter the realm of multiple regression analysis. As the complexity of the causal mode! increases, we develop systems of interlocking regression analysis in which a variable may be a cause in one regression equation and an effect in another. Further complexity (i.e., reciprocal causality) may require that we change our methods of estimating causal parameters, but our basic tool remains the regression equation.


We find the old saw that “ correlation does not mean causation," although well intcntioned, to be grossly misleading. Causation manifests itself in correla­ tion, and its analysis can only procccd through the systematic analysis of correla­ tion and regression. From the very beginning of the presentation of M R C meth­ ods in the next chapter, our exposition is informed by the conccpts of causal analysis. After the basic devices of M R C are presented, an entire chapter is devoted to causal analysis and its exploitation of these devices in practical working methods. W e hope this material serves as an introduction to this most important topic in research methodology in the behavioral and social scicnces.

1.2 ORIENTATION This book was written to serve as a textbook and manual in the application of the M R C system for data analysis by students and practitioners in the diverse areas of inquiry of the behavioral sciences. As its authors, we had to make many decisions about the level, breadth, emphasis, tone, and style of exposition. Its readers may find it useful, at the outset, to have our orientation and the basis for these decisions set forth. 1.2.1 Approach

Nonmathematica / Our presentation of M R C is nonmathematical. O f course, M R C is itself a product of mathematical statistics, based on matrix algebra, the calculus, and probability theory— branches of mathematics familiar only to math majors. There is little question that such a background makes possible a level of insight otherwise difficult to achieve. However, sincc it is only infrequently found in behavioral scientists, it is bootless to proceed on such a basis, however desirable it may be in theory. Nor do wc believe it worthwhile, as is done in some statistical textbooks addressed to this audience, to attempt to provide the neces­ sary mathematical background in condensed form in an introductory chapter or iwo and then proceed as if it were a functioning part of the reader’s intellectual equipment. In our expcricnce, that simply docs not work— it serves more as a sop to the author’s conscience than as an aid to the reader’s comprehension. W e thus abjure mathematical proofs, as well as unnecessary offhand references to mathematical concepts and methods not likely to be understood by the bulk of our audience. In their place, we heavily emphasize detailed and deliberately redundant verbal exposition of concrete examples drawn from the behavioral scienecs. Our experience in teaching and consulting convinces us that our au­ dience is richly endowed in the verbal, logical, intuitive kind of intelligence that makes it possible to understand how the M R C system works, and thus use it effectively. (Dorothy Parker said, “ Flattery will get you anywhere.'*) This kind of understanding is eminently satisfactory (as well as satisfying), sincc it makes


possible the effective use of the system. We note that to drive a car, one does not need to be a physicist, nor an automotive engineer, nor even an auto mechanic, although some of the latter's skills are useful when you are stuck on the highway, and that is the level we aim for.

Flat assertions, however, provide little intellectual nourishment. We seek to make up for the absence of mathematical proofs by providing demonstrations instead. For example, the regression coefficient for a dichotomous or binary (male-female, yes-no) independent variable (scored 0-1) equals the difference between the two groups' Y means. Instead of offering the six or seven lines of algebra that would constitute a mathematical proof, we demonstrate that it holds, using a small set of data. True, this proves nothing, since the result may be accidental, but the curious reader can check it out on his own data (and we urge that such checks be made throughout). Whether it is checked out or not, however, we believe that most of our audience would profit more from the demonstration than the proof. If the absence of proof bothers some Missourians, all we can do is pledge our good faith.
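Readers with a computer at hand can carry out exactly this kind of check. The following sketch (our own invented scores, in Python, not a worked example from the text) fits the single-predictor regression and confirms that the coefficient for the 0-1 variable equals the difference between the two group means on Y.

```python
from statistics import mean

X = [0, 0, 0, 0, 1, 1, 1, 1]   # 0-1 group membership
Y = [4, 6, 5, 5, 8, 9, 7, 8]   # invented scores on the dependent variable

mx, my = mean(X), mean(Y)
b = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / sum((x - mx) ** 2 for x in X)

mean0 = mean(y for x, y in zip(X, Y) if x == 0)
mean1 = mean(y for x, y in zip(X, Y) if x == 1)

print("regression coefficient:", b)                 # 3.0
print("difference of group means:", mean1 - mean0)  # 3.0
```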

A p p lied The first word in this book’s title is “ applied.” The heavy stress on illustra­ tions serves not only the function of clarifying and demonstrating the abstract principles being taught, but also that of exemplifying the kinds o f applications possible, that is, providing working models. W e attend to theory only insofar as sound application makes necessary. The emphasis is on “ how to do it.” This opens us to the contemptuous charge of writing a “ cookbook,” a charge we deny, since we do not neglect the whys and wherefores. If the charge is neverthe­ less pressed, we can only add the observation that in the kitchen, cookbooks are likely to be found more useful than textbooks in organic chemistry.

D ata-Analytic The mathematical statistician proceeds from exactly specified premises (inde­ pendent random sampling, normality of distribution, homogeneity of variance), and by the exercise of his ingenuity and appropriate mathematical theory, arrives at exact and necessary consequences (F distribution, statistical power functions). He is, o f course, fully aware o f the fact that no set of real data w ill exactly conform to the formal premises from which he starts, but this is not properly his responsibility. As all mathematicians, he works with abstractions to produce formal models whose “ truth”

lies in their self-consistency. Borrowing their

language, we might say that inequalities are symmetrical: just as hehavioral scientists are not mathematicians, mathematicians are not behavioral scientists. The behavioral scientist relics very heavily on the fruits o f the labors of theoretical statisticians. They provide guides for teasing out meaning from data, limits on inference, discipline in speculation. Unfortunately, in the textbooks addressed to behavioral scientists, statistical methods have often been presented more as harsh straightjackets or Procrustean beds than as benign reference frame­


works. T yp ically, a method is presented with some emphasis on its formal assumptions. Readers are advised that the failure of a set o f data to meet these assumptions renders the method invalid. A ll too often, the discussion ends at this point. Presumably, the offending data are to be thrown away. N ow this is, of course, a perfectly ridiculous idea from the point o f view of working scientists. Their task is to contrive situations that yield information about substantive scientific issues— they must and will analyze their data. In doing so, they w ill bring to bear, in addition to the tools o f statistical analysis, their knowledge of theory, past experience with similar data, hunches, and good sense, both common and uncommon. They would rather risk analyzing their data incorrectly than not at ail. For them, data analysis is not an end in itself, but the next-to-last step in a sequence which cuiminatcs in providing information about the phenomena. This is by no means to say that they need not be painstaking in their efforts to generate and perform analyses o f data from which unambiguous conclusions may be drawn. But they must translate these efforts into substantive information. Most happily, the distinction between “ data analysis” and “ statistical analy­ sis” has been made and given both rationale and respectability by one of the w orld’s foremost mathematical statisticians, John Tukey. In his seminal The

Future o f D ata Analysis (1962). Tukey describes data analysis as the special province of scientists with substantial interest in methodology. Data analysts employ statistical analysis as the most important tool in their craft, but they employ it together with other tools, and in a spirit quite different from that which has come to be associated with it from its origins in mathematical statistics. Data analysis accepts “ inadequate” data, and is thus prepared to settle for “ indica­ tions” rather than “ conclusions." It risks a greater frequency of errors in the interest o f a greater frequency o f occasions when the right answer is " s u g ­ g ested ." It compensates for cutting some statistical corners by using scientific as well as mathematical judgment, and by relying upon self-consistency and repeti­ tion o f results. Data analysis operates like a detective searching for clues rather than like a bookkeeper seeking to prove out a balancc. In describing data analy­ sis, Tukey has provided insight and rationale into the way good scientists have always related to data. The spirit o f this hook is strongly data-analytic, in exactly the above sense. W e recognizc the limits on inference placed by the failure of real data to meet some of the formal assumptions which underly fixed-model M R C , but are disposed to treat the limits as broad rather than narrow. W e justify this by mustering what­ ever technical evidence there is in the statistical literature (for example, on the “ robustness” o f statistical tests), and by drawing upon our own and others’ practical experience, even upon our intuition, all in the interest of getting on with the task of making data yield their meaning. I f we risk error, we are more than compensated by having a system o f data analysis that is general, sensitive, and fully capable o f reflecting the complexity of the behavioral sciences and thus of meeting the needs o f behavioral scientists.


1.2.2 Com putation, the Com puter, and Numerical Results

Computation Like all mathematical procedures involving simultaneous attention to multiple variables, M R C makes large computational demands. As the size of the problem increases, the amount of computation required increases enormously; for exam­ ple, the computational time required on a desk calculator for a problem with k 10 independent variables and n — 400 cases is measured in days! With such prodigious amounts of hand calculation, the probability of coming through the process without serious blunders (misreading values, inversion of digits, incor­ rect substitution, etc.) cannot be far from zero. Rigorous checking procedures can assure accuracy, but at the cost of increasing computational man-days. The only solution is not to do the calculation on a desk calculator. An important reason for the rapid increase during the past two decades in the use of multivariate4 statistical procedures generally, and for the emergence of M R C as a general data-analytic system in particular, is the computer revolution. During this period, computers have become faster to a degree that strains com­ prehension, more “ user oriented,” and, most important of all, more widely available. Computer facilities are increasingly looked upon as being as necessary in academic and scientific settings as are library facilities. And progressive simplification in their utilization ( “ user orientation” ) makes the necessary know-how fairly easy to acquire. Fundamentally, then, we assume that M R C computation, in general, w ill be accomplished by computer. F'arly in the book, in our exposition of bivariate correlation and regression and M R C with two independent variables, we give the nccessary details with worked examples for calculation by desk or pocket calculators (or, in principle, pencil and yellow pad). This is done because the intimate association with the arithme­ tic details makes plain to the reader the nature o f the proccss: exactly what is being done, with what purpose, and to what result. With two or three variables, where the computation is easy, not only can one see the fundamentals, but there is laid down a basis for generalization to many variables, where the computa­ tional demands are great. With k independent and one dependent variable, M R C computation requires, to begirt with, k + 1 means and standard deviations, and the matrix of k(k + I)/2 correlation coefficients between all pairs of k + 1 variables. It is at this point that the serious computation begins, that is, the solution o f k simultaneous equations, most readily accomplished by a laborious matrix-arithmetic procedure called 'Usage of ihc term multivariate vanes. Some authors restrict it to procedures where multiple dependent variables arc used, by which definition MRC would not be included. However, in­ creasingly and particularly among applied statisticians and behavioral .scientists, the term is used to cover all statistical applications wherein "multiple variatcs are considered in combination" (Cooley & Lohnes. 1971. p 3), cither as dependcnl or independent variables, or both. or. as in factor analysis, neither; .sec Tatsuoko (1971) and Van dc Gccr (1971). Sec Chapter 12 and Appendix 4 for a consideration of M RC in relationship to other multivariate methods.


“ inversion.” Appendix 1 describes the mathematical basis o f M R C including the role o f the centrally important operation o f matrix inversion, and the content and meaning of the elements of the inverse matrix. In Appendix 2, wc give the actual computational (arithmetic) operations o f matrix inversion and multiplica­ tion for M R C , suitable for use with a desk or pocket calculator. Although, in principle, the computational scheme given in Appendix 2 may be used with any number o f variables, it becomes quite time consuming, and rapidly more onerous, as k increases beyond five or so.5 The reader without access to comput­ ers has our sympathy, but he can manage the computation in small problems by following the procedure in Appendix 2, and may even be rewarded by insights which may accrue from this more intimate contact with the analysis. But "th e human use of human beings” docs not include days spent at a desk calculator. A s we have noted, we primarily reiy on computers for M R C com­ putation. Most of our readers w ill have access to a computer, and w ill either have, or be able quickly to obtain, the modest know-how needed to use the available "can n ed ” programs for M R C . Appendix 3 is devoted to this topic, and includes a description o f the characteristics o f the most popular and widely available programs, and of the considerations which should enter into one’s choice among them. It should be consulted in conjunction with a trip to the computer laboratory to investigate what is available. W c expect that our readers w ill find the material on "h e a v y ” computation useful, but we have deliberately placed it outside the body of the text to keep it from distracting attention from our centra! emphasis, which is on understanding how the M R C system works, so that it may be effectively used in the exploitation of research data. Our attitude toward computing as such explains the absence of chapter-end problems for the reader to work. Some o f the purposes o f such exercises can be achieved by carefully following through the details o f the many worked illustra­ tive examples in the body o f the text. But the highest order of understanding is to be attained when readers apply the methods o f each chapter to data o f their own, or data with which they are otherwise familiar. There is no more powerful synergism for insight than the application o f unfamiliar methods to familiar data.
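For readers curious about what the inversion step accomplishes, the following is a compact sketch of our own in Python (NumPy assumed available; the correlations are invented), not the Appendix 2 routine: the matrix of correlations among the independent variables is inverted and multiplied into their correlations with Y, yielding the standardized regression weights and, from them, the squared multiple correlation.

```python
import numpy as np

# Invented correlations among three independent variables ...
Rxx = np.array([[1.00, 0.40, 0.30],
                [0.40, 1.00, 0.20],
                [0.30, 0.20, 1.00]])
# ... and of each with the dependent variable Y.
rxy = np.array([0.50, 0.45, 0.30])

beta = np.linalg.inv(Rxx) @ rxy   # standardized regression coefficients
R2 = float(rxy @ beta)            # squared multiple correlation

print("standardized weights:", np.round(beta, 3))
print("R squared:", round(R2, 4))
```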

N um erical R esults: R eporting a n d R ounding W ith minor exceptions, the computation of the illustrative problems which fill this book was all accomplished by computer, using various programs and differ­ ent computers. The numerical results carried in the computer are accurate to at least six significant figures and are printed out to at least four (or four decimal places). W e see little point to presenting numerical results to as many plaecs as Ml is difficult to set a value hen; —who is in say what another will find computationally onerous? Some people find balancing their checkbook a nightmare: others actually enjoy large quantities of arithmetic, particularly when it involves their own data. Five seems a reasonable compromise But it should he kept in mind that computation of tune increases roughly (no pun intended) with (Darlington & Boycc, I9&2)!

20

1.

INTRODUCTION

the computer may provide, since the resulting “ accuracy" holds only for the sample data analyzed, and, given the usual level of sampling error, is quite meaningless vis-a-vis the values of the population parameters. W e mean nothing more complicated than the proposition, for example, that when the computer mindlessly tells us that, in a sample of the usual size, the product moment correlation between X and Y is .34617952, a guaranteed accurate result, at least the last five digits could be replaced by random numbers with no loss. In this book, we generally follow the practice of reporting computed correlation and regression coefficients of all kinds and significance test results rounded to three places (or significant figures), and squared correlations (proportions of variance) rounded to four. (Occasional departures from this practice are for specific reasons of expository clarity or emphasis.) Thus, the above r would be reported as .346. But the computer treats it in the calculations in which it is involved as .34617952 . . . , and its square as .34617952 . . .2. Thus, when we have occasion to report the square of this r, we do not report .3462, which equals . 1197 when rounded to four places, but .34617952 . . ,2 rounded to four places, which is . 1198. When the reader tries to follow our computations (which he should), he will run across such apparent errors as .3462 = . 1198 and others which are consequent on his use in computation of the reported three-digit rounded values. These are, of course, not errors at all, but inevitable rounding discrepan­ cies. Cheeks which agree within a few points in the fourth decimal place may thus be taken as correct.
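The rounding point is easily verified; a three-line check of our own in Python:

```python
r = 0.34617952                 # the value the computer carries internally
print(round(r, 3))             # 0.346   -- the value as reported
print(round(r ** 2, 4))        # 0.1198  -- square of the full-precision value
print(round(0.346 ** 2, 4))    # 0.1197  -- square of the already-rounded value
```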

Significance Test Results and the Appendix Tables

We employ classical null hypothesis testing, in which the probability of the sample result, P, is compared to a prespecified significance criterion α. If P < (is less than) α, the null hypothesis (usually that the analogous population value is zero) is rejected, and the sample result is deemed statistically "significant" at the α level. In the tests we predominantly use (F and t), the actual value of P is not determined.⁶ Instead, the F or t value for the sample result is computed by the appropriate formula, and the result is compared with the value of F or t at the α criterion value found from a table in the Appendix. Then, if the sample value exceeds the criterion value, we conclude that P < α, and the null hypothesis is rejected.

We make provision in the Appendix Tables of F and t, and in those used for statistical power analysis, for the significance criteria α = .01 and α = .05. We see no serious need in routine work for other α values. The α = .05 criterion is so widely used as a standard in the behavioral sciences that it has come to be understood to govern when a result is said to be statistically significant in the absence of a specified α value. The more stringent α = .01 criterion is used by some investigators routinely as a matter of taste or of tradition in their research area, by others selectively when they believe the higher standard is required for substantive or structural reasons. We are inclined to recommend its use in research involving many variables and hence many hypothesis tests as a control on the incidence of spuriously significant results. The choice of α also depends importantly on considerations of statistical power (the probability of rejecting the null hypothesis), which is discussed in several places and in most detail in Section 4.5.

In reporting the results of significance tests for the many worked examples, we follow the general practice of attaching double asterisks to an F or t value to signify that P < .01, and a single asterisk to signify that P < .05 (but not .01). No asterisk means that the F or t is not significant, that is, P exceeds .05.

The statistical tables in the Appendix were largely abridged from Owen (1962) and from Cohen (1977), with some values computed by us. The entry values were selected so as to be optimally useful over a wide range of MRC applications. For example, we provide for many values of numerator degrees of freedom (numbers of independent variables) in the F and L tables, and similarly for denominator (error) degrees of freedom in the F and t tables and for n in the power tables for r. On the other hand, we do not cover very low values for n, since they are almost never used. The coverage is sufficiently dense to preclude the need for interpolation in most problems; where needed, linear interpolation is sufficiently accurate for almost all purposes. On very rare occasions more extensive tables may be required, for which Owen (1962) is recommended.

⁶That is, not by us in this book. Most computer programs compute and print out the actual P for each F or t given (see Appendix 3).

1.2.3 The Spectrum of Behavioral Science

When we address behavioral scientists, we are faced with an exceedingly heterogeneous audience, indeed. We note in passing that our intended audience ranges from student to experienced investigator, and from possession of modest to fairly advanced knowledge of statistical methods. With this in mind, we assume a minimum background for the basic exposition of the MRC system, but at some later points and infrequently, we must make some assumptions about background which may not hold for some of our readers, in order that we may usefully address some others. Even then, we try hard to keep everyone on board. But it is with regard to substantive interests and investigative methods and materials that behavioral scientists are of truly mind-boggling diversity. The rubric "behavioral science" has no exactly delimited reference, but we use it broadly, so that it covers the "social," "human," and even "life" sciences, everything from the physiology of behavior to cultural anthropology, in both their "pure" and "applied" aspects. Were it not for the fact that the methodology of science is inherently more general than its substance, a book of this kind would not be possible. However, our target audience is made up, not of methodologists, but of people whose primary interests lie in a bewildering variety of fields.

We have sought to accommodate to this diversity, even to capitalize upon it. Our illustrative examples are drawn from different areas, assuring the comfort of


familiarity for most of our readers at least some of the time. They have been composed with certain ideas in mind: their content is at a level which makes them intellectually accessible to nonspecialists, and they arc all fictitious, so they can accomplish their illustrative purposes efficiently and without the distractions and demands for specialized knowledge which would characterize real data. We try to use the discussion of the examples in a way which may promote some crossfertilization between fields of inquiry— when discussed nontechnical Iy, some problems in a given field turn out to be freshly illuminated by concepts and approaches from other fields. W e may even contribute to breaking down some of the methodological stereotypy to be found in some areas, where data are ana­ lyzed traditionally, rather than optimally.

1.3 PLAN 1.3.1 Content The first part of this book (Chapters I through 4) develops the basic ideas and methods of multiple correlation and regression. Chapter 2 treats simple linear correlation for two variables, X and }', and the related linear regression model, with Y as a dependent variable and X as a single independent variable, in both their descriptive and inferential (statistical hypothesis testing and power analysis) aspects. In the first part of Chapter 3, the M R C model is extended to two independent variables, which introduces the important ideas of multiple and partial regression and correlation, and the distinction between simultaneous and hierarchical M RC . The relevance to causal models is shown. In the latter pan of Chapter 3, the conceptually straightforward generalization from two to k inde­ pendent variables is made. Up to this point, for the most part, the treatment is fairly conventional. Chapter 4, however, introduces a further generalization of M R C , wherein the indepen­ dent variables are organized into h sen, each made up of one or more variables. The utility of this extension arises from the fact that the research factors (pre­ sumed causes) being studied are expressed, in general, as sets, to which the ideas of partial relationship and of hierarchical versus simultaneous analyses are ap­ plied. Simple methods for hypothesis testing and statistical power analysis arc included. With this chapter, the basic structure of M R C as a general data-analytic system is complete. The stage having been set. Part II proceeds to detail in a series of chapters how information in any form may be represented as sets of independent variables, and how the resultant M R C yield for sets and their constituent variables is in­ terpreted. In Chapter 5, various methods arc described for representing nominal scales (for example, experimental treatment, diagnosis, religion), and the oppor­ tunity is grasped to show that the M R C results include those produced by A V . and more. Chapter 6 performs the same task for quantitative (ratio, interval, ordinal) scales, showing how various methods of representation may be used to


determine the presence and form of curvilinear relationship with Y. Chapter 7 is concerned with the representation of missing data, a ubiquitous problem in the behavioral sciences, and shows how this property of a research factor may be used as positive information. Chapter 8 presents and generalizes the idea of interactions as conditional relationships, and shows how interactions among research factors of any type may be simply represented as sets, incorporated in analyses, and interpreted. In Part III { “ Applications” ), Chapter 9 presents an introductory treatment of causal models. Utilizing causal diagrams and the basic M R C methods detailed in the earlier chapters, it provides working methods for the analysis of recursive causal models, and points the direction in which more complex models involving reciprocal causality may be analyzed. Chapter 10 provides a culmination of the ideas about multiple sets (Chapter 4), implemented by the methods of representing research factors (Chapters 5 through 7) and their interactions (Chapter 8). It shows how conventional A C V may be accomplished by M R C , and then greatly generalizes A C V , first to accommodate multiple, nonlinear, and nominal covariate sets, and then to extend to quantita­ tive research factors. The nature of partialling is closely scrutinized, and the prohlem of fallible (unreliable) covariatcs is addressed, as is the use of the system in the study of change. Chapter 11 extends M R C analysis to repeated measurement and matchedsubject research designs. In Chapter 12, canonical correlation analysis and other multivariate methods are surveyed and related to M R C analysis, from which some novel analytic methods emerge. In the Appendices, we provide the mathematical background for M R C (Ap­ pendix 1), the hand computation for k variables together with the interpretation of the results of matrix inversion (Appendix 2), and a discussion of the use of available computer programs to accomplish M R C analyses (Appendix 3). Ap­ pendix 4 presents a new data-analytic method called set correlation. This is a multivariate method which generalizes M R C to include sets (or partialled sets) of dependent variables and in so doing, generalizes multivariate methods and yields novel data-analytic forms. Finally, the necessary statistical tables are provided. For a more detailed synopsis of the book's contents, the reader is referred to the summaries at the ends of the chapters. 1.3.2 Structure: Numbering of Sections, Tables, and Equations Each chapter is divided into major sections, identified by the chapter and scction numbers, for example. Section 5.3 ( “ Dummy-Variable Coding” ) is the third major section of Chapter 5. The next lower order of division within each major section is further suffixed by its number, for example, Section 5.3.4 ( “ DummyVariable Multiple Regression/Correlation and Analysis of Variance” ) is the fourth subsection of Section 5.3. Further subdivisions arc not numbered, but titled with an italicized heading.


Tables, figures, and equations within the body of the text are numbered con­ secutively within major sections. Thus, for example, Table 5.3.4 is the fourth tabic in Section 5.3, and Eq. (5.3.4) is the fourth equation in Section 5.3. (Wc follow the usual convention of giving equation numbers in parenthesis.) A simi­ lar plan is followed in the four Appendices. The reference statistical tables make up a separate appendix and are designated by letters as Appendix Tables A through F.

1.4 SUMMARY This introductory chapter begins with an overview of M R C as a data-anaiytic system, emphasizing its generality and superordinate relationship to the analysis of variance/covariance. Multiple regression/correlation is shown to be peculiarly appropriate for the behavioral sciences in its capacity to accommodate the vari­ ous types of complexity which characterize them: the multiplicity and correlation among causal influences, the varieties of form of information and shape of relationship, and the frequent incidence of conditional (interactive) relationships. The special relevance of M R C to the formal analysis of causal models is de­ scribed. (Section 1.1) The book’s exposition of M R C is nonmathematical, and stresses informed application to scientific and technological problems in the behavioral sciences. Its orientation is “ data-analytic” rather than statisticai-analytic, an important distinction that is discussed. Concrete illustrative examples arc heavily relied upon. The means of coping with the computational demands of M R C by desk calculator and computer are briefly described and the details largely relegated to appendices so as not to distract the reader’s attention from the conceptual issues. The ground rules for reporting numerical results (including a warning about rounding discrepancies) and those of significance tests arc given, and the statisti­ cal tables in the appendix are described. Finally, we acknowledge the hetero­ geneity of background and substantive interests of our intended audience, and discuss how wc try to accommodate to it and even exploit it to pedagogical advantage. (Section 1.2) The chapter ends with a brief outline of the book, and the scheme by which sections, tables, and equations arc numbered. (Section 1.3)

2 Bivariate Correlation and Regression

One of the most general meanings of the concept of a relationship between pairs of variables is that knowledge with regard to one of the variables carries information about the other variable. Thus, information about the height of a child in elementary school would have implications for the probable age of the child, and information about the occupation of an adult would lead to more accurate guesses about his income level than could be made in the absence of that information.

2.1 TABULAR AND GRAPHIC REPRESENTATIONS OF RELATIONSHIPS

Whenever data has been gathered on two quantitative variables for a set of units (for example, individuals), the relationship between the variables may be displayed graphically by means of a scatter plot. For example, suppose we have scores on a vocabulary test and a digit-symbol substitution task for 15 children (see Table 2.1.1). If these data are plotted by representing each child as a point on a graph with vocabulary scores on the horizontal axis and the number of digit symbols on the vertical axis, we would obtain the scatter plot seen in Fig. 2.1.1. The circled dot, for example, represents Child 1, who obtained a score of 5 on the vocabulary test and completed 12 digit-symbol substitutions.

When we inspect this plot, it becomes apparent that the children with higher vocabulary scores tended to complete more digit symbols (d-s) and those low on vocabulary (v) scores were usually low on d-s as well. This can be seen by looking at the average of the d-s scores corresponding to each v score, Ȳv. The child receiving the lowest v score, 5, received a d-s score of 12; the children with the next lowest v score, 6, obtained an average d-s score of 14.67, and so on to the highest v scorers, who obtained an average of 19.5 on the d-s test. A parallel tendency for vocabulary scores to increase is observed for increases in d-s scores.


TABLE 2.1.1
Illustrative Set of Data on Vocabulary and Digit-Symbol Tests

  Child (no.)   Vocabulary   Digit symbol
       1             5            12
       2             8            15
       3             7            14
       4             9            18
       5            10            19
       6             8            18
       7             6            14
       8             6            17
       9            10            20
      10             9            17
      11             7            15
      12             7            16
      13             9            16
      14             6            13
      15             8            16

  Mean d-s score (Ȳv) at each vocabulary score:
  v = 5: 12.00;  v = 6: 14.67;  v = 7: 15.00;  v = 8: 16.33;  v = 9: 17.00;  v = 10: 19.50

FIGURE 2.1.1 A strong, positive linear relationship. [Scatter plot of digit-symbol score (d-s = Y, vertical axis, 12 to 20) against vocabulary score (v = X, horizontal axis, 5 to 10).]
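The column of conditional means Ȳv can be reproduced directly from the fifteen score pairs of Table 2.1.1; a short sketch of our own in Python:

```python
from collections import defaultdict
from statistics import mean

# Vocabulary (v) and digit-symbol (d-s) scores for the 15 children of Table 2.1.1.
v   = [5, 8, 7, 9, 10, 8, 6, 6, 10, 9, 7, 7, 9, 6, 8]
d_s = [12, 15, 14, 18, 19, 18, 14, 17, 20, 17, 15, 16, 16, 13, 16]

by_v = defaultdict(list)
for vocab, ds in zip(v, d_s):
    by_v[vocab].append(ds)

for vocab in sorted(by_v):
    print(f"v = {vocab:2d}: mean d-s = {mean(by_v[vocab]):.2f}")
# v = 5: 12.00, v = 6: 14.67, v = 7: 15.00, v = 8: 16.33, v = 9: 17.00, v = 10: 19.50
```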


The form of this relationship is said to be positive, because high values on one variable tend to go with high values on the other variable and low with low values. It may also be called linear because the tendency for a unit increase in one variable to be accompanied by a constant increase in the other variable is constant throughout the scales. That is, if we were to draw the straight line which best fits the average of the d-s values at each v score (from the lower left-hand corner to the upper right-hand corner) we would be describing the trend or shape of the relationship quite well.

Figure 2.1.2 displays a similar scatter plot for age and the number of seconds needed to complete the digit-symbol task. In this case, low scores on age tend to go with high test time in seconds, and low test times are more common in older children. In this case the relationship may be said to be negative and linear. It should also be clear at this point that whether a relationship between two variables is positive or negative is a direct consequence of the direction in which the two variables have been scored. If, for example, the vocabulary scores from the first example were taken from a 12-item test, and instead of scoring the number correct a count was made of the number wrong, the relationship with d-s scores would be negative. Because such scoring decisions in many cases may be essentially arbitrary, it should be kept in mind that any positive relationship becomes negative when either (but not both) of the variables is reversed, and vice versa. Thus, for example, a negative relationship between age of oldest child and

FIGURE 2.1.2 A negative linear relationship. [Scatter plot of digit-symbol test time in seconds against age.]


FIGURE 2.1.3 A curvilinear relationship. [Scatter plot of d-s score (12 to 20, vertical axis) against motivation (horizontal axis).]

FIGURE 2.1.4 A negative, curved line relationship. [Scatter plot of number of errors against age.]


income for a group of 30-year-old mothers implies a positive relationship between age of first becoming a mother and income.¹

Figure 2.1.3 gives the plot of a measure of motivational level and score on a difficult d-s task. It is apparent that the way motivation is associated with performance score depends on whether the motivational level is at the lower end of its scale or near the upper end. Thus, the relationship between these variables is curvilinear.

Finally, Fig. 2.1.4 presents a scatter plot for age and number of substitution errors. This plot demonstrates a general tendency for higher scores on age to go with fewer errors, indicating that there is, in part, a negative linear relationship. However, it also shows that the decrease in error rate that goes with a unit increase in age is greater at the lower end of the age scale than it is at the upper end, a finding which indicates that although a straight line provides some kind of fit, clearly it is not optimal. Thus, scatter plots allow visual inspection of the form of the relationship between two variables. These relationships may be linear (negative or positive) or curvilinear; or they may be well described by a straight line, approximated by a straight line, or may require lines with one or more curves to adequately describe them. Because linear relationships are very common in all sorts of data, we shall concentrate on these in the current discussion, presenting methods of analyzing nonlinear relationships in Chapter 6.

Now suppose that Fig. 2.1.1 is compared with Fig. 2.1.5. In both cases the relationship between the variables is linear and positive; however, it would appear that vocabulary provides better information with regard to d-s completion than does chronological age. That is, the degree of the relationship with performance seems to be greater for v than for age because one could make more accurate estimates of d-s scores using information about v than using age. In order to compare these two relationships to determine which is greater, we need an index of the degree or strength of the relationship between two variables that will be comparable from one pair of variables to another. Looking at the relationship between v and d-s scores, other questions come to mind: Should this be considered a strong or weak association? On the whole, how great an increase in digit-symbol score is found for a given increase in vocabulary score in this group? If d-s is estimated from v in such a way as to minimize the errors, how much error will, nevertheless, be made? If this is a random sample of subjects from a larger population, how much confidence can we have that v and d-s are linearly related in the entire population? These and other questions are answered by correlation and regression methods and their associated tests of significance.

In the use and interpretation of these methods the two variables are literally treated as interval scales, that is, constant differences between scale points on

¹Here we follow the convention of naming a variable for the upper end of the scale. Thus, a variable called income means that high numbers indicate high income, whereas a variable called poverty would mean that high numbers indicate much poverty and therefore low income.


FIGURE 2.1.5 A weak, positive linear relationship. [Scatter plot of d-s score (12 to 20) against age (5.0 to 7.0).]

each variable are assumed to represent equal "amounts" of the construct being measured. Although for many or even most psychological scales this assumption is not literally true, empirical work (Baker, Hardyck, & Petrinovich, 1966) indicates that small to moderate inequalities in interval size produce little if any distortion in the validity of conclusions based on the analysis. Further discussion of this issue is found in Chapter 6.
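The robustness referred to can be glimpsed in a small sketch of our own (in Python; the rescaling is invented, and the product moment r used here is the index defined in the next section): an order-preserving but unequal-interval rescaling of the vocabulary scores of Table 2.1.1 changes the correlation with the digit-symbol scores only slightly.

```python
from math import sqrt

def r(xs, ys):
    """Pearson product moment correlation between two lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

v   = [5, 8, 7, 9, 10, 8, 6, 6, 10, 9, 7, 7, 9, 6, 8]             # Table 2.1.1
d_s = [12, 15, 14, 18, 19, 18, 14, 17, 20, 17, 15, 16, 16, 13, 16]

print(round(r(v, d_s), 3))                        # equal-interval scoring
print(round(r([x ** 1.5 for x in v], d_s), 3))    # same order, unequal intervals
```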

2.2 THE INDEX OF LINEAR CORRELATION BETWEEN TWO VARIABLES: THE PEARSON PRODUCT MOMENT r

2.2.1 Standard Scores: Making Units Comparable

One of the first problems solved by an index of the degree of association between two variables is that of measurement unit. Because the two variables are typically expressed in different units, we need some means of converting the scores to comparable measurement units. It can be readily perceived that any index that would change with an arbitrary change in measurement unit (from inches to centimeters or age in months to age in weeks, for example) could hardly be useful as a general description of the strength of the relationship between height and age, one which could be compared with other such indices.

To illustrate this problem, suppose information has been gathered on the annual income and the number of major household appliances of four households


TABLE 2.2.1 Data on Income and Major Household Appliances Presented in Original Units, Deviation Units, and z Units X

Y Appliances

Household

Income

I 2

14,000 19,000

3 7

-3,500 * 1,500

3 4

17,000 20,000

4 5

-500 +2,500

Sum

70,000

19

0

X 17,500 I x 2!n - 5.250,000 2,29129 i( ix - V i x 2h i

m sd\. -

I 2 3 4

Y-

Y = v

j.’

>'2

—1 75

12.250,000 3.0625

-‘-2.25 -.75 + .25

2,250,000 5.0625

0

250,000 6.250.000

.5625 .0625

21,000,000 8.75

4.75 - Y 2.1875 = sd], 1.4790 = sdy

x!sdx = zx

y!sdr - zy

rj

1.53 t 65 -.22 +1.09

-118 + 1.52

2.333 .429

-.51 + 17

.048 1.190 4 000

Sum

0

0

m sd*

0 1.00 1.00

0

S(1

X - X = x

4 1.4000 2.3143 .2571 .0286 4.000

1.00 1.00

(Table 2.2.1).² In the effort to measure the degree of relationship between income (X) and the number of appliances (Y), we are embarrassed by the difference in the nature and size of the units in which the two variables are measured. Although Households 1 and 3 are both below the mean on both variables and Households 2 and 4 are above the mean on both (see x and y, scores expressed as deviations from their means, symbolized as X̄ and Ȳ, respectively), we are still at a loss to assess the correspondence between a difference of $3,500 from the mean income and a difference of 1.75 appliances from the mean number of appliances. We may attempt to resolve the difference in units by ranking the households on the two variables (1, 3, 2, 4 and 1, 4, 2, 3, respectively) and noting that there seems to be some correspondence between the two ranks. In so doing we have, however, made the difference between Households 1 and 3 ($3,000) equal to the difference between Households 2 and 4 ($1,000): two ranks in each case.

²In this example as in all examples that follow, the number of cases (n) is kept very small in order to facilitate the reader's following of the computations. In almost any serious research, the n must, of course, be very much larger (Section 2.9).


To make the scores comparable we clearly need some way of taking the different variability of the two original sets of scores into account. Because the standard deviation (sd) is an index of the variability of scores, we may measure the discrepancy of each score from its mean (m) relative to the variability of all the scores by dividing by the sd:

(2.2.1)    z_X = (X − X̄) / √(Σx²/n),

where Σx² means "the sum of the squared deviations from the mean."³ The scores thus created are in standard deviation units and are called standard or z scores:

(2.2.2)    z_X = (X − X̄)/sd_X = x/sd_X.

In Table 2.2.1, the z score for income for Household 1 is −1.53, which indicates that its value ($14,000) falls about 1.5 income standard deviations ($2,291) below the income mean ($17,500). Although income statistics are expressed in dollar units, the z score is a pure number, that is, it is unit-free. Similarly, Household 1 has a z score for number of appliances of −1.18, which indicates that its number of appliances (3) is about 1.2 standard deviations below the mean number of appliances (4.75). Note again that −1.18 is not expressed in number of appliances, but is also a pure number. Instead of having to compare $14,000 and 3 appliances for Household 1, we can now make a meaningful comparison of −1.53 (z_X) and −1.18 (z_Y), and note incidentally the similarity of the two values for Household 1.

It should be noted that the rank of the z scores is the same as that of the original scores, and that scores that were above or below the mean on the original variable retain this characteristic in their z scores. In addition, we note that the difference between the incomes of Households 2 and 3 (X₂ − X₃ = $2,000) is twice as large as, and of opposite direction to, the difference between Households 2 and 4 (X₂ − X₄ = −$1,000). When we look at the z scores for these same households, we find that z_X2 − z_X3 = .65 − (−.22) = .87 is twice as large as, and of opposite direction to, the difference z_X2 − z_X4 = .65 − 1.09 = −.44 (.87/−.44 = −2, within rounding error). Such proportionality of differences or distances between scores

³Note that we distinguish throughout between sd, which is a descriptor of the variability of the sample at hand and uses n as a divisor, and ŝd, which is a (sample-based) estimator of the population variability and uses degrees of freedom as a divisor. The latter will be required for testing hypotheses and finding confidence limits. Also note that the summation sign, Σ, is used to indicate summation over all n cases here and elsewhere, unless otherwise specified.

(2.2.3)    (Xᵢ − Xⱼ) / sd_X = z_Xᵢ − z_Xⱼ

is the essential element in what is meant by retaining the original relationship between the scores. This can be seen more concretely in Fig. 2.2.1, in which we have plotted the pairs of scores. Whether we plot z scores or raw scores, the points in the scatter plot have the same relationship to each other. The z transformation of scores is one example of a linear transformation. A linear transformation is one in which every score is changed by multiplying or dividing by a constant and adding or subtracting a constant. Changes from inches to centimeters, dollars to francs, and Fahrenheit to Celsius degrees are examples of linear transformations. Such transformations will, of course, change the m's

FIGURE 2.2.1  Bivariate distribution of income and major appliance ownership.


and sd's of the variables upon which they are performed. However, because the sd will change by exactly the same factor as the original scores (that is, by the constant by which scores have been multiplied or divided) and because z scores are created by subtracting scores from their mean, all linear transformations of scores will yield the same set of z scores. (If the multiplier is negative, the signs of the z scores will simply be reversed.)

Because the properties of z scores form the foundation necessary for understanding the correlation coefficient, they will be briefly reviewed:

1. The sum of a set of z scores (Σz), and therefore also the mean, equals 0.
2. The variance (sd²) of the set of z scores equals 1, as does the standard deviation (sd).
3. Neither the shape of the distribution of X nor its absolute correlation with any other variable is affected by transforming it to z_X (or any other linear transformation).

2.2.2 The Product Moment r as a Function of Differences between z Scores

We may now define a perfect (positive) relationship between two variables (X and Y) as existing when all z_X, z_Y pairs of scores consist of two exactly equal values. Furthermore, the degree of relationship will be a function of the departure from this "perfect" state, that is, a function of the differences between pairs of z_X and z_Y scores. Because the average difference between paired z_X and z_Y is necessarily zero (because z̄_Y = z̄_X = 0), the relationship may be indexed by finding the average of the squared discrepancies between z scores, Σ(z_X − z_Y)²/n.

For example, suppose that an investigator of academic life obtained the (fictitious) data shown in Table 2.2.2. The subjects were 15 randomly selected members of a given department, and the data include the number of years that had elapsed since the faculty member's Ph.D. was awarded and the number of publications in refereed professional journals. Several things should be noted in this table. Deviation scores (x and y) sum to zero, as do z_X and z_Y; the sd's of z_X and z_Y are both 1, and z̄_X and z̄_Y are both 0 (which are mathematical necessities), and these equalities reflect the equal footing on which we have placed the two variables. We find that the squared differences between z scores sum to 9.50, which when divided by the number of paired observations equals .633. How large is this relationship?

We have stated that if the two variables were perfectly (positively) related, all z score differences would equal zero and necessarily their sum and mean would also be zero. A perfect negative relationship, on the other hand, may be defined as one in which the z scores in each pair are equal in absolute value but opposite in sign. Under the latter circumstances, it is demonstrable that the average of the squared discrepancies always equals 4. It can also be proved that under circumstances in which the pairs of z scores are on the


TABLE 2.2.2
Calculation of z Scores, z Score Differences, and z Score Products on Data Example

Case    X (Years since Ph.D.)   Y (No. of Publ.)     z_X     z_Y   z_X − z_Y   z_X·z_Y
  1               1                    2           −1.19   −1.13        −.06      1.34
  2               2                    4           −1.05    −.73        −.32       .77
  3               5                    5            −.63    −.52        −.11       .33
  4               7                   12            −.36     .89       −1.25      −.32
  5              10                    5             .06    −.52         .58      −.03
  6               4                    9            −.77     .28       −1.05      −.22
  7               3                    3            −.91    −.93         .02       .85
  8               8                    1            −.22   −1.33        1.11       .29
  9               4                    8            −.77     .08        −.85      −.06
 10              16                   12             .88     .89        −.01       .78
 11              15                    9             .74     .28         .46       .21
 12              19                    4            1.30    −.73        2.03      −.95
 13               8                    8            −.22     .08        −.30      −.02
 14              14                   11             .61     .69        −.08       .42
 15              28                   21            2.54    2.70        −.16      6.86

Sum             144                  114               0       0           0     10.25
SSᵃ            2170                 1236                                9.50
m              9.60                 7.60               0       0
sd²           52.51                24.64               1       1
sd             7.25                 4.96               1       1

ᵃSum of squared values, that is, ΣX², ΣY², Σ(z_X − z_Y)², etc.

average equally likely to be consistent with a negative relationship as with a positive relationship, the average squared difference will always equal 2, which is midway between 0 and 4. Under these circumstances, we may say that there is no linear relationship between X and Y.

Although it is clear that this index, ranging from 0 for a perfect positive linear relationship through 2 for no linear relationship to 4 for a perfect negative one, does reflect the relationship between the variables in an intuitively meaningful way, it is useful to transform the scale linearly to make its interpretation even more clear. If we divide the average squared discrepancies by 2 and subtract the result from 1, we have

(2.2.4)    r = 1 − Σ(z_X − z_Y)² / (2n),

which for the data of Table 2.2.2 gives

r = 1 − 9.50 / [2(15)] = 1 − .317 = .683.
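The arithmetic of Eq. (2.2.4) is easy to verify by machine. A minimal Python sketch, assuming the 15-case data of Table 2.2.2 and computing sd with n as the divisor as in this chapter, follows:

# Data from Table 2.2.2: years since Ph.D. (X) and publications (Y) for 15 faculty members
X = [1, 2, 5, 7, 10, 4, 3, 8, 4, 16, 15, 19, 8, 14, 28]
Y = [2, 4, 5, 12, 5, 9, 3, 1, 8, 12, 9, 4, 8, 11, 21]
n = len(X)

def sd(values):
    # descriptive sd, using n (not n - 1) as the divisor
    m = sum(values) / len(values)
    return (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5

mx, my = sum(X) / n, sum(Y) / n
zx = [(v - mx) / sd(X) for v in X]
zy = [(v - my) / sd(Y) for v in Y]

# Eq. (2.2.4): r = 1 - (sum of squared z-score differences) / (2n)
ss_diff = sum((a - b) ** 2 for a, b in zip(zx, zy))   # 9.50
r = 1 - ss_diff / (2 * n)                             # .683
print(round(ss_diff, 2), round(r, 3))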


r is the product moment correlation coefficient, invented by Karl Pearson in 1895. This coefficient is the standard measure of the linear relationship between two variables and has the following properties:

1. It is a pure number and independent of the units of measurement.
2. Its absolute value varies between zero, when the variables have no linear relationship, and 1, when each variable is perfectly predicted by the other. The absolute value thus gives the degree of relationship.
3. Its sign indicates the direction of the relationship. A positive sign indicates a tendency for high values of one variable to occur with high values of the other, and low values to occur with low. A negative sign indicates a tendency for high values of one variable to be associated with low values of the other. Reversing the direction of measurement of one of the variables will produce a coefficient of the same absolute value but of opposite sign. Coefficients of equal value but opposite sign (e.g., +.50 and −.50) thus indicate equally strong linear relationships, but in opposite directions.

2.3 ALTERNATIVE FORMULAS FOR THE PRODUCT MOMENT r

The formula given in Eq. (2.2.4) for the product moment correlation coefficient as a function of squared differences between paired z scores is only one of a number of mathematically equivalent formulas. Some of the other versions provide additional insight into the nature of r, and others facilitate computation. Yet other formulas apply to particular kinds of variables, such as variables for which only two values are possible, or variables that consist of a set of ranks.

2.3.1 r as the Average Product of z Scores

It follows from simple algebraic manipulation of Eq. (2.2.4) that

(2.3.1)    r = Σz_X z_Y / n.

The product moment correlation is therefore seen to be the mean of the products of the paired z scores. In the case of a perfect positive correlation, because z_X = z_Y,

r = Σz_X z_Y / n = Σz² / n = 1.

For the data presented in Table 2.2.2, these products have been computed and r = 10.25/15 = .683, as before.


2.3.2 Raw Score Formulas for r

Because z scores can be readily reconverted to the original units, a formula for the correlation coefficient can be written in raw score terms. There are many mathematically equivalent versions of this formula, of which the following is a convenient one for computation by computer or calculator:

(2.3.2)    r = (nΣXY − ΣXΣY) / √{[nΣX² − (ΣX)²][nΣY² − (ΣY)²]}.

When the numerator and denominator are divided by n², Eq. (2.3.2) becomes an expression for r in terms of the means of each variable, of each squared variable, and of the XY product:

(2.3.3)    r = (m_XY − m_X m_Y) / √[(m_(X²) − m_X²)(m_(Y²) − m_Y²)],

where m_XY is the mean of the XY products, m_(X²) the mean of the squared X scores, and m_X² the square of the mean of X. It is useful for hand computation to recognize that the denominator is the product of the variables' standard deviations, thus

(2.3.4)    r = (m_XY − m_X m_Y) / (sd_X sd_Y).

The numerator of Eq. (2.3.4) can also be rewritten instructively:

(2.3.5)    r = (Σxy/n) / (sd_X sd_Y).

The numerator, the average of the products of the deviation scores, is called the covariance and is an index of the tendency for the two variables to covary or go together, but one that is expressed in the original units in which X and Y are measured (e.g., income in dollars and number of appliances). Thus, we can see that r is an expression of the covariance between standardized variables, because if we replace the deviation scores with standardized scores, Eq. (2.3.5) reduces to Eq. (2.3.1).

It should be noted that r inherently is not a function of the number of observations⁴ and that the n in the various formulas serves only to cancel it out of other terms where it is hidden (for example, sd). By multiplying the numerator and denominator of Eq. (2.3.5) by n, it can be completely cancelled out to produce a formula for r that does not contain any vestige of n:

(2.3.6)    r = Σxy / √(Σx² Σy²).

⁴For n > 2. When n = 2, r must equal +1 or −1.
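As a check on the equivalence of these forms, a short Python sketch (assuming the same 15 cases from Table 2.2.2) computes r by the raw-score formula of Eq. (2.3.2) and by the deviation-score formula of Eq. (2.3.6); both agree with the z-score result of .683:

# Data from Table 2.2.2
X = [1, 2, 5, 7, 10, 4, 3, 8, 4, 16, 15, 19, 8, 14, 28]
Y = [2, 4, 5, 12, 5, 9, 3, 1, 8, 12, 9, 4, 8, 11, 21]
n = len(X)

# Eq. (2.3.2): sums of scores, squared scores, and cross products
sx, sy = sum(X), sum(Y)
sxx = sum(v * v for v in X)
syy = sum(v * v for v in Y)
sxy = sum(a * b for a, b in zip(X, Y))
r_raw = (n * sxy - sx * sy) / ((n * sxx - sx ** 2) * (n * syy - sy ** 2)) ** 0.5

# Eq. (2.3.6): deviation scores, with no n anywhere in the formula
mx, my = sx / n, sy / n
x = [a - mx for a in X]
y = [b - my for b in Y]
r_dev = sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) * sum(b * b for b in y)) ** 0.5

print(round(r_raw, 3), round(r_dev, 3))   # .683 .683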

2.3.3 Point Biserial r

When one of the variables to be correlated is a dichotomy (it can take on only two values), the computation of r simplifies. There are many dichotomous variables in the social sciences, such as yes or no responses, left- or right-handedness, and the presence or absence of any trait or attribute. For example, although the variable "sex of subject" does not seem to be a quantitative variable, it may be


looked on as the presence or absence of the characteristic of being female (or male). As such, we may decide, arbitrarily, to score all females as 1 and all males as 0. Under these circumstances, the sd of the sex variable is determined by the proportion of the total n in each of the two groups: sd = √(pq), where p is the proportion in one group and q = 1 − p, the proportion in the other group. Because r indicates a relationship between two standardized variables, it does not matter whether we choose 0 and 1 as the two values or any other pair of different values, because any pair will yield the same absolute z scores.

For example, Table 2.3.1 presents data on the effects of an interfering stimulus on task performance for a group of seven experimental subjects. As can be seen, the absolute value of the correlation remains the same whether we choose (X_A) 0 and 1 as the values to represent the absence or presence of an interfering stimulus or choose (X_B) 50 and 20 as the values to represent the same dichotomy. The sign of r, however, depends on whether the group with the higher mean on the other (Y) variable, in this case the no-stimulus group, has been assigned the higher or lower of the two values. The reader is invited to try other values and observe the constancy of r. Because the z scores of a dichotomy are a function of the proportion of the total in each of the two groups, the product moment correlation formula simplifies to

(2.3.7)    r_pb = (Ȳ₁ − Ȳ₀)√(pq) / sd_Y,

where Ȳ₁ and Ȳ₀ are the Y means of the two groups of the dichotomy. The simplified formula is called the point biserial r to take note of the fact that it involves one variable (X) whose values are all at one of two points and one continuous variable (Y).

TABLE 2.3.1
An Example of Correlation between a Dichotomous and a Continuous Variable

Subject   Stimulus     Task score
  no.     condition       (Y)       X_A    X_B      z_Y     z_A     z_B   z_Y·z_A   z_Y·z_B
   1        NONE           67         0     50     −.41    −.88    +.88       .36      −.36
   2        NONE           72         0     50     1.63    −.88    +.88     −1.43      1.43
   3        NONE           70         0     50      .81    −.88    +.88      −.71       .71
   4        NONE           69         0     50      .41    −.88    +.88      −.36       .36
   5        STIM           66         1     20     −.81    1.16   −1.16      −.94       .94
   6        STIM           64         1     20    −1.63    1.16   −1.16     −1.89      1.89
   7        STIM           68         1     20        0    1.16   −1.16         0         0

Sum                       476         3    260        0       0       0     −4.97      4.97
m                          68       .43  37.14        0       0       0
sd                       2.45      .495   14.9        1       1       1

Ȳ_NONE = 69.5,  Ȳ_STIM = 66.0;  r_YA = −.707,  r_YB = .707


In the present example,

r_pb = (66.0 − 69.5)√(.428 × .572) / 2.45 = −.707.
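The equivalence of the point biserial formula to the general product moment r is easily checked in a few lines of Python (a sketch assuming the seven cases of Table 2.3.1, with the stimulus condition coded 0 = NONE, 1 = STIM):

# Data from Table 2.3.1: task score Y and stimulus condition coded 0 (NONE) or 1 (STIM)
Y = [67, 72, 70, 69, 66, 64, 68]
X = [0, 0, 0, 0, 1, 1, 1]
n = len(Y)

my = sum(Y) / n
sd_y = (sum((v - my) ** 2 for v in Y) / n) ** 0.5            # 2.45

# Point biserial form, Eq. (2.3.7): difference of group means times sqrt(pq), over sd of Y
y1 = [y for y, x in zip(Y, X) if x == 1]
y0 = [y for y, x in zip(Y, X) if x == 0]
p = len(y1) / n
q = 1 - p
r_pb = (sum(y1) / len(y1) - sum(y0) / len(y0)) * (p * q) ** 0.5 / sd_y

# General product moment r applied to the 0-1 scores gives the same value
mx = sum(X) / n
sd_x = (sum((v - mx) ** 2 for v in X) / n) ** 0.5
r = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / (n * sd_x * sd_y)

print(round(r_pb, 3), round(r, 3))   # -0.707 -0.707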

The point biserial formula for the product moment correlation displays an interesting and useful property. Under the circumstances in which the two groups on the dichotomous variable are of equal size (p = q = .5, so √(pq) = .5), the r then equals half the difference between the means of the z scores for Y, and so 2r equals the difference between the means on the standardized variable.

2.3.4 Phi (φ) Coefficient

When both X and Y are dichotomous, the computation of the product moment correlation is even further simplified. The data may be represented by a fourfold table and the correlation computed directly from the frequencies and marginals. For example, suppose a study investigates the difference in preference of homeowners and nonhomeowners for the two candidates in a local election, and the data are presented in Table 2.3.2.

TABLE 2.3.2
Fourfold Frequencies for Candidate Preference and Homeowning Status

                   Candidate U    Candidate V      Total
Homeowners             A = 19         B = 54    73 = A + B
Nonhomeowners          C = 60         D = 52   112 = C + D
Total              79 = A + C    106 = B + D      185 = n

The formula for r here simplifies to the difference between the products of the diagonals of the fourfold table of frequencies divided by the square root of the product of the four marginal sums:

(2.3.8)    r_φ = (BC − AD) / √[(A + B)(C + D)(A + C)(B + D)]
               = [(54)(60) − (19)(52)] / √[(73)(112)(79)(106)]
               = 2252/8274
               = .272.

Once again it may be noted that this is a computing alternative to the z score formula, and therefore it does not matter what two values are assigned to the dichotomy, because the standard scores, and hence the r, will remain the same, at least in absolute value. It also follows that unless the division of the group is the


same for the two dichotomies (p_Y = p_X or q_X), their z scores cannot have the same values and r cannot equal 1 or −1. A further discussion of this limit is found in Section 2.11.1.

2.3.5 Rank Correlation

Yet another simplification in the product moment correlation formula occurs when the data being correlated consist of two sets of ranks. Because the sd of a complete set of ranks is a function only of the number of objects being ranked, some algebraic manipulation yields

(2.3.9)    r_ranks = 1 − 6Σd² / [n(n² − 1)],

where d is the difference in the ranks of the pair for an object or individual. In Table 2.3.3 a set of 5 ranks is presented with their deviations and differences. Using one of the general formulas, (2.3.6), for r,

r = Σxy / √(Σx² Σy²) = −3 / √[(10)(10)] = −.300.

The rank order formula (2.3.9) with far less computation yields

r = 1 − 6Σd² / [n(n² − 1)] = 1 − 6(26)/[5(24)] = 1 − 156/120 = −.300,

which checks.

We wish to stress the fact that the formulas for point biserial, phi, and rank correlation are simply computational equivalents of the previously given general formulas for r that result from the mathematical simplicity of dichotomous or rank data. They are of use when computation is done by hand or desk calculator. They are of no significance when computers are used, because whatever formula for r the computer uses will work when variables are scored 0-1 (or any other two values) or are ranked. It is obviously not worth the trouble to write special programs to produce these special-case versions of r when a formula such as Eq. (2.3.2) will produce them.
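The point is easily demonstrated by computing the ordinary product moment r directly on 0-1 scores and on ranks; a Python sketch, assuming the frequencies of Table 2.3.2 and the ranks of Table 2.3.3, follows:

def pearson_r(X, Y):
    # general product moment r, Eq. (2.3.6)
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    sxx = sum((a - mx) ** 2 for a in X)
    syy = sum((b - my) ** 2 for b in Y)
    return sxy / (sxx * syy) ** 0.5

# Phi: expand the fourfold frequencies of Table 2.3.2 into 0-1 scores
# (first code: 1 = homeowner; second code: 1 = prefers Candidate V)
cells = {(1, 0): 19, (1, 1): 54, (0, 0): 60, (0, 1): 52}
owner, candidate = [], []
for (o, c), freq in cells.items():
    owner += [o] * freq
    candidate += [c] * freq
print(round(pearson_r(owner, candidate), 3))                 # .272, the phi value

# Ranks: r computed on the ranks of Table 2.3.3 equals the rank-order formula (2.3.9)
rank_x = [1, 2, 3, 4, 5]
rank_y = [4, 2, 3, 5, 1]
d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))       # 26
r_ranks = 1 - 6 * d2 / (len(rank_x) * (len(rank_x) ** 2 - 1))
print(round(pearson_r(rank_x, rank_y), 3), round(r_ranks, 3))   # -0.3 -0.3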


TABLE 2.3.3
Correlation between Two Sets of Ranks

      X     Y      x      y     x²     y²     xy      d     d²
      1     4     −2      1      4      1     −2     −3      9
      2     2     −1     −1      1      1      1      0      0
      3     3      0      0      0      0      0      0      0
      4     5      1      2      1      4      2     −1      1
      5     1      2     −2      4      4     −4      4     16

Sum  15    15      0      0     10     10     −3      0     26
m     3     3

2.4 REGRESSION COEFFICIENTS: ESTIMATING Y FROM X

Thus far we have treated the two variables as if they were of equal status. It is, however, often the case that variables are treated asymmetrically, one variable being thought of as the dependent variable or criterion, and the other being thought of as the independent variable or predictor. These labels reflect two of the most common reasons why the relationship between two variables may be under investigation. The first and primary scientific question looks upon one variable as dependent on the other, that is, as in part an effect of or influenced by the other. The second question is technological, in which simple prediction is the goal, as for example when high school grades are used to predict college grades with no implication that the latter are actually caused by the former. In either case the measure of this effect will, in general, be expressed as the amount of change in the Y variable per unit change in the X variable.

To return to our academic example, we wish to obtain a single number that summarizes the average amount of change in publications per year since Ph.D. To find this number, we will need some preliminaries. Obviously, if the relationship between publications and years were perfect and positive, we could provide the number of publications corresponding to any given number of years since Ph.D. simply by adjusting for differences in scale of the two variables. Because, when r_XY = 1, for any individual j, the estimated ẑ_Yj simply equals z_Xj, then

(Ŷⱼ − Ȳ)/sd_Y = (Xⱼ − X̄)/sd_X,

and solving for Ŷⱼ, the estimated value,

Ŷⱼ = (sd_Y/sd_X)(Xⱼ − X̄) + Ȳ,


and because X̄, Ȳ, sd_X, and sd_Y are known, it remains only to specify Xⱼ and then Ŷⱼ may be computed. (We use the "hat" over the Y to signify an estimated value.)

W hen, however, the relationship is not perfect, wc may nevertheless wish to show the

Y estimate

we would obtain by using the best possible “ average”

conversion or prediction rule from X in the sense that the computed values will be as dost' to the actual Y values as is possible with a single linear conversion formula. Larger differences between the actual and predicted scores {Yj — Y ) are indicative of larger errors. The average error I ( T ; — Yj)hi w ill equal zero when­ ever the overestimation of some scores is balanced by an equal underestimation of other scores. That there be no consistent over- or underestimation is a desir­ able property, but it may be accomplished by an infinite number of different conversion rules. W e therefore define us close as possible to correspond to the least-squares criterion so common in statistical work— we shall choose a conver­ sion rule such that not only arc the errors balanced (sum to zero), but also the sum o f the squared discrepancies between the actual Y and estimated Y w ill be minimized, that is. as small as the data permit. It can be proven via calculus {or some rather complicated algebra) that the linear conversion rule which is optimal for converting zx to an estimate of z Y is (2.4.1)

i Y -rzx .

If wc wish to convert from our original scores, bceause z Y = (Yj — Y)/sdy and zx = (X , - X)isdx .

(2.4.2)

Yj = r sdy — ------- — 1

Y

+

Y

*dx

It is useful to simplify and separate the elements of this formula in the following way. Let

(2.4.3)    B_YX = r (sd_Y/sd_X),

and

(2.4.4)    A_YX = Ȳ − B_YX X̄,

from which we may write the regression equation for estimating Y from X (dropping the j subscript from Y and X as understood) as

(2.4.5)    Ŷ = B_YX X + A_YX.

This equation describes the regression of Y on X. B_YX is the regression coefficient for estimating Y from X, and represents the rate of change in Y units per X unit. A_YX is called the regression constant or Y intercept and serves to make appropriate adjustments for differences between X and Y. When the line representing the best linear estimation equation (the Y on X regression equation) is drawn on the scatter plot of the data in original units, B_YX indicates the slope of the line and A_YX represents the point at which the regression line crosses the Y axis, the estimated Y when X = 0. The slope of a regression line is the measure of its steepness, the ratio of how much it rises (or, when negative, falls) to any given amount of increase along the horizontal. Because the "rise" over the "run" is a constant for a straight line, our interpretation of it as the number of units of change in Y per unit change in X meets this definition.

Now we can deal with our example of 15 faculty members with a mean of 9.60 and sd of 7.25 years since Ph.D. and a mean of 7.60 and sd of 4.96 publications (Table 2.2.2). The correlation between years and publications was found to be .683, so

B_YX = .683 (4.96/7.25) = .468,

A_YX = 7.60 − .468(9.60) = 3.11.

The regression coefficient, B_YX, indicates that for each unit (year) of increase in X, we estimate a change of +.468 units (publications) in Y (i.e., almost half a publication per year), and that using this rule we will minimize our errors (in the least-squares sense). The A_YX term gives us a point for starting this estimation: the point for a zero value of X, which is, of course, out of the range for the present set of scores. The equation Ŷ = B_YX X + A_YX produces, when X values are substituted, the Y-on-X regression line in Fig. 2.4.1.

We could, of course, estimate X from Y by interchanging X and Y in Eqs. (2.4.3) and (2.4.4). However, the logic of regression analysis dictates that the variables are not of equal status, and estimating an independent or predictor variable from the dependent or criterion variable makes no sense. Suffice it to say that were we to do so, the line estimating X from Y (the X on Y regression) would not be the same as the line estimating Y from X (the Y on X regression); neither its slope nor its intercept would be the same. Because we shall always treat Y as the dependent variable, B and A coefficients will, in general, be presented without subscripts and may be understood to be B_YX and A_YX.

The meaning of the regression coefficient may be seen quite well in the case in which the independent variable is a dichotomy. If we return to the example from Table 2.3.1, where the point biserial r = −.707, and calculate

B_YX = −.707 (2.45/.495) = −3.5,

we note that this is exactly the difference between the two group means on Y, 66.0 − 69.5. Calculating the intercept,

A = 68 − (−3.5)(.428) = 69.5,


we find it to be equal to the mean of the group coded 0 (the no-stimulus condition). This must be the case because the best (least-squares) estimate of Y for each group is its own mean, and the regression equation for the members of the group represented by the 0 point of the dichotomy is solved as

Ŷ = B(0) + A = A.
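A few lines of Python (a sketch assuming the data of Tables 2.2.2 and 2.3.1; the helper function is ours) make both points: B_YX and A_YX for the faculty data, and, for a dichotomous predictor, B equal to the difference between the two group means and A equal to the mean of the group coded 0.

def regression(X, Y):
    # least-squares B and A for estimating Y from X, Eqs. (2.4.3) and (2.4.4)
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    sxx = sum((a - mx) ** 2 for a in X)
    B = sxy / sxx
    A = my - B * mx
    return B, A

# Faculty example (Table 2.2.2): about .47 publications per year since Ph.D., intercept about 3.11
years = [1, 2, 5, 7, 10, 4, 3, 8, 4, 16, 15, 19, 8, 14, 28]
pubs = [2, 4, 5, 12, 5, 9, 3, 1, 8, 12, 9, 4, 8, 11, 21]
print(regression(years, pubs))

# Dichotomous predictor (Table 2.3.1): B = mean(STIM) - mean(NONE) = -3.5, A = mean(NONE) = 69.5
stim = [0, 0, 0, 0, 1, 1, 1]
score = [67, 72, 70, 69, 66, 64, 68]
print(regression(stim, score))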

FIGURE 2.4.1  The Y-on-X regression of publications on years since Ph.D.


2.5 REGRESSION TOWARD THE MEAN

A great deal of confusion exists in the literature regarding the phenomenon of regression toward the mean. It is frequently implied or even asserted that this is an artifact attributable to regression as an analytic procedure. On the contrary, it is a mathematical necessity that whenever two variables correlate less than perfectly, cases that are at one extreme on one of the variables will, on the average, be less extreme on the other. Although the number of cases in our running example is too small to show this phenomenon reliably at each data point, examination of the z_X and z_Y values in Table 2.2.2 will illustrate the point. If we take, for example, the five professors with 14-28 years since Ph.D., we find that their mean z score for years since Ph.D. is +1.21, whereas their mean z score for number of publications is only +.77 (i.e., about one-half sd closer to the mean). Similarly, the five most recent Ph.D.'s (1-4 years) have a mean z score for years since Ph.D. of −.94, whereas their mean z score for number of publications is −.49, again about one-half standard deviation closer to the mean. The specific numbers will, of course, vary with how we define extreme, but the direction will remain the same: for extreme X, on the average, Y will be closer to the mean (i.e., less extreme). The same principle will hold in the other direction: those who are extreme on number of publications will be less extreme on years since Ph.D. For example, those with fewer than five publications, with a mean z score for publications of −.97, have a mean z score for years since Ph.D. of −.41, again a value distinctly closer to the mean. As can be seen from these or any other bivariate data that are not perfectly linearly related, this is in no sense an artifact, but a necessary corollary of less than perfect correlation.

A further implication of this regression phenomenon is evident when one examines the consequence of selecting extreme cases for study. In the preceding paragraph, we saw that those with Ph.D.'s only 1-4 years old had a mean z score for years since Ph.D. of −.94, but a mean z score for number of publications of −.49. An investigator might well be tempted to attribute the fact that these new Ph.D.'s are so much closer to the mean on number of publications than they are on years since Ph.D. to their motivation to catch up in the well-documented academic rat race. However, recognition that a less than perfect correlation is a necessary and sufficient condition to produce the observed regression toward the mean makes it clear that any specific substantive interpretation is not justified. (There is a delicious irony here: the lower the correlation, the greater the degree of regression toward the mean, and the more to "interpret," spuriously, of course.) Regression toward the mean thus produces an "artifact" only in the sense that it may seduce an unsophisticated investigator into making an erroneous substantive interpretation of what is a mathematical and logical necessity. Unfortunately, the social science, educational, and medical literature is littered with examples of this "regression fallacy."

Because regression toward the mean always occurs in the presence of a nonperfect linear relationship, it is observed when the variables consist of the same


measure taken at two points in time. In this circumstance, the extreme cases at Time 1 will be less extreme at Time 2. If the means and standard deviations are stable, this inevitably means that low scores improve and high scores deteriorate. Thus, on the average over time, overweight people lose weight, low-IQ children become brighter, and rich people become poorer. To ask why these examples of regression to the mean occur is equivalent to asking why correlations between time points for weight, IQ, and income are not equal to +1.00.

The necessity for regression toward the mean is not readily accessible to intuition but does respond to a simple demonstration. Expressed in standard scores, the regression equation is simply ẑ_Y = r z_X [Eq. (2.4.1)]. Because an r of +1 or −1 never occurs in practice, ẑ_Y will necessarily be absolutely smaller than z_X, because r is less than 1. Concretely, when r = .40, whatever the value of z_X, ẑ_Y must be only .4 as large. Although for a single individual the actual value of z_Y may be larger or smaller than z_X, the expected or average value of the z_Y's that occur with a given z_X, that is, ẑ_Y, will be .4 of the z_X value (i.e., it is "regressed toward the mean"). The equation holds not only for the expected value of z_Y for a single individual's z_X but also for the expected value of the mean z_Y for the mean z_X of a group of individuals. This is why, on the average, the fat grow skinnier, the dull brighter, the rich poorer, and vice versa.
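The same point can be made with simulated data; the following Python sketch (using numpy, with an arbitrarily chosen r of .40 and a cut at z_X > 1; exact values will vary from run to run) shows cases selected for being extreme on X averaging much closer to the mean on Y:

import numpy as np

rng = np.random.default_rng(0)
r = .40
n = 100_000

# standardized X and Y with population correlation r
zx = rng.standard_normal(n)
zy = r * zx + (1 - r ** 2) ** 0.5 * rng.standard_normal(n)

extreme = zx > 1.0
print(round(zx[extreme].mean(), 2))   # roughly 1.5
print(round(zy[extreme].mean(), 2))   # roughly .4 * 1.5 = .6, "regressed toward the mean"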

2.6 ERROR OF ESTIMATE AND MEASURES OF THE STRENGTH OF ASSOCIATION

In applying the regression equation Ŷ = BX + A, we have of course only approximately matched the original Y values. How close is the correspondence between the information provided about Y by X (i.e., Ŷ) and the actual Y values? Or, to put it differently, to what extent is Y associated with X, as opposed to being independent of X? How much do the values of Y, as they vary, coincide with their paired X values, as they vary?

As we have noted, variability is indexed in statistical work by the sd or its square, the variance. Because variances are additive, whereas standard deviations are not, it will be more convenient to work with sd². What we wish to do is to partition the variance of Y (sd²_Y) into a portion associated with X, which will be equal to the variance of the estimated Y scores, sd²_Ŷ, and a remainder not associated with X, sd²_(Y−Ŷ), the variance of the discrepancies between the actual and the estimated Y scores. (Those readers familiar with other analysis of variance procedures may find themselves in a familiar framework here.) sd²_Ŷ and sd²_(Y−Ŷ) will sum to sd²_Y, provided that Ŷ and Y − Ŷ are uncorrelated. Intuitively it seems appropriate that they should be uncorrelated, because Ŷ is computed from X by the optimal rule. Nonzero correlation between Ŷ and Y − Ŷ would indicate correlation between X (which completely determines Ŷ) and Y − Ŷ and would indicate that our original rule was not optimal. A simple algebraic proof confirms this intuition; therefore,

(2.6.1)    sd²_Y = sd²_Ŷ + sd²_(Y−Ŷ),

and we have partitioned the variance of Y into a portion determined by X and a residual portion not linearly related to X.

If no linear correlation exists between X and Y, the optimal rule has us ignore X and minimize our errors of estimation by using Ȳ as the best guess for every case. Thus we would be choosing that point about which the squared errors are a minimum, and sd_(Y−Ŷ) = sd_Y. More generally, we may see that because [by Eq. (2.4.1)] ẑ_Y = r z_X,

sd²_ẑY = Σ(r z_X)²/n = r² Σz_X²/n = r²,

because Σz_X²/n = 1, and

(2.6.2)    sd²_zY = r² + sd²_(zY − ẑY);

then r² is the proportion of the variance of Y linearly associated with X, and 1 − r² is the proportion of the variance of Y not linearly associated with X.

It is often helpful to visualize a relationship by representing each variable as a circle. The area enclosed by the circle represents its variance, and because we have standardized each variable to a variance of 1, we will make the two circles of equal size (see Fig. 2.6.1). The degree of linear relationship between the two variables may be represented by the degree of overlap between the circles (the shaded area). Its proportion of either circle's area equals r², and 1 − r² equals the area of the nonoverlapping part of either circle. Again, it is useful to note the equality of the variance of the variables once they are standardized: the size of the overlapping and nonoverlapping areas, r² and 1 − r², respectively, must be the same for each. If one wishes to think in terms of the variance of the original X and Y, one may define the circles as representing 100% of the variance and the overlap as representing the proportion of each variable's variance associated with the other variable. We may also see that it does not matter in this form of expression whether the correlation is positive or negative, because r² must be positive. We will obtain the variance of the residual (nonpredicted) portion when we return to the original units by multiplying by sd²_Y to obtain

FIGURE 2.6.1  Overlap in variance of correlated variables (r = .50, r² = .25, 1 − r² = .75).


(2.6.3)    sd²_(Y−Ŷ) = sd²_Y (1 − r²).

The standard deviation of the residuals, that is, of that portion of Y not associated with X, is therefore given by

(2.6.4)    sd_(Y−Ŷ) = sd_Y √(1 − r²).

For example, when r = .50, the proportion of shared variance = r² = .25, and .75 of sd²_Y is not linearly related to X. If the portion of Y linearly associated with X is removed by subtracting BX + A (= Ŷ) from Y, we have a reduction from the original sd_Y to

sd_(Y−Ŷ) = sd_Y √.75 = .866 sd_Y.

We see that, in this case, although r = .50, only 25% of the variance in Y is associated with X, and when the part of Y which is linearly associated with X is removed, the standard deviation of what remains is .866 as large as the original

sd_Y. To make the foregoing more concrete, let us return to our academic example. The regression coefficient B_YX was found to be .468 and A = 3.11 (see Section 2.4). The Ŷ, Y − Ŷ, and z_Ŷ values are given in Table 2.6.1. The Y − Ŷ values are the residuals of Y estimated from X, or the errors of estimate in the sample. Because Ŷ is a linear transformation of X, r_YŶ must equal r_XY (= .683). The correlation between Y − Ŷ (the residual of Y estimated from X) and Ŷ must, as we have seen, equal zero. When we turn our attention to the variances of the variables we see that

(2.6.5)    sd²_Ŷ / sd²_Y = r² = .683² = .4667,

which makes explicit what is meant when we say that (almost) 47% of the variance of each variable is estimated or predicted by the other.

TABLE 2.6.1  Estimated values (Ŷ), residuals (Y − Ŷ), and alternative estimates (Ŷ_U, Ŷ_V) for the 15-case academic example.

When, however, the sample data are used to arrive at an estimate of variability in the population (symbolized by ŝd and ŝd²), dividing by n would result in a systematic underestimate. Instead, we divide the sum of squared deviations by the number of degrees of freedom (df), a quantity less than n. In correlation and regression analysis, this distinction is particularly important with regard to the variability (hence size) of the errors of estimate or residuals from regression, Y − Ŷ. In testing for statistical significance and setting confidence limits, it is these population estimates that come into play. For the bivariate case, the estimate of the population error or residual variance for Y is given by

(2.6.7)    ŝd²_(Y−Ŷ) = Σ(Y − Ŷ)² / (n − 2) = (1 − r²) Σy² / (n − 2),

and its square root, usually called the "standard error of estimate," by


(2.6.8)    ŝd_(Y−Ŷ) = √[Σ(Y − Ŷ)² / (n − 2)] = √[(1 − r²) Σy² / (n − 2)].

Here, df = n − 2. Thus, while the sd of the sample residual Y values was found to be 3.62 by Eq. (2.6.4), the estimated sd of the population residuals, or standard error of estimate, is, by Eq. (2.6.8), ŝd_(Y−Ŷ) = 3.62 √(15/13) = 3.89.

Finally, Ŷ_U and Ŷ_V in Table 2.6.1 have been computed to demonstrate what happens when any other regression coefficient or weight is used. The values B_U = .3 and B_V = .6 were chosen to contrast with B_YX = .467 (the A values have been adjusted to keep the estimated values centered on Ȳ). The resulting sd² for the sample residuals was larger in each case, 14.62 and 14.05, respectively, as compared to 13.15 for the least-squares estimate B_YX. The reader is invited to try any other value to determine that the residuals will in fact always be larger than with the computed value of B_YX. However, it also may be instructive to note that in spite of the fact that B_U is less than two-thirds the magnitude of B_YX and B_V is more than 20% greater, the residual variances are only 10% and 6% larger. Thus, one can see that the error variances do not change greatly for a range of regression weights around the optimal weight, even within the sample on which the latter was determined. We return to this issue later (see Section 3.6.5).

Examination of the residuals will reveal another interesting phenomenon. If one determines the absolute values of the residuals from the true regression estimates and from Ŷ_V, it can be seen that their sum is smaller for Y − Ŷ_V (41.28) than for the true regression residuals (42.72). Whenever residuals are not exactly symmetrically distributed about the regression line, there exists an absolute-residual-minimizing weight different from B_YX. To reiterate, B_YX is the weight that minimizes the squared residuals, not their absolute value. This is a useful reminder that least squares, although very useful, is only one way of defining discrepancies from prediction, or error.
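The comparison of residual variances for alternative weights is easily reproduced; a Python sketch (assuming the Table 2.2.2 data, with A adjusted for each trial B so that the estimates stay centered on Ȳ) follows:

X = [1, 2, 5, 7, 10, 4, 3, 8, 4, 16, 15, 19, 8, 14, 28]
Y = [2, 4, 5, 12, 5, 9, 3, 1, 8, 12, 9, 4, 8, 11, 21]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n

def residual_variance(B):
    # A is set so that the estimates remain centered on the mean of Y
    A = my - B * mx
    resid = [y - (B * x + A) for x, y in zip(X, Y)]
    return sum(e * e for e in resid) / n

for B in (.3, .468, .6):
    print(B, round(residual_variance(B), 2))
# about 14.6 for B = .3 and 14.1 for B = .6, versus about 13.1 for the least-squares B = .468

# population estimate of the residual sd (the standard error of estimate), Eq. (2.6.8)
print(round((residual_variance(.468) * n / (n - 2)) ** 0.5, 2))   # about 3.89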

2.7 SUMMARY OF DEFINITIONS AND INTERPRETATIONS

The product moment r is the rate of change (linear) in ẑ_Y per unit change in z_X (and vice versa) which best fits the data in the sense of minimizing the squared discrepancies between the estimated and actual scores.

r² is the proportion of variance in Y associated with X (and vice versa).

B_YX is the regression coefficient of Y on X: using the original raw units, the rate of change in Ŷ per unit change in X, again best fitting in the least-squares sense.


A is the regression intercept that serves to adjust for differences in means, giving the predicted value of the dependent variable when the independent variable's value is zero.

The coefficient of alienation, √(1 − r²), is the proportion of sd_Y remaining when that part of Y associated with X has been subtracted; that is, sd_(Y−Ŷ)/sd_Y.

The standard error of estimate, ŝd_(Y−Ŷ), is the estimated population sd of the residuals or errors of estimating Y from X.

2.8 SIGNIFICANCE TESTING OF CORRELATION AND REGRESSION COEFFICIENTS

In most circumstances in which r and B are determined, the intention of the investigator is to provide valid inferences from the data at hand to some larger universe of potential data, from the statistics obtained on a sample to the parameters of the population from which it is drawn. Because random samples from a population will not yield values for r and B that exactly equal the population values, statistical methods have been developed to determine the confidence with which such inferences can be accepted. In this section we describe the methods of significance testing: determining the risk that we have rejected a true null hypothesis. In Section 2.9 we present methods of assessing the risk of making the other kind of error, failing to reject a false null hypothesis. In Section 2.10 we provide methods of determining the limits within which we may expect a population value to fall with a specified degree of confidence.

2.8.1 Assumptions Underlying the Significance Tests

It is clear that no assumptions are necessary for the computation of correlation, regression, and other associated coefficients, or for their interpretation when they are used to describe the available data. However, the most interesting and useful applications occur when r and B are statistics calculated on a random sample and inferences about the population are desired. As in most circumstances in which statistics are used inferentially, the addition of certain assumptions about the characteristics of the population (when valid) substantially increases the number of useful inferences that can be drawn. Fortunately, the available evidence suggests that even fairly substantial departures from the assumptions will frequently result in little error of inference when the data are treated as if the assumptions were valid.

Probably the most generally useful set of assumptions are those that form what has been called the fixed linear regression model (Binder, 1959). This model assumes that the two variables have been differentiated into an independent variable X and a dependent variable Y. Values of X are assumed to be "fixed" in the analysis of variance sense, that is, selected by the investigator rather than sampled from some population of X values. Values of Y are assumed to be


randomly sampled for each of the selected values of X. The residuals from the mean value of Y for each value of X are assumed to be normally distributed with equal variances in the population. It should be noted that no assumptions about the shape of the distribution of X and the total distribution of Y per se are necessary, and that, of course, the assumptions are made about the population and not about the sample. This model, extended to multiple regression, is used throughout the book.

Fortunately, even this rather liberal model can be somewhat violated with typically little risk of error in conclusions about the presence or absence of a linear relationship. A number of studies (Binder, 1959; Boneau, 1960; Cochran, 1947; Havlicek & Peterson, 1977) have demonstrated the robustness of the t and F tests to failure of distribution and other assumptions, although it must be cautioned that the probabilities (significance) calculated under such circumstances may be somewhat over- or underestimated. Assumption failure of the heteroscedastic type, circumstances in which the residuals have grossly unequal variances at different values of X, suggests the need for improving the prediction either by an appropriate transformation of X or Y or by including additional variables in the equation (see Section 3.9.1).

2.8.2 t Test for Significance of r and B

Usually, the first test of interest is to determine the probability that the population

r is different from zero, that is, that some linear relationship exists. To test the null hypothesis that the population r is zero, substitute the sample r and n in the following formula to find Student's t:

(2.8.1)    t = r √(n − 2) / √(1 − r²).

The resulting t is looked up in a standard table, with n − 2 degrees of freedom (df), to determine whether the probability (P) is "sufficiently" small that an r as large as the one observed (either positively or negatively, hence "two-tailed") would be obtained from a random sample of size n drawn from a population whose r is zero. If so, the null hypothesis is rejected, and the sample r is deemed "statistically significant." The criterion for "sufficiently" small, α, is conventional; by far the most frequently used are the α = .05 and (more stringent) α = .01 criteria for statistical significance. Appendix Table A gives, for varying df, the values of t necessary for statistical significance at α = .01 and at α = .05 (two-tailed). Thus, if the t computed from Eq. (2.8.1) exceeds the tabled value for df = n − 2 at the prescribed α, then P < (is less than) α, and the null hypothesis is rejected "at the α significance level." In our fixed model, the population to which this generalization is confined is strictly the population consisting of the exact values of X as occur in the sample. However, typically, generalization to the population of X values covering the same range as the sample can be made.
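For the faculty example (r = .683, n = 15), the test of Eq. (2.8.1) takes only a few lines of Python (a sketch using scipy.stats for the tail area); the equivalent F of Table 2.8.1, presented next, is simply t²:

from scipy import stats

r, n = .683, 15
df = n - 2

t = r * df ** 0.5 / (1 - r ** 2) ** 0.5     # about 3.37
p = 2 * stats.t.sf(abs(t), df)              # two-tailed P, well below .01
F = t ** 2                                  # the F of Table 2.8.1, = r^2(n - 2)/(1 - r^2)
print(round(t, 2), round(p, 4), round(F, 1))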


TABLE 2.8.1
Analysis of Variance of Simple Regression of Y on X

Source              Sum of squares                              Mean square
Regression          Σ(Ŷ − Ȳ)² = BΣxy = r²Σy²                    r²Σy²
Residual (error)    Σ(Y − Ŷ)² = Σy² − r²Σy² = (1 − r²)Σy²       (1 − r²)Σy² / (n − 2)

F = regression mean square / residual mean square = r²Σy² / [(1 − r²)Σy²/(n − 2)] = r²(n − 2) / (1 − r²)

Because F for one df in the numerator equals t² (with equal error df), it is useful to consider this significance test as carried out by analysis of variance. Because r² is the proportion of Y variance associated with X, we may present the data in an analysis of variance format as seen in Table 2.8.1. It may be deduced from the presence of B_YX in one form of the sum of squares for regression that the F test (and consequently the t test) for significance of the correlation coefficient also tests whether the regression coefficient is nonzero in the population. Indeed, B_YX can be zero if and only if r equals zero. Also, note that the error or residual mean square is simply the squared standard error of estimate of Eq. (2.6.7), and that the expression for F in Table 2.8.1 is simply the square of the expression for t in Eq. (2.8.1).

2.8.3 Fisher's z′ Transformation and Comparisons between Independent r's

Although t and F tests are appropriate for testing the significance of the departure of r from 0, a different approach is necessary for estimating the confidence limits of a population r. The reason for this is that the sampling distribution of nonzero correlations is skewed, the more so as the departure from zero increases. To sidestep this problem, R. A. Fisher developed the z′ transformation of r, with a sampling distribution that is nearly normal and a standard error that depends only on n:

(2.8.2)    z′ = ½[ln (1 + r) − ln (1 − r)],

where ln is the natural (base e) logarithm. Appendix Table B gives the r to z′ transformation directly, with no need for computation. The standard error of z′ is given by

(2.8.3)    sd_z′ = 1 / √(n − 3).


Therefore, in testing the significance of a departure of some obtained r from a hypothetical population r, one need only divide the difference between their z′ equivalents by the standard error to obtain a normal curve deviate:

(2.8.4)    z = (z′ − z′_h) / sd_z′,

where z (not to be confused with z′) is the standard normal curve deviate, z′ the transformed sample r, and z′_h the transformed hypothetical r. The tail area beyond z is used in the usual way to obtain the P value for the significance test (Appendix Table C).

For example, a study determines the correlation between education and income for a sample of 103 adult black men to be .47. The null hypothesis is that the population r for blacks equals the population r between these two variables for adult white men in the United States, which is known to be .63. We can look up the tabled z′ values for r = .47 and r_h = .63, which are .510 and .741, respectively. The standard error is computed as

sd_z′ = √[1/(103 − 3)] = .10,

and the normal curve deviate as

z = (.510 − .741)/.10 = −2.31.

Turning to the normal curve table for the closest tabled value (Appendix Table C), we find that z = 2.30 gives us .011 for the area in the tail. Doubling this value gives us approximately .022 as the two-tailed probability (P) level, which would lead us to reject (because P is less than α = .05) the null hypothesis that the relationship between income and education is the same for the population of black men that we sampled as it is for the entire male population of the United States, and we conclude that the correlation between education and income is smaller for black than for white adult males in the United States.

Similarly, if one wishes to test the significance of the difference between correlation coefficients obtained on two different random samples, one may compute the normal curve deviate

(2.8.5)    z = (z′₁ − z′₂) / √[1/(n₁ − 3) + 1/(n₂ − 3)].

To illustrate this test, and incidentally make another point, assume more realistically that in the previous problem the r for the U.S. adult white male population is not known, but rather that we have a sample of 103 cases from this population whose r is (coincidentally) .63. Thus, instead of comparing the sample r for black men (.47) to a known population value as before, we compare it to


another sample r (.63), each sample containing 103 cases. We convert the r's to their z′ equivalents, and for the comparison between two sample z′ values find

z = (.510 − .741) / √[1/(103 − 3) + 1/(103 − 3)] = −.231 / √(.01 + .01) = −1.63.

The result does not meet the two-tailed α = .05 criterion (which is 1.96; Appendix Table C), and, unlike before, is not significant. The reason is that the standard error of the difference between two sample values is subject to the sampling variation of both (√(.01 + .01) = .1414), whereas in the previous formulation the known population value for adult males was not subject to sampling variation, but only that for blacks, with a resulting smaller standard error (√.01 = .10).

The z′ transformation is also useful in circumstances in which more than two independent correlation coefficients are to be compared to determine whether they may be considered homogeneous (equal in the population). This can be seen to be a generalization of the previous test for two correlations. In this case the test for homogeneity employs the χ² (chi-squared) distribution for k − 1 degrees of freedom, where k = the number of independent sample coefficients being compared:

(2.8.6)    χ² = Σ(nᵢ − 3)(z′ᵢ)² − [Σ(nᵢ − 3)z′ᵢ]² / Σ(nᵢ − 3),

where the summation is over the k samples. For example, a researcher has calculated correlation coefficients between two measures of social interaction obtained on random samples from 10 elementary schools (see Table 2.8.2). The research question posed is whether this can be considered a homogeneous set of correlation coefficients, that is, whether the values obtained would be expected on the basis of random sampling from a single normal population. We look up the z′ values for each of the r's, and find the χ² value to be 19.893. Consulting a χ² table, we find that this value for df = k − 1 = 9 meets the .05 α criterion level (16.92). We therefore conclude that the two measures of social participation are not correlated to the same degree in these schools.

2.8.4 The Significance of the Difference between Independent B's

Because B_YX = r sd_Y/sd_X, it is apparent that we may find circumstances in which r's obtained on two independent samples may be significantly different from each other, yet the regression coefficients may not, or vice versa. This can happen when the larger r occurs in combination with a proportionately larger sd_X, for example. That such findings may be relatively common will be under-


TABLE 2.8.2
χ² Test for the Homogeneity of the Correlations between Two Measures of Social Participation in 10 Schools

School     rᵢ      nᵢ      z′ᵢ     (z′ᵢ)²    nᵢ − 3    (nᵢ − 3)z′ᵢ    (nᵢ − 3)(z′ᵢ)²
  A       .72      67     .908      .824        64         58.11            52.77
  B       .41      93     .436      .190        90         39.24            17.11
  C       .57      73     .648      .420        70         45.36            29.39
  D       .53      98     .590      .348        95         56.05            33.07
  E       .62      82     .725      .526        79         57.28            41.52
  F       .21      39     .214      .046        36          7.70             1.65
  G       .68      91     .829      .687        88         72.95            60.48
  H       .53      27     .590      .348        24         14.16             8.35
  I       .49      75     .536      .287        72         38.59            20.69
  J       .50      49     .549      .301        46         25.25            13.86

Sum               694                          664        414.70           278.89

χ² = 278.89 − (414.70)²/664 = 19.893, df = k − 1 = 9, P < .05
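The whole calculation of Table 2.8.2, including the z′ transformation of Eq. (2.8.2), takes only a few lines of Python (a sketch assuming the ten r's and n's listed in the table):

import math

r = [.72, .41, .57, .53, .62, .21, .68, .53, .49, .50]
n = [67, 93, 73, 98, 82, 39, 91, 27, 75, 49]

# Fisher's z' transformation, Eq. (2.8.2)
z = [0.5 * (math.log(1 + ri) - math.log(1 - ri)) for ri in r]
w = [ni - 3 for ni in n]

# Eq. (2.8.6)
chi_sq = sum(wi * zi ** 2 for wi, zi in zip(w, z)) - sum(wi * zi for wi, zi in zip(w, z)) ** 2 / sum(w)
print(round(chi_sq, 2))   # about 19.89, beyond the .05 criterion of 16.92 for df = 9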

from samples E and F),

(2.8.7)    t = (B_E − B_F) / √{ [Σ(Y_E − Ŷ_E)² + Σ(Y_F − Ŷ_F)²] / (n_E + n_F − 4) · [1/Σx²_E + 1/Σx²_F] },

with df = n_E + n_F − 4, and refer to Appendix Table A. Much later (Chapter 8), we will see that this can be accomplished routinely as a test of significance of an interaction.


the fact that they come from the same sample. The resulting formula yields a t for n − 3 degrees of freedom (Steiger, 1980).

Each of these standard errors must be further distinguished from the standard error of a given Ŷ₀, estimated for a new observation on which an X value, X₀, is available. To understand the difference between this latter value and the standard error of estimate, it is useful to realize that whatever sampling error has occurred in estimating the population regression coefficient by the sample B_YX will have more serious consequences for X values that are more distant from X̄ than for those near X̄. For example, for the sake of simplicity let us assume that both X and Y have means of zero and sd's of 1. Let us further suppose that the B_YX value as determined from our sample is .20, whereas the actual population value is .25.


For new cases that come to our attention with X₀ values = .1, we will estimate Y at .02 (= B_YX X₀) when the actual population mean value of Y for all X₀ = .1 is .025, a relatively small error of .005. On the other hand, new values of X₀ = 1.0 will yield estimated Y values of .20 when the actual mean value of Y for all X = 1 is .25, the error (.05) being 10 times as large. When a newly observed X₀ is to be used to estimate Y₀, we may determine the standard error and thus confidence limits for this Ŷ₀. The accuracy of such confidence limits is dependent on the validity of the assumptions of linearity and equally varying normal distributions of Y values across all values of X. Under such conditions the standard error of an estimated Y₀ value will be

sd_(Y₀−Ŷ₀) = ŝd_(Y−Ŷ) √[1 + 1/n + (X₀ − X̄)²/(n sd²_X)].

Because ŝd_(Y−Ŷ), the standard error of estimate, is a constant for all values of X, the magnitude of the error is a function of the magnitude of the difference between X₀ and X̄, that is, the extremeness in either direction of X₀. The same equation when the X₀ value is standardized (z₀) makes the effect of departure from X̄ more obvious:

(2.10.4)    sd_(Y₀−Ŷ₀) = ŝd_(Y−Ŷ) √(1 + 1/n + z₀²/n).

s d Y _ y = 3.89 y j 1 +

1

+]5 4 ?~ = 4.05.

Noting that when X = 13, Y ^ = 9.19 and we determine the 95% confidence interval by multiplying 4.05 by the / value at a = .05 for d f = n — 2 = 13, which is 2.160 (Appendix Tabic A ), that is. 4.05(2.160) = 8.75. When the necessary assumptions of homoseedasticity and normality arc valid, wc may be 9 5 % confident that this individual’s publications will be more than 9.19 — 8.75 = .44 and fewer than 9.19 + 8.75 = 17.94. Noting that this interval covers nearly the entire range o f observed values, one is tempted to conclude that the problem lies primarily in the small sample size (« = 15), because, by empirical standards, the r o f .683 is very large. This turns out, however, not to be the case. W ere n = 150, ten times as large, and keeping r and all else constant, rccomputation o f the 9 5 % confidence interval for the prediction for this case results in the limits for f B o f 1.95 and 16.43. Although slightly narrower than those for n

-

15 (.44 and 17.94), this interval still spans nearly the entire observed range.

The primary reason for the imprecision o f predicting Y from a given X lies in the fact that r is too low! Yet, as r ’s in real data go, .683 is very large, indeed. W e arc forced to conclude that it is a relatively rare circumstance in the social

2.11 F A C T O R S A F F E C T IN G T H E S IZ E O F r

65

sciences that a data-based prediction for a given individual w ill be a substantial improvement over simply predicting that individual at the mean.

2.11 FACTORS AFFECTING THE SIZE OF r 2.11.1 T h e D istrib u tio n s of X and Y Because r = ± 1.00 only when each zx = z Y or —z Y, it can only occur when the shapes o f the frequency distributions o f X and Y are exactly the same (or exactly opposite for r = — 1.00). The greater the departure from distribution similarity, the more severe w ill the restriction be on the maximum possible r. In addition, as such distribution discrepancy increases, departure from homoscedasticity— equal error for different predicted values— must also necessarily increase. The de­ crease in the maximum possible value of (positive) r is especially noticeable under circumstances in which the two variables are skewed in opposite direc­ tions. One such common circumstance occurs when the two variables being correlated are each dichotomies. W hen the variables have very discrepant pro­ portions, it is not possible to obtain a large positive correlation. For example, suppose that a group o f subjects has been classified into “ risk takers" and “ safe players” on the basis of behavior in an experiment, resulting in 90 risk takers and 10 safe players. A correiation is computed between this dichotomous variable and self classification as “ conservative” versus “ liberal” in a political sense, with 60 subjects identifying themselves as conservative (Table 2 .1 1 .1). Even if all political liberals were also risk takers in the experi­ mental situation, the correlation w ill be only [by Hq. (2.3.6)]: 400 - 0 ^

790-10-40*60

It is useful to divide the issue o f the distribution of variables into two compo­ nents, those due to differences in the distribution of the underlying constructs and

TABLE 2.11.1 Bivariate Distribution of Experimental and Self-Report Measures of Conservative Tendency Experimental Risk lakers

Safe players Total: 40

Liberal

40

0

Conservative

50

10

60

Total:

90

10

100

Sclf-repon

66

2.

B IV A R IA T E C O R R E L A T IO N A N D R E G R E S S IO N

those due to the scales on which we have happened to measure our variables. Constraints on correlations associated with differences in distribution inherent in the constructs are not artifacts, but have real interpretive meaning. For example, sex and height for American adults are not perfectly correlated, but we need have no concern about an artificial upper limit on r attributable to this distribution difference. If sex completely determined height, there would only be two heights, one for men and one for women, and r would be

1.

Sim ilarly the observed correlation between smoking and lung cancer is about .10 (estimated from figures provided by Doll & Peto, 1981). There is no artifact o f distribution here; even though the risk of cancer is about 1 1 times as high for smokers, the vast majority of both smokers and nonsmokers alike w ill not con­ tract lung cancer, and the relationship is low because of the nonassociation in these many cases. Whenever the concept underlying the measure is logically continuous or quan­ titative6— as in the preceding example o f risk taking and liberal— conservative— it is highly desirable to measure the variables on a many-valued scale. One effect o f this w ill be to increase the opportunity for valid discrimination o f individual differences (see Section 2 .1 1.2). T o the extent that the measures are similarly distributed, the risk of underestimating the relationship between the conceptual variables w ill be reduced (see Section 8.2.2); however, the constraints on r due to unreliability are likely to be much more serious than those due to distribution differences.

The Biserial r When the only available measure of some construct X is a dichotomy, dx , an investigator may wish to know what the correlation would be between this construct and some other quantitative variable, Y . For example, X may be ability to learn algebra, which we measure by dx . pass— fail. If one can assume that this “ underlying” continuous variable X is normally distributed, and that the rela­ tionship with Y is linear, an estimate of the correlation between X and Y can be made, even though only dx and Y are available. The correlation thus produced is called a biserial correlation eocffieicnt and is given by (2 . 1 1 . 1 )

=

h ( s d Y)

- r

rPb

^

h ’

where Y and Y are the Y means for the two points of the dichotomy,/) and q { — 1

— p) and the proportions of the sample at these two points, and h is the ordinate

(height) o f the standard unit normal curve at the point at which its area is divided into p and q portions (see Appendix Table C). For example, we w ill return to the data presented in Table 2.3.1, where the point biserial r was found to be —.707. W e now take the dichotomy to represent '’"Continuous " implies a variable on which infinitely small distinctions can he made; "quantita­ tive” is more closely aligned to real measurement practice in the social sciences, implying an ordered variable of many, or at least several, possible values. Theoretical constructs may be taken as continuous, but their measures will be quantitative in the above sense.

2.11 FACTORS AFFECTING THE SIZE OF r

67

not the presence or absencc of an experimentally determined stimulus but rather gross ( 1 ) versus minor ( 0 ) naturally occurring interfering stimuli as described by the subjects. This dichotomy is assumed to represent a continuous, normally distributed variable. The biserial r between stimulus and task score w ill be

r _

(6 6

b

- 69.5)(.428X-572) _

g ?3

.392 (2.45)

where .392 is the height o f the ordinate at the .428, .572 break, found by linear interpolation in Appendix Table C. The biseriai r of — .893 may be taken to be an estimate of the product-moment correlation that would have been obtained had X been a normally distributed continuous measure. It w ill always be larger than the corresponding point biserial

r and, in fact, may even nonsensically exceed I when the Y variable is not normally distributed. W hen there is no overlap between the Y scores o f the two groups, the biserial r w ill be at least 1. It w ill be approximately 2 5 % larger than the corresponding point biserial r when the break onA^ is .50 - .50. The ratio of ' iJr/,b increase as the break on X is more extreme; for example with a break of .90 - .10, the biserml w ili be about two-thirds larger than r h. The significance o f rh is best assessed by the t test on the point biserial, or equivalently, the / test on the difference between the Y means corresponding to the two points o f dx . Tetrachoric r As we have seen, when the relationship between two dichotomies is investigat­ ed, the restriction on the maximum value o f

when their breaks are very

different can be very severe. Oncc again, we can make an estimate o f what the linear correlation would be if the two variables were continuous and normally distributed. Such an estimate is called the tetrachoric correlation. Because the formula for the tetrachoric correlation involves an infinite series, and even a good approximation is a laborious operation, tetrachoric rs are usually obtained by means of diagrams (Chesire, Saffir, & Thurstone, 1933). One enters these nomographs with the

2

x

2

table of proportions and reads off tetrachoric r.

Tetrachoric r w ill be larger than the corresponding phi coefficient and the issues governing their interpretation and use are the same as for biserial and point biserial rs. Caution should be exercised in the use o f biserial and tetrachoric correlations, particularly in multivariate analyses. Remember that they arc not observed cor­ relations in the data, but rather hypothetical ones depending on the normality of the distributions underlying the dichotomies. 2.11.2 The R eliability of th e Variables In most research in the behavioral scicnces, the concepts that are of ultimate interest and that form the theoretical foundation for the study are only indirectly and imperfectly measured in practice. Thus, typically, interpretations of the correlations between variables as measured should be carcfully distinguished

68

2.

B IV A R IA T E C O RRELA TIO N A N D R E G R E S S IO N

from the relationship between the constructs or conceptual variables found in the theory. The reliability of a variable (rx x ) may be defined as the correlation between the variable as measured and another equivalent measure of the same variable. In standard psychometric theory, the square root of the reliability coefficient ( V r ^ ) may be interpreted as the correlation between the variable as measured by the instrument or test at hand and the “ true” — error-free— score. Because true scores are not themselves observable, a series of techniques has been devel­ oped to estimate the correlation between the obtained scores and these (hypo­ thetical) true scorcs. These techniques may be based on correlations among items, between items and the total score, between other subdivisions of the measuring instrument, or between alternative forms. They yield a reliability coefficient that is an estimate (based on a sample) of the population reliability coefficient. 7 This coefficient may be interpreted as an index of how well the test or measurement procedure measures whatever it is that it measures. This issue should be distinguished from the question of the test’s validity, that is, the question of whether what it measures is what the investigator intends that it measure. The discrepancy between an obtained reliability coefficient and a perfcct relia­ bility of 1.00 is an index of the relative amount of measurement error. Each observed score may be thought of as composed of some true value plus a certain amount of error: (2 .1 1 .2 )

X = X * + e.

These error components are assumed to have a mean of zero, and to correlate zero with the true scorcs and with true or error scores on other measures. Measurement errors may come from a variety of sources, such as errors in sampling the domain of content, errors in recording or coding, errors introduced by grouping or an insufficiently fine system of measurement, errors associated with uncontrolled aspects of the conditions under which the test was given, errors due to short or long term fluctuation in individuals’ true scorcs, errors due to the (idiosyncratic) influence of other variables on the individuals’ responses, etc. For the entire set of scores, the reliability coefficient may be seen to equal that proportion of the observed score variance which is true score variance (2.11.3)

rx: x ^ Sd** sd\

Because, as we have stated, error scorcs are assumed not to correlate with anything, rxx may also be interpreted as that proportion of the measure’s vari­ ance that is available to correlate with other measures. Therefore, the correlation ’ Because this is a whole field of study in its own right, no effort will be made here to describe any of its techniques, or even the theory behind the techniques, in detail. Three cxccllent sources of such information are Lord and Novick (1968), Nunnally (1967) Cronbach (1970). and Cnnnbach, Gleser. Nanda, and Rajaratnam (1972). We return to this topic in Section 10 5.

2.11 FA C T O RS A FFEC T IN G TH E S IZ E O F r

69

between the observed scores (X and D for any two variables will be numerically smaller than the correlation between their respective unobservable true scores (,X * and Y*). Specifically: ( 2. 11.4 )

rX Y = r x ^ ' ^ r x x T Y Y -

Researchers sometimes wish to estimate the correlation between two theoreti­ cal constructs from the correlation obtained between the imperfect observed measures of these constructs. To do so, one corrects for attenuation (unre­ liability) by dividing rXY by the square root of the product of the reliabilities (the maximum possible correlation between the imperfect measures). From Kq. (2.11.4), (2.11.5)

rx * Y. = / XY

\ rX X fy y Thus, if two variables, each with a reliability of .80, were found to correlate •44,

Fy *y * — .

. —*55. V(.8 0 )(.8 0 )

As for all other estimated coefficients, extreme caution must be used in in­ terpreting attenuation-corrected coefficients, because each of the coefficients used in the equation is subject to sampling error. Indeed, it is even possible to obtain attenuation-corrccted coefficients larger than 1 when the reliabilities come from different populations than rXY, are underestimated, or when the assumption of uncorrelated error is false. Obviously, because the rx *Y* values are hypotheti­ cal rather than based on real data, no significance tests can be computed on their departure from zero or any other value. The problem of unreliability is likely to be particularly severe when one of the variables is a difference score— that is, when it is obtained by subtracting each person's score on some given variable (4 ) from that person’s score on some other variable ( B ), where A and B are positively correlated. Such difference scores are common when A and B represent scores after and before some treatment and A B is intended to represent change. Another common difference-score situation is found when A and B are two measures obtained on some inventory and an aspect of inventory profile (A - B ) is being investigated. Under any of these circum­ stances, the A — B difference score is likely to be less reliable than cither original score and may be estimated (assuming sdA = sdH) hy the following formula: U - tl.b )

,

r(A--B)(A-B)

_ \(rAA ^ ra a ) l 2 ] ~!~a b -- —— -----------! rAB

As the correlation between the two variables approaches their average reliabili­ ty, the reliability of the difference score approaches zero. For example, the reliability of the difference score between two variables each with a reliability of .80 and a correlation of .60 w ill be only .50. Two variables with reliabilities of .60 and an intercorrelation of .45 would yield a difference score with a rclia-

70

2.

BIVARIATE CORRELATION AND R EG R ESSIO N

biiity of only .27. Reliabilities of .60 are by no means uncommon in the behav­ ioral sciences; in fact, in some circumstances (psychiatric assessment, opinion surveys) they may even be considered reasonably good. Thus, the danger in using difference scorcs is a real one, because they frequently cannot be expected to correlate very substantially with anything else, being made up mostly of measurement error. See Section 10.6 for a generally superior alternative pro­ cedure of working with regressed changc. To reiterate, unreliability in variables is a sufficient reason for low correlations; it can not cause correlations to be spuriously high. Spuriously high correlations may, o f course, be found when sources of bias are shared by variables, as can happen when observations are not “ blind,” when subtle selection factors are operating to determine which cases can and cannot appear in the sample studied, and for yet other reasons. 2.11.3 Restriction of Range A problem related to the general problem of reliability occurs under conditions when the range on one or both variahles is restricted by the sampling procedure. For example, suppose that in the data presented in Table 2 . 1.1 and analyzed in Table 2.6.1 we had restricted ourselves to the study o f faculty members who were less than 10 years post-Ph.D. rather than the full range. If the relationship is well-described by a straight line and homoscedastic, we shall find that the vari­ ance of the Y scores about the regression line, s d \ _ y, remains about the same. Because when r ^

0, x d \ w ill be decreased as an incidental result of the

reduction in s d \ , and because s d \ = s d \ + sd\_ p, the proportion o f s d \ associated with X, namely, s d \ , w ill necessarily be smaller; therefore, r( = s d Z ls d l ) and r w ill be smaller. In the current example /"decreases from .683 to .339 and r 2, the proportion of variance, from .4667 to .1147! (See Table 2.11.2.) The regression coefficient B ^ , on the other hand, w ill remain approx­ imately the same, because the decrease in r w ill be offset by an increase in the ratio sdyJsdx . It is .419 here, compared with .467 before. (It dropped slightly in this example, but could just as readily have increased slightly.) The fact that B yx tends to remain constant over changes in the variability of X is an important property of the regression coefficient. It is shown later how this makes them more useful as measures o f relationship than correlation coefficients in some analytic contexts. Suppose that an estimate o f the correlation that would be obtained from the full range is desired, when the available data have a curtailed or restricted range for

X. I f we know the sdx o f the unrestricted X distribution as well as the sdXc for the curtailed sample and the correlation between Y and X in the curtailed sample ( rXfy) , wc may estimate rXY by (2.11.7)

rx cr (sdx lsdxc)

2.11 F A C T O R S A F F E C T IN G T H E S IZ E OF r

For example, r — .25 is obtained on a sample for which sdx

71

— 5 while the sd x

o f the population in which the investigator is interested is estimated as

12.

Situations like this occur, for example, when some selection procedure such as an aptitude test has been used to select personnel and those selected are later assessed on a criterion measure. I f the finding on the restricted (employed) sample is projected to the whole group originally tested, rXY would be estimated to be

rXY ~ ~ ------ -

.25(12/5) • •-

.60 —

-

V T T .2 5 2 [(12/5)2 -1]

— ---------- ----

.5 3

\ [ \ .2975

It should be emphasised that .53 is an estimate and assumes that the relation­ ship is linear and homoscedastic, which might well not be the ease. There is no significance test appropriate for rx y ; it is significant if rx Y is significant. It is quite possible that restriction of range in either X or K, or both, may occur as an incidental by-product o f the sampling procedure. Therefore, it is important in any study to report the sds o f the variables used. Because under conditions of homoscedasticity and linearity regression coefficients are not affected by range restriction, comparisons o f different samples using the same variables should usually be done on the regression coefficients rather than on the correlation coefficients when .sc/s differ. Investigators should be aware, however, that the questions answered by these comparisons are not the same. Comparisons of correlations answer the question ‘ ‘does X account for as much o f the variance in >' in group E as in group F T ' Comparisons of regression coefficients answer the question “ does a change in X make the same amount of score difference in Y in group E as it does in group F ? ”

TABLE 2.11.2 Correlation and Regression on Number of Publications for a Restricted Range of Years since Ph.D. Case ] 2 3 4 6 7 8 9 m set

X = Years

¥ ~ Publications

) 7 5 7 4

12 9

rxy = .339 (.683)"

3 8 4

3 1 8

r \ y = 1 1 4 7 (.4667)

4.67

5.78 3.46

2.40

2 4 5

Byx = .419 (.463)

"Parenthetic values are those for the original (i.e.. unrcstricled) sample: See Table 2.2.2

72

2.

B IV A R IA T E C O RRELA TIO N A N D R E G R E S S IO N

Although the previous discussion has been cast in terms of restriction in range, an investigator may be interested in the reverse— the sample in hand has a range of X values that is large relative to the population of interest. This could happen, for example, if the sampling procedure was such as to include disproportionately more high- and low-X cases and fewer middle values. The obtained r w ill then be “ too large.” Equation (2.11.7) can be employed to estimate the correlation in the population of interest (whose range is X is less) by reinterpreting the subscript

C in the equation to mean changed (including increased) rather than curtailed. Thus, rX Y and sdx arc the “ too large” values in the sample, sdx is the (smaller) sd of the population of interest, and fXY the estimated (smaller) r in that popula­ tion. Note that the ratio sdx /sdx _, which before was greater than one, is now smaller than one. 2.11.4 Part-W hole Correlations Occasionally we w ill find that a correlation has been computed between some variable J and another variable W. which is the sum of scores on a set of variables including J. Under these circumstances a positive correlation can be expected between J and W due to the fact that W includes J . even when there is no correlation between W and W - J. For example, if k test items of equal sd and zero r with each other are added together, each of the items w ill coneiate exactly 1 / V i with the total score. For the two-item case, therefore, each item would correlate .707 with their sum, W. when neither correlates with the other. On the same assumption o f zero correlation between the variables but with unequal sda, the variables arc effectively weighted by their differing sdi and the correlation of

J with W w ill be equal to s d jlV X s d f.* Obviously, under these circumstances, rjt.w j) ~ ^ In fhe more common case where the variables or items arc corre­ lated, the correlation of J with W — J may be obtained by rjwsdw - sdj V sd\i/ + sdj ~ 2rjwsdy/sdj This is not an estimate and may be tested via the usual t test for the significance of r. Given these often substantia! spurious correlations between elements and totals including the elements, it behooves the investigator to determine rJ(W . ,r or at the very least determine the expected value when the elements are uncorreiated before interpreting rJw.

Change Scores It is not necessary that the parts be literally added in order to produce such spurious correlation. If a subscorc is subtracted a spurious negative component in the correlation w ill also be produced. One common use of such difference scores in the social scicnces is in the use of post- minus pretreatmcnt (change) scores. If **1'he summation here is over (he k items.

2.11 F A C T O R S A F F E C T IN G TH E S IZ E O F r

73

such change scores are correlated with the pre- and postscores from which they have been obtained, we w ill typically find that subjects initially low on X will have larger gains than those initially high on X, and that those with the highest final scores w ill have made greater gains than those with lower final scores. Again, if

= sdpmt and rpK posl = 0, then rpre thange = - .707 and rpHSl thangc

= + .707. Although in general wc would expect the correlation between pre- and postscores to be some positive value, it will be limited by their respective reliabilities (Section 2.11.2) as well as by individual differences in true change. Suppose that the correlation between two measures would be perfect if they were each perfectly reliable, that is, each subject changed exactly the same amount in true scores as every other. Nevertheless, given the usual measurement error, the pre-post correlation w ill not equal

1,

and it will necessarily appear that some

subjects have changed more than have others. Even with fairly good reliability so that the observed r pn. I2 .U .9 }

, is .82, and equal .sc/s, V e change

=

‘ V A *

^prepost/

= - J ^ = L = = -.30. V 2 ( l - .82) The investigator may be tempted to interpret this correlation by stating that, for example, subjects with low scorcs on the initial test were “ helped” more than were subjects with high initial scores. However, we have already posited that true changes in all subjects were exactly equal! Obviously, the r = —.30 is a necessary consequence of the unreliability of the variables and cannot be other­ wise interpreted. If the post- minus prctreatment variable has been c tea ted in order to control for differences in prctreatment scores, the resulting negative correlation between prc and changc scores may be taken as a failure to remove all influence of prescores from postscores. This rcfleets the regression to the mean phenomenon discussed in Section 2.5 and the consequent interpretive risks. The optimal methods for handling this problem arc the subjcct of a whole literature (Cronbach & Furby, 1970) and cannot be readily summarized. However, the appropriate analysis, as always, depends on the underlying causal model. (See Section 10.6 for a further discussion o f this problem.) 2.11.5 R atio or Index Variables Ratio (index or rate) scores are those constructed by dividing one variable by another. W hen a ratio score is correlated with another variable or with another ratio score, tbc resulting correlation depends as much on the denominator of the score as it docs on the numerator. Bccause it is usually the investigator’s intent to “ take the denominator into account” it may not be immediately obvious that the correlations obtained between ratio scorcs may be spurious— that is, may be a consequence of mathematical necessities that have no valid interpretive use.

74

2.

BIVARIATE CO RRELATION AND R EG R ESSIO N

Ratio correlations depend, in part, upon the correlations between all numerator and denominator terms, so that >\rr/,)X is a function o f r Y/ and rx / as well as of

rYX, and i~tr ,z ^ x/W) depends on ryw and rx / as well as on the other four correla­ tions. Equally problematic is the fact that such correlations also involve the coefficients of variation {vx = sdx !X) of each of the variables. Although the following formula is only a fair approximation o f the correlation between ratio scores (requiring norma! distributions and homoscedasticity and dropping all terms involving powers of f greater than v2), it serves to demonstrate the depen­ dence of correlations between ratios on all v Ts and on r 's between all variable pairs: ? i111 . 11 fll If Z 0)

r YjZ ){XjW) ~ r{ ~

ryxVyVx --- ~>‘YWi)Y ------- l>W ~ rXZl}X v/, + rz w vz v w v

VY + vz - 2rYZv y vz V vx +

- 2rx w vx v w

W hen the two ratios being correlated have a common denominator, the pos­ sibility of spurious correlation becomes apparent. Under these circumstances, the approximate formula for the correlation simplifies, because Z = W. If all coeffi­ cients o f variation are equal when all three variables arc ((^correlated we w ill find

riY.?UX/Z)

~ -50. Because the coefficient o f variation depends on the value of the mean, it is

clear that whenever this value is arbitrary, as it is for most psychological scores, the calculated r is also arbitrary. Thus, ratios should not be correlated unless each variable is measured on a ratio scale, a scale for which a zero value means literally none of the variable (see Chapter

6 ).

Measures with ratio scale properties

are commonly found in the social scicnces in the form of counts or frequencies. At this point it may be useful to distinguish between rates and other ratio variables. Rates may be defined as variables constructed by dividing the number of instances o f some phenomenon by the total number of opportunities for the phenomenon to occur; thus, they are literally proportions. Rates or proportions are frequently used in ecological or epidemiological studies where the units of analysis are aggregates o f people or areas such as counties or census tracts. In such studies, the numerator represents the incidence o f some phenomenon and the denominator represents the population at risk. For example, a deliquency rate may be calculated by dividing the number of delinquent boys aged 14-16 in a county by the total number of boys aged 14-16 in the county. This variable may­ be correlated across the counties in a region with the proportion of families whose incomes are below the poverty level, another rate. Because, in general, the denominators will reflect the populations of the counties, which may vary greatly, the denominators can be expected to be substantially correlated. In other cases the denominators may actually be the same as. for example, in an investi­ gation of the relationship between delinquency rates and school dropout rates for a given age-sex group. The investigator w ill typically find that these rates have characteristics that minimize the problem of spurious index correlation. In most

2.11 FACTORS A F F E C T IN G THE SIZE OF r

75

real data, the coefficients o f variation of the numerators will be substantially larger than the coefficients of variation of the denominators, and thus the correla­ tion between rates w ill be determined substantially by the correlation between the numerators. Even in such data, however, the resulting proportions may not be optimal for the purpose o f linear correlation. Section 6.5.4 discusscs some non­ linear transformations o f proportions, which may be more appropriate for analy­ sis than the raw proportions or rates themselves. Experiment a! !y produced rates may be more subject to problems of spurious correlation, especially when there are logically alternative denominators. The investigator should determine that the correlation between the numerator and denominator is very high (and positive), because in general the absence of such a correlation suggests a faulty logic in the study. In the absencc o f a large correla­ tion, the coefficients of variation o f the numerator should be substantially larger than that o f the denominator if the problem of spurious correlation is to be minimized.

Other Ratio Scores W hen the numerator does not represent some subclass of the denominator class, the risks involved in using ratios are even more serious, because the likelihood o f small or zero correlations between numerators and denominators and relatively similar values o f v is greater. I f the ratio scale properties of variables are insufficiently “ strong” (true zeros and equal intervals), correla­ tions involving ratios should probably be avoided altogether, and an alternative method for removing the influence o f Z from Y should be chosen, such as partial correlation (see Chapters 3, 4, and 10). The difficulties that may be encountered in correlations involving rates and ratios may be illustrated by the following example. An investigator wishes to determine the relationship between visual scanning and errors on a digit-symbol task. A ll subjects are given 4 minutes to work on the task. Because subjects who complete more digit symbols have a greater opportunity to make errors, the experimenter decides, reasonably enough, to determine the error rate by dividing the number o f errors by the number of d-s completed. Sim ilarly, it is reasoned that subjects completing more d-s substitutions should show more horizontal visual scans, and thus visual scanning is measured by dividing the number of visual scans by the number of d-s completed. Table 2.11.3 displays the data for 10 subjects. Contrary to expectation, subjects who completed more d-s did not tend to produce more errors (ryx = —. 105) nor did they scan notably more than did low scorers ( ryy = .106). Nevertheless, when the two ratio scores are com­ puted, they show a substantial positive correlation (.427) in spite of the fact that the numerators showed slight negative correlation ( — .149), nor is there any tendency for scanning and errors to be correlated for any given level o f d-s completion. Thus, the r {Xr^ Yrjr) may here be seen as an example of spurious correlation.

76

2.

B IV A R IA T E C O R R E L A T IO N A N D R E G R E S S IO N

T A B L E 2.11.3 Example of Spurious Correlation between Ratios No. completed d-s

No. errors

No. scans

No. errors

No. scans

(Z )

rc emended treatment is given in Chapter 9. together with references, [-‘ominately, there is also available an excellent detailed non mathematical textbook treatment by Kenny (1979).

79

80

3.

M RC: TW O OR M O RE IVS

compelling resolution is presented in Cook and Campbell (1979). In our frame­ work, to say that A is a cause of B carries with it three requirements: 1. A precedes B in time (although they may be measured at the same time). 2. Some mcchanism whereby this causal cffcct operates can be posited. 3. A change in the value of A is accompanicd by a change on the average in the value of B. When A or B arc quantitative variables (e.g., dollars, score points, minutes, millimeters of mercury, percentile ranks), the meaning of value is obvious. When A is a nominal scale (i.e., a collcction of two or more qualitative states), a change in value means a change from one state to another (e.g., from Protestant to Catholic or Protestant to non-Protestant, from schizophrenic to nonschizo­ phrenic, or from one diagnosis to another). When B is a dichotomy (e.g., schizophrenia-nonschizophrenia), a change in value on the average means a change in proportion (e.g., from 10% schizophrenia for some low value of A to 25% schizophrenia for some higher value). The third proposition should not be simplified to mean “ if you change A, B will change.” This may, of course, be true, but it need not be. First, it may not be possible to manipulate A. For example, boys have a higher incidence of reading disability than girls; here sex (/I) causes reading disability (fl), but it is meaningless to think in terms of changing boys into girls. Second, even when A can be manipulated, the way it is manipulated may determine whether and how B changes, because the nature of the manipulation may defeat or alter the normal causal mechanism whereby A operates. The models that we are employing have their roots in the path-analytic dia­ grams developed by the geneticist Seweil Wright (1921) for untangling genetic and nongenctic influences. Models with much the same logical properties are employed by economists (Goldberger, 1964; Goldbcrger& Duncan, 1973) who use the terms structural models or structural equation models. The purpose of the model is to make explicit exactly what the investigator has in mind about the variables and their interrelationships. As such, they contribute to the clarity and internal consistency of the investigation. It should be recognized at the outset, however, that a causal model may never be established as proved by a given analysis; all that may be said is that the data are consistent with a given model or that they arc not. Thus, the value of a given model is determined as much by the logic underlying its structure as by the empirical demonstration of the fit of a given set of data to the model. 3.1.2 Diagramatic Representation of Causal Models The basic rules for representing a causal model are quite simple. 2 Causal effects are represented by arrows going from the cause to the effect (the “ dependent” -Riis initial discussion is limited to elementary modds and omits consideration of the effects of unmeasured causes and the assumptions underlying the model, for which see Chapter 9.

3.2 R E G R ESSIO N WITH TW O IN D EPEN D EN T V A R IA B LE S

c

81

P u b lic a tio n s

f

S a ia r y

Years since PhD

FIG U RE 3.1.1

Causal Model of Acadcinic Salary Lixample

variable). Usually the causal flow is portrayed as going from left to right, although in complex models other arrangements may be necessary to produce a clear diagram. Some or ail of the independent variables are considered ex­ ogenous or predetermined variables. These variables are taken as given and the model requires no explanation of the causal relationships among them. The relationships among these variables are represented by curved double-headed arrows connecting each pair. To illustrate the use of a causal diagram, let us expand the academic example employed in. Chapter 2. The investigator has collected the data on number of publications and number of years since Ph.D. to determine the influence of productivity (as indexed by publications) and seniority (as indexed by yean; sincc Ph.D .) on academic salaries. The resulting causal diagram is shown in Fig. 3.1.1. In this simple model we assert that academic salary is in part determined by years sincc Ph.D. and in part by publications. These latter two variables may be correlated with each other, but no causal explanation is offered for any relationship between them. Flowever, salary is assumed not to cause changes in numbers of publications nor in years since Ph.D. This diagram representing the causal model specifies which nonzero causal effects are expected but not what the magnitude of these effects (f, g, and h) are. For this task we return to our least-squares estimation procedures.

3.2 REGRESSION WITH TW O INDEPENDENT VARIABLES

To provide the estimates of effects required by our causal model we need to produce a weight for each of our exogenous variables whose application will account for as much of the variance of our dependent variable as possible. Recalling that the regression equation, Y = BX + A, was designed to produce such an estimate for a single independent variable, wc may anticipate that a similar procedure may produce the appropriate weights for two independent variables. For example, suppose wc have gathered the data in Table 3.2.1 to estimate the

82

3.

MRC: TWO OR MORE IVS

TABLE 3.2.1 Illustrative Data for a Three-Variable Problem in the Causal Analysis of Academic Salary

3 8 4 16 15 19 8 14 28

m $25.406 sd $ 6,097

9.60 7.25

/>, = .618

= .3824) ro

27,132 27,268 32,483 27.029 25.362 28.463 32,931 28.270 38,362

2 4 5 12 5 9 3

f-j

1 2 5 7 10 4



$18,000 19,961 19.828 17,030 19.925 19.041

Number of Publications (X^)

II

Number of Years Sincc Ph.D. (X |)

Salary 00

- .683

1 8 12 9

4

Y| = $520 X, + $20,411

8

II 21

t 2 - $566 X , • S21.106

7.60 4.96

model for academic salaries presented in Fig. 3.1.1. 3 The correlation hetween salary (K) and years since Ph.D. (X ,) is .618 and B yi is therefore ,618($6097/7.25) - $520 per year. The correlation between salary and publica­ tions (X 2) is .461 and its regression coefficient is therefore .461($6097/4.96) = $566 per publication (Table 3.2.1). If X , and X , were uncorrelated, wc could simply use B yi and BY2 together to estimate Y. However, as might be expected, wc find a tendency for those faculty members who have had their degrees longer to have more publications than those who more recently completed their educa­ tion (r,2 ~ .683). Thus, X , and X 2 are to some extent redundant, and necessarily their respective estimates, Y, and Y2, will also be redundant. What wc need to estimate Y optimally from both X, and X 2 is an equation in which this redundan­ cy (or more generally the relationship between X , and X 2) is taken into account. The regression coefficients in such an equation arc called partial regression

coefficients to indicate that they are optimal linear estimates of the dependent variables

(1 0

when used in combination with specified other independent vari-

Mn this example, the number of cases has been kepi small (o enable the reader to follow computa­ tions with ease. No advocacy of such small samples is intended (see Sections 3.8 ami 4.5 on statistical power). In this and the remaining chapters the dependent variable is identified as Y and the individual indcpendenl variables by X with a numerical subscript that is. X j, X2, etc This makes it possible to represent independent variables by lheir numerical subscripts only, for example, B tx becomes By

3.2 R E G R E S S IO N W IT H T W O IN D E P E N D E N T V A R IA B L E S

ablcs. Thus, B y} 2 ‘s

83

partial regression coefficient for Y on X , when X 2 is also

in the equation, and B y-,.} is the partial regression coefficient for Y on X 2 when X| is also in the equation. The full equation is (3.2.1)

Y = B Y}.2X f + B r 2 l X 2 + A Y. I2.

The partial regression coefficients or B weights in this equation, as well as the regression constant A , are determined in such a w ay that the sum of the squared differences between (actual) Y and (estimated) Y is a minimum. Thus, the multi­ ple regression equation is defined by the same least-squares criterion as was the regression equation for a single independent variable. Because the equation as a whole satisfies this mathematical criterion, the term partial regression coefficient is used to make elcar that it is the weight to he applied to an independent variable ( IV ) when one or more specified lV s are also in the equation. Thus

# n 2

indicates the weight to be given X ; when X 2 is also in the equation. B Y2 . 13 is the X 2 weight when X i a n d X 3 are in the equation. B YA. i2-^ is th e X 4 weight w h e n X ,,

X 2, and X 3 are also used in the equation for Y , and so on. The weights for the IV s taken together with A constitute the necessary constants for the linear regression equation. When the regression equation is applied to the IV values for any given observa­ tion i, the result w ill be an estimated value o f the dependent varia b le^ ,). For any given set o f data on which such an equation is determined, the resulting set o f Y{ values will be as close to the actual Yt values as possible, given a single weight for each IV . " A s close as possible” is defined by the least-squares principle. For our example of estimating salary (10 from number o f years since Ph.D. ( X ,) and number of publications ( X 2), the full regression equation is (3.2.2.)

Y t2 - $479 X, +

$88

X 2 + $20,138,

where S479 is the partial regression coefficient B yi , 2 for X , and S 8 8 is the partial regression coefficient

, for X 2. The redundancy o f information about Y

carried by these two variables is reflected in the fact that the partial regression coefficients ($479 and $ 8 8 ) arc each smaller in magnitude than their separate zero-order B ' s ($520 and $566). W e may interpret B r2. , = $ 8 8 directly by stating that, for any given number o f years since Ph.D. (X ,), on the average each additional publication is associated with an increase in salary o f only S 8 8 rather than the $566 that was found when years sincc Ph.D . was ignored. The B y , . 2 = S479 may be similarly interpreted as indicating that, fo r faculty members with a

given number o f publications ( X 2), on the average each additional year since Ph .D . is associated with an increase in salary o f $479 rather than the $520 that was found when number o f publications was ignored. From a purely statistical point of view , these changes are a consequence of the redundancy of the two causal variables [i.e., the tendency for faculty who have had their Ph.D .s longer to have more publications ( r n = .683)]; the partiailing process controls for this

84

3.

M RC : T W O O R M O R E IV S

tcndcncy . 4 View ed through the lens o f causal analysis, wc see (particularly in the ease o f number of publications) how seriously we can be misled about the causal import o f a variable when we fail to include in our model other important causes. Thus far, wc have simply asserted that the regression equation for two or more I Vs takes the same form as did the single IV ease without demonstrating how its coefficients are obtained. As in the case in presenting correlation and regression with one IV , we initially standardize the variables to eliminate the effects of noncomparable raw (original) units. The regression equation for standardized variables is (

3. 2. 3.)

Zy



2

P y |. z !

P

^2 sZ 2'

Just as rYK is the standardized regression coefficient for estimating zK from zx , P y ( ., and P r2.| arc the standardized partial regression coefficients for estimating z Y from Z| and z 2

minimum squared error. (D o not confuse this use o f p with

its use as rate o f Type 11 error in power analysis.) The equations for (

2

3. 2. 4 )

an^ f W i can

proved via differential calculus to be

P r, 2 =

~ Y ]2 1

I2

1

- r T~ T2

Y \ ' 12

A separation o f the elements o f this formula may aid understanding. r YI and r Y2 are “ valid ity” coefficients, that is, the zero-order correlations o f the I Vs with the dependent variable, Y. r2V2 represents the variance in cach IV shared with the other IV and reflects their redundancy. Thus, p y i2 and $ y2-\ are partial coefficients bccause cach has been adjusted to allow for the correlation between

X t and X 2. To return to our academic example, the correlations between the variables are

r Yi = .618, r Y2 — .461, and r I2 = .683. W e determine by 0 ^

! '2

E q . (3.2.4) that

_ .618 - (,461)(.683) _ I - .6832 ‘ /U’ _ .461 - (,618)(,683) _

Py2>

1-

,

6832

and that the full regression equation for the standardized variables is therefore = .570z, + .0727,.

4The terms holding constant or controlling fo r, partialling the effects of, or residuaiimig some other variabie(s) indicate a mathematical procedure, ofbourse, rather than an experimental one Such terms are statisticians' shorthand for describing the average effect of one variable for any given vaiucs of the other variables.

3.3 M E A S U R E S OF ASSO CIATIO N W ITH TW O IVS

Once p r

l .2

85

and p r2., have been determined, conversion to the original units is

readily accomplished by (3.2.5)

B y i .2 = P Y1.2 ^ .

R - ft Sd* BY2-1 P y 2'I Substituting the sd values for our running example (Tabic 3.2.1), we find ^

= .5 7 0 (^ |Z ).» 7 9

b™

=M

W

■ ss8

The constant A that serves to ad just for differences in means is calculated in the same w ay as with a single IV :

A y,a

^

^Yi-2^1

^V2-i^2

- $25,406 - 5479(9.60) - $88(7.60) = $20,138. The full (raw score) regression equation for estimating academic salary is therefore

Y 12 = $479 X , +

$88

X 2 + 520,138,

and the resulting values are provided in the third column of Table 3.3.1 later in the chapter. The partial regression coefficients, B Yl 2 = $479 and B y i .i = $ 8 8 , are the empirical estimates, respectively, o f/ and g, the causal effects of our indepen­ dent variables accompanying the arrows in the causal diagram (Fig. 3.1.1).

3.3 M EA SU R ES OF ASSO C IA TIO N W ITH TW O INDEPEN DENT VARIABLES Just as there are partial regression coefficients for multiple regression equations (equations for predicting Y from more than one IV ) , so are there partial and multiple correlation coefficients that answer the same questions answered by the

zero-order or simple product moment correlation coefficient in the single IV case. These questions include the following: 1. How w ell does this group of IV s together estimate Y1 2. How much does any single variable add to the estimation of Y already accomplished by other variables?

86

3.

MRC: TW O OR MORE IVS

3. When all other variables arc held constant statistically, how much of Y does a given variable account for? 3.3.1 Multiple R and R1 Just as r is the measure of association between two variables, so the multiple R is the measure of association between a dependent variable and an optimal com­ bination of two or more IVs. Sim ilarly, r2 is the proportion of each variable’s variance shared with the other, and R 2 is the proportion of the dependent vari­ able’s variance (i

$ Y 2

I r Y2

For the example illustrated in Table 3.1.1 the multiple correlation is thus, by Eq. (3.3.1),

v

/ .3824 + .2122 - 2(.6I8)(.461)(.683) 1 - .4667

= V.3852 = .621 or by Eq. (3.3.2), /?r ] 2

= V.570(.618) + ,072(.46I) = V.3 8 5 2 = .621.

(W e again remind the reader who chccks the previous arithmetic and finds it ‘'wrong” of our warning in Scction 1.2.2 about rounding errors.) W e saw in Chapter 2 that the absolute value of the correlation between two variables \rYX\ is equal to the correlation between Y and Yx . The multiple correla­ tion is actually definable by this property. Thus, with two IV s. (3.3.3)

*'yy*2'

and taking the example values in Table 3.3.! we see that indeed ryy t = .621 —

R y. l2- That rryt2 and hcnce

/ ? K. | 2 cannot be negative can be seen from the fact that by the least-squares criterion Y is as close as possible to Y.

3.3 M EA SU R ES OF ASSOCIATION WITH TWO IVS

87

TABLE 3.3.1

Actual Estimated, and Residual Salaries ] Y

4

5

2

3

y,

Y,2

$18,000 19.961 19.828 17.030 19,925 19.041 27,132 27,268 32.483 27,029 25.362 28.463 32.931 28.270 38.362

$20,931 21.451 23,012 24.053 25,614 22.492 21,972 24,573 22.492 28,735 28,215 30,297 24,573 27.695 34.979

$20,793 21.448 22,973 24,546 25.369 22,845 21,839 24.059 22,757 28,859 28,116 29,594 24.674 27,813 35.400

$2,793 - 1.487 -3.145 -7.516 -5,444 -3,804 - 5,293 i 3,209 1-9.726 - 1.830 -2.754 -1,131 - 8.257 +457 i-2.962

m $25,406 sd $6,097

$25,406 $3,770

$25,406 $3,784

$0 $4,780

r

7

6

%2.1

Yu

Xj

V2 |

3.58 4.04

- 1.58 -.04

5 45 6.38 7.79 4.98 4.51 6.85 4.98 10.60 10.13 12.00 6.85 9 66 16.21

.45 - 5.62 2.79 (-4.02 -1.51 5.85 l 3.02 +1.40 -1.13 -8.00 +1.15 t 1.34 -4.79

7 60

0

Y

Y,

-S2,93l - 1,490 -3,184 ■7.023 -5.689 -3.451 +5,160 +2,695 -9,991 -1.707 -2,853 1,834 +8,358 + 575 I 3,383 0 $4,791

Correlations Y, .618 = r,

Y Y

Y,

0 • '"tKin

The reader will again recall that

r,2 .621 - R r l2 1.349)

X2 *2-1 -052 - sr: .006 = p r i

is the proportion of variance of Y shared

with X. In exact parallel. /?p. , 2 is the proportion of s(fy shared with the optimally weighted composite of X , and X 2■These optimal weights are, of course, those provided by the regression equation used to estimate Y. Thus,

(3 .3 .4 )

12 3,7842 6,097’

.3852;

that is, some 39% of the variance in salary (JO is linearly accounted for by number of years since doctorate (X ,) and number of publications (X 2) in this sample. Again in parallel with simple correlation and regression the variance of the residual, Y — Yl2, is that portion o f sd* not linearly associated with X , and X 2. Therefore (and necessarily), (3 .3 .5 )

r Y (y

— 0

88

3.

MRC: TW O OR M ORE IVS

and sincc such variances are additive, (3.3.6)

= ^ 4 I2 + s d $ _ y l .

It should also be apparent at this point that a multiple R can never be less than the absolute value o f the largest correlation of Y with the IV s and is almost invariably larger. The optimal estimation o f Y l2 under circumstances in which X2 adds nothing to X , ’s estimation o f Y would involve a

Ky. p would equal

0

weight f o r ^

2

and thus

the absolute value o f ry i . Any slight departure o f X 2

values from this rare circumstance necessarily leads to some (perhaps trivial) increase in R r . , 2 over As with bivariate correlation the square root of the proportion o f Y variance not associated with the IV s is called the coefficient o f (multiple) alienation. This value is V i

— R 2 = V l — .3852 = .784 for these data.

3.3.2 S em ip a rtial C orrelation Coefficients One of the important problems that arises in M R C is that of defining the contri­ bution of each IV to the multiple correlation. W e shall sec that the solution to this problem is not so straightforward as in the case o f a single independent variable, the choice of coefficient depending on the substantive reasoning underlying the exact formulation o f the research questions. One answer is provided by the semipartial correlation coefficient sr and its square sr2. T o understand the mean­ ing o f these coefficients it is useful to consider the “ ballantine” . Recall that in the diagrammatic representation o f Fig. 2.6.1 the variance o f each variable is represented by a circle o f unit area. The overlapping area of two circles repre­ sents their relationship as r2. W ith Y and two IV s represented in this w ay. the figure is called a ballantine (or, more formally, a Venn diagram). The total area of Y covered by the A’, and X 2 areas represents the proportion of K’s variance accounted for by the two IV s . R^,.I2. Figure 3 .3 .1 shows that this area is equal to the sum o f areas designated a, b , and c. The areas a and b represent those portions of Y overlapped uniquely by IV s X j and X 2, respectively, whereas area c represents their simultaneous over­ lap with Y. The “ unique” areas, expressed as proportions o f Y variance, arc squared semipartia! correlation coefficients, and each equals the increase in the squared multiple correlation that occurs when the variable is added to the other IV

.5

/0

0 7

1

Thus, V

*

a ~ — ^K* 12 — r r2 ' b = sr% = R 2y.n ~ r 2 , .

’'Throughout the remainder of the book, whenever possible without ambiguity, partial coefficients are subscripted by the relevant independent variable only, it being understood that Y is the dependent variable and that all other i V s have been partial led Thus, v/, = w ,,, (11 4|. the correlation hetwecn Y and X, from which all other IVs in the set under consideration have been parti ailed Similarly, R without subscript refers to Ry. t t.

3.3 M EA SU R ES OF ASSOCIATION WITH TW O IVS

89

r2 r a V c

(3.3.7)

.V/-2 - ,2 1

V/-2 ‘ 2 (3.3.10)

r 2 : •h + c K-r-. .iz. = « + b I- '2

>-»2

r. r 2r(.Y] x,._,).

Another notational form of s r t used is rn i .2), the 1*2 being a shorthand way of expressing “ X , from which X 2 has been partiallcd,” or X , — X ,.2. It is a convenience to use this dot notation to identify what is being partialled from what, particularly in subscripts, and it is employed whenever necessary to avoid ambiguity. Thus i j means i from which j is partialled. Note also that in the literature the older term part correlation is sometimes used to denote semipartial correlation. In Table 3.3.1 we present th e X 2 - X 2., (residual) values forcach case in the example in which salary was estimated front publications and years sincc Ph.D. The correlation between these residual values and Y is seen to equal .052. which is sr2'■and .0522 = .0027 = sr\, as before. To return to the ballantine (Fig. 3.3.1) we sec that for our example, area a

— .0027. b ~ . 1730, and a { b + c = Rr-\2 = .3852. It is tempting to calculate c (by c = R y l2 ~ srf - sr%) and interpret it as the proportion of Y variance estimated jointly or redundantly by X , and X 2. However, any such interpretation runs into a serious catch— there is nothing in the mathematics that prevents c from being a negative value and a negative proportion of variance hardly makes sense. Because c is not necessarily positive, we forego interpreting it as a proportion of variance. A discussion of the circumstances in which c is negative is found in Section 3 .4. On the other hand, a and b can never be negative and are appropriately considered proportions of variance; each represents the increase in the proportion of Y variance accounted for by the addition of the corresponding variable lo the equation estimating Y.

3.3 M EA SU R ES OF ASSOCIATION WITH TWO 1VS

91

3.3.3 Partial Correlation Coefficients Another kind of solution to the problem of describing cach I V ’s participation in determining ft is given by the partial correlation coefficient p r,, and its square, p rf. The squared partial correlation may be understood best as that proportion of s d 2y not associated with X 2 that is associated with X ,. Returning to the ballantine (Fig. 3.3.1), wc sec that

nr 2^ ss 1

= a K’iI2 *

u

a+e

1

r Y2

■ r Y2

(3.3.10)

& = fir-12 ~ r Y1

nt-2 ~ P 2

b+ e

1 -

4 ,

'

The a area or numerator for pr] is the squared semipartial correlation coeffi­ cient srf: however the base includes not all the variance of Y as in sr] but only that portion of Y variance that is not associated with X 2, that is, I - r$2. Thus, this squared partial r answers the question, “ How much of the Y variance that is not estimated by the other IV (s) in the equation is estimated by this variable?” Interchanging X , and X2 (and areas a and l>), we similarly interpret pr\. In our academic rank example, we .see that by Kqs. (3.3.10) ,

pr₁² = (.3852 − .2122) / (1 − .2122) = .1730 / .7878 = .2196 ,

pr₂² = (.3852 − .3824) / (1 − .3824) = .0027 / .6176 = .0044 .

Obviously, because the denominator cannot be greater than 1, partial correlations will be larger than semipartial correlations, except in the limiting case when the other IVs correlate zero with Y, in which case sr = pr. pr may be found more directly as a function of zero-order correlations by

(3.3.11)    pr₁ = (r_Y1 − r_Y2 r_12) / [√(1 − r²_Y2) √(1 − r²_12)] ,

            pr₂ = (r_Y2 − r_Y1 r_12) / [√(1 − r²_Y1) √(1 − r²_12)] .

For our example,

pr₁ = (.618 − .461(.683)) / [√(1 − .2122) √(1 − .4667)] = .469 ,

and pr₁² = .469² = .2196, as before;

pr₂ = (.461 − .618(.683)) / [√(1 − .3824) √(1 − .4667)] = .066 ,

and pr₂² = .066² = .0044, again as before.
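For readers who wish to verify these figures, here is a minimal Python sketch (added for illustration; it is not part of the original text) that reproduces the semipartial and partial values from the three zero-order correlations. Small discrepancies from the printed values reflect rounding of the inputs.

import math

# Zero-order correlations from the running example (salary, years since Ph.D., publications).
r_y1, r_y2, r_12 = .618, .461, .683

R2 = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)   # R^2_Y.12

# Semipartial correlations, Eqs. (3.3.7)-(3.3.8).
sr1 = (r_y1 - r_y2 * r_12) / math.sqrt(1 - r_12**2)
sr2 = (r_y2 - r_y1 * r_12) / math.sqrt(1 - r_12**2)

# Partial correlations, Eq. (3.3.11).
pr1 = (r_y1 - r_y2 * r_12) / (math.sqrt(1 - r_y2**2) * math.sqrt(1 - r_12**2))
pr2 = (r_y2 - r_y1 * r_12) / (math.sqrt(1 - r_y1**2) * math.sqrt(1 - r_12**2))

print(round(R2, 4), round(sr1**2, 4), round(sr2**2, 4))   # about .3848, .1722, .0028 (text: .3852, .1730, .0027)
print(round(pr1**2, 4), round(pr2**2, 4))                 # about .2187, .0046 (text: .2196, .0044)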




In Table 3.3.1, we demonstrate that pr₂ is literally the correlation between X₂ from which X₁ has been partialled (i.e., X₂ − X̂₂.₁) and Y from which X₁ has also been partialled (i.e., Y − Ŷ₁). Column 6 presents the partialled X₂ values, the residuals from X̂₂.₁; column 7 presents the residuals from Ŷ₁ (given in column 2). The simple correlation between the residuals in columns 6 and 7 is .066 = pr₂ (the computation is left to the reader as an exercise). We thus see that the partial correlation for X₂ is literally the correlation between Y and X₂, each similarly residualized from X₁. A frequently employed form of notation to express the partial r is r_Y2.1, which conveys that X₁ is being partialled from both Y and X₂, in contrast to the semipartial r, which is represented as r_Y(2·1).

Before leaving Table 3.3.1, the other correlations at the bottom are worth noting. The r of Y with Ŷ₁ of .618 is identically r_Y1, and necessarily so, since Ŷ₁ is a linear transformation of X₁ and therefore must correlate exactly as X₁ does. Similarly, the r of Y with Ŷ₁₂ of .621 is identically R_Y.12, and necessarily so, by definition in Eq. (3.3.3). Also, Y − Ŷ₁ (that is, Y from which X₁ has been partialled) correlates zero with Ŷ₁, because when a variable (here X₁) is partialled from another (here Y), the residual will correlate zero with any linear transformation of the partialled variable: here, Ŷ₁ is a linear transformation of X₁, i.e., Ŷ₁ = B₁X₁ + A. Summarizing the results for the running example, we found sr₁² = .1730, pr₁² = .2196 and sr₂² = .0027, pr₂² = .0044. Whichever base we use, it is clear that number of publications (X₂) has virtually no unique relationship to salary, that is, no relationship beyond what can be accounted for by years since doctorate (X₁). On the other hand, years since doctorate (X₁) is uniquely related to salary (sr₁) and to salary holding publications constant (pr₁) to a quite substantial degree. The reader is reminded that this example is fictitious, and any resemblance to real academic departments, living or dead, is purely coincidental. Readers who find their plates overflowing with this smorgasbord of correlation (simple or zero-order, multiple, semipartial, and partial) and regression (raw and standardized) coefficients may find some relief in the last section of this chapter, where they are compactly summarized.

3.4 PATTERNS OF ASSOCIATION BETWEEN Y AND TWO INDEPENDENT VARIABLES

A solid grasp of the implications of all possible relationships among one dependent variable and two independent variables is fundamental to understanding and interpreting the various multiple and partial coefficients encountered in MRC. This section is devoted to an exposition of each of these patterns and its distinctive substantive interpretation in actual research.

3.4.1 Direct and Indirect Effects

As we have stated, the regression coefficients B₁ and B₂ estimate the causal effects of X₁ and X₂ on Y in the causal model given in Fig. 3.4.1, Model A.


FIGURE 3.4.1 Representation of relationships between Y and two IVs. Partial redundancy: Model A and Model B. Full redundancy: Model C (spurious relationship) and Model D (indirect effect).

These coefficients, labeled f and g in the diagram, are actually estimates of the direct effects of X₁ and X₂, respectively. Direct effects are exactly what the name implies: causal effects that are not mediated by any other variables in the model. All causes, of course, are mediated by some intervening mechanisms. If such an intervening variable is included, we have Model B shown in Fig. 3.4.1. In this diagram X₁ is shown as having a causal effect on X₂. Both variables have direct effects on Y. However, X₁ also has an indirect effect on Y via X₂. Note that the difference between Models A and B is not in the mathematics of the regression coefficients but in the understanding of the causal process. The advantage of Model B, if it is valid, is that in addition to determining the direct effects of X₁ and X₂ on Y, one may estimate the indirect effects of X₁ on Y


as well as the effect of X₁ on X₂. This latter effect (h in Model B) is, of course, estimated by the regression coefficient of X₂ on X₁, namely B₂₁. The direct effects, f and g, are the same in both Models A and B and are estimated by the sample regression coefficients for X₁ and X₂ from the equation for Y. The relationship between the two exogenous variables, h in Model A, is conventionally represented by the correlation between the variables. The magnitude of the indirect effect of X₁ on Y in Model B may also be estimated by a method described in Chapter 9. We have included Models A and B under the rubric "partial redundancy," as this is by far the most common pattern of relationship in nonexperimental research in the behavioral sciences. It occurs whenever r_Y1 > r_Y2 r_12 and r_Y2 > r_Y1 r_12 [see Eqs. (3.2.4), (3.3.8), and (3.3.11)], once the variables have been oriented so as to produce positive correlations with Y. The sr_i and pr_i for each IV will be smaller than its r_Yi (and will have the same sign) and thus reflect the fact of redundancy; each IV is at least partly carrying information about Y that is also being supplied by the other. This is the same model shown by the ballantine, Fig. 3.3.1. Examples of Model A two-variable redundancy come easily to mind. It occurs when one relates school achievement (Y) to parental income (X₁) and education (X₂), or delinquency (Y) to IQ (X₁) and school achievement (X₂), or psychiatric prognosis (Y) to rated symptom severity (X₁) and MMPI Schizophrenia score (X₂); the reader can supply many examples of his or her own. Indeed, redundancy among explanatory variables is the plague of our efforts to understand the causal structure that underlies observations in the behavioral and social sciences. Model B two-variable redundancy is also a very common phenomenon. Some substantive examples are given in Fig. 3.4.2. Here we see that age is expected to produce differences in physical maturity in a sample of school children, and that each is expected to cause differences in heterosexual interest. Birth order of offspring is expected to produce differences in parental aspirations, and both are causally related to achievement. We might expect sex differences in interpersonal role preferences, and that both of these variables will produce differences in career aspirations. Also, for our running example, we expect the passage of years since Ph.D. to produce increases in the number of publications, and increases in both of these variables to produce increases in salary. In each of these circumstances we expect the direct effects of the variables to be smaller than the zero-order (unpartialled) effects. In addition, we anticipate an indirect effect of our X₁ variables to take place via the X₂ variables. Although partial redundancy is the most commonly observed pattern for causal Models A and B, it is not the only possible model. When any one of the three correlations r_Y1, r_Y2, or r_12 is less than the product of the other two, the relationship is what is commonly referred to as suppression. In this case the partialled coefficients of X₁ and X₂ will be larger in value than the zero-order coefficients, and one of the partialled (direct effect) coefficients may become negative.


FIGURE 3.4.2 Examples of causal Model B: age (X₁) → physical maturity (X₂) → heterosexual interest (Y); birth order (X₁) → parental aspirations (X₂) → achievement (Y); sex (X₁) → interpersonal role preferences (X₂) → career aspirations (Y); years since Ph.D. (X₁) → publications (X₂) → salary (Y); tax cuts (X₁) → inflation (X₂) → economic growth (Y). In each panel, X₁ also has a direct effect on Y, as in Model B.

The term suppression can be understood to indicate that the relationship between the independent or causal variables is hiding or suppressing their real relationships with Y, which would be larger or possibly of opposite sign were they not correlated. In the classic psychometric literature on personnel selection, the term suppressor was used to describe a variable (such as verbal ability) X₂ which, although not correlated with the criterion Y, is correlated with the available measure of the predictor X₁ and thus adds irrelevant variance to it and reduces its relationship with Y. The inclusion of the suppressor in the regression


equation removes (suppresses) the unwanted variance in X₁, in effect, and enhances the relationship between X₁ and Y by means of B_Y1.2. This topic is discussed again in Chapter 9 in the context of measurement models. For a substantive example, suppose a researcher is interested in the roles of social aggressiveness and record-keeping skills in producing success as a salesperson. Measures of these two characteristics are devised and administered to a sample of employees. The correlation between the measure of social aggressiveness (X₁) and sales success (Y) is found to be +.29, the correlation between record keeping (X₂) and Y is +.24, and r₁₂ = −.30, indicating an overall tendency for those high on social aggressiveness to be relatively low on record keeping, although each is a desirable trait for sales success. Because −.30 < (.29)(.24), we know that the situation is one of suppression and we may expect the direct effects (the regression and associated standardized coefficients) to be larger than the zero-order effects. Indeed, the reader may confirm that the β coefficients are .398 for social aggressiveness and .359 for record keeping, both larger than their respective correlations with Y, .29 and .24. The coefficients may be considered to appropriately reflect the causal effects, the zero-order effects being misleadingly small because of the negative relationship between the variables. A Model B example of suppression may be found in the (oversimple) economic model shown in Fig. 3.4.2, in which tax cuts are expected to produce increases in economic growth but also in inflation. Because inflation is expected to have negative effects on economic growth, one can only hope that the direct positive effect of the tax cuts on economic growth will exceed the indirect negative effect attributable to the effect on inflation. Suppression is a plausible model for many homeostatic mechanisms, both biological and social, in which force and counterforce tend to occur together and have counteractive effects. The fact that suppression is rarely identified in simple models may be due to the difficulty of identifying equilibrium points in timing the measurement of X₁, X₂, and Y. Suppression effects of modest magnitude are more common in complex models. Statistically significant suppression effects are likely to be found in aggregate data, where the variables are sums or averages of many observations and R²'s are likely to approach 1 because of the small error variance that results in these conditions.
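The salesperson example above can be verified with the standard two-IV formulas for the standardized partial coefficients. The following Python lines are an illustrative sketch added here, not part of the original text:

# Suppression example: social aggressiveness (X1), record keeping (X2), sales success (Y).
r_y1, r_y2, r_12 = .29, .24, -.30

beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12**2)   # standardized partial coefficient for X1
beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12**2)   # standardized partial coefficient for X2

print(round(beta1, 3), round(beta2, 3))        # 0.398 and 0.359: both exceed the zero-order .29 and .24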

3.4.2 Spurious Effects and Entirely Indirect Effects

Model C in Figure 3.4.1 describes the special case in which r_Y2 = r_12 r_Y1. This model is of considerable interest because it means that the information with regard to Y carried by X₂ is completely redundant with that carried by X₁. This occurs whenever the B, sr, and pr coefficients for X₂ are approximately zero, which happens when their numerators are approximately zero (i.e., when r_Y2 ≈ r_12 r_Y1). For the causal model the appropriate conclusion is that X₂ is not a cause of Y at all but merely associated (correlated) with Y because of its association with X₁. (But note the appropriate considerations before drawing such a conclusion from sample results, as discussed in Section 3.7.)


A great many analyses are carried out precisely to determine this issue: whether some variable has a demonstrable nonzero effect on Y when correlated variables are held constant or, alternatively, whether the variable's relationship to Y is spurious. Thus, for example, a number of investigations have been carried out to determine whether there is a family size (X₂) influence on intelligence (Y) independent of parental social class (X₁), whether maternal nutrition (X₂) has an effect on infant behavior (Y) independent of maternal socioeconomic status (X₁), whether the status of women (X₂) in various countries has an effect on fertility rate (Y) independent of economic development (X₁), or indeed whether any of the X₂ effects shown in Fig. 3.4.2 are nil. Generally, the question to be answered is the "nothing but" challenge: "Is the relationship between Y and X₂ nothing but a manifestation of the causal effects of X₁?" Complete redundancy, however, does not always imply a spurious relationship. In Fig. 3.4.1, Model D, we see a situation in which the partial coefficients for X₁ approach zero, indicating correctly that there is no direct effect of X₁ on Y. There is, however, an indirect effect that, according to the model, takes place entirely via X₂ (i.e., is mediated by X₂). Many investigations are designed to answer questions about intervening mechanisms; for example, is the higher female (X₁) prevalence of depression (Y) entirely attributable to lower female income/opportunity structure (X₂)? Are ethnic (X₁) differences in achievement (Y) entirely due to economic deprivation (X₂)? Is the demonstrable effect of poor parent marital relationship (X₁) on delinquency (Y) entirely attributable to poor parent-child relationships (X₂)? In these cases the relationships between X₁ and Y cannot be said to be spurious but are nevertheless likely to have different theoretical implications and policy import when they are entirely redundant than when they are not. As in the case of the comparison of Models A and B, the difference between Models C and D lies not in the coefficients but in one's understanding of the causal processes that gave rise to the coefficients. Again, one can only demonstrate consistency of sample data with a model rather than prove the model's correctness.

3.5 MULTIPLE REGRESSION/CORRELATION WITH k INDEPENDENT VARIABLES

3.5.1 Introduction

When more than two IVs are related to Y, the computation and interpretation of multiple and partial coefficients proceed by direct extension of the two-IV case. The goal is again to produce a regression equation for the k IVs of the (raw-score) form

(3.5.1)    Ŷ = B_Y1.23...k X₁ + B_Y2.13...k X₂ + B_Y3.12...k X₃ + ... + B_Yk.123...(k−1) X_k + A_Y.123...k ,

or, expressed in simpler subscript notation,

Ŷ = B₁X₁ + B₂X₂ + B₃X₃ + ... + B_kX_k + A.

When this equation is applied to the data, it yields a set of Ŷ values (one for each of the n cases) for which the sum of the (Y − Ŷ)² values over all n cases will (again) be a minimum. Obtaining these raw-score partial regression weights, the B_i, involves solving a set of k simultaneous equations in k unknowns. In keeping with the conviction of the authors that carrying out the complex computations involved is not required for a solid working knowledge of MRC, these procedures will not be described here. However, the interested reader is referred to Appendix 2 for a description and worked example of the MRC solution that is feasible for hand (by which we mean calculator as opposed to computer) calculation when there are not more than five or six IVs. Readers familiar with matrix algebra may turn to Appendix 1 for a presentation of the general solution for determining the multiple and partial coefficients for the k-variable case. The most frequent method of obtaining these coefficients in the scientific community at large is by means of one of the many widely available computer programs. A discussion of the use of the computer and of the most popular "canned" programs for MRC can be found in Appendix 3. The purpose of this section is to lay down a foundation for understanding the various types of coefficients produced by MRC for the general case of k independent variables, and their relationship to various MRC strategies appropriate to the investigator's research goals.

3.5.2 Partial Regression Coefficients

By direct extension of the one- and two-IV cases, the raw-score partial regression coefficient B_i (= B_Yi.12...(i)...k) is the constant weight by which each value of the variable X_i is to be multiplied in the multiple regression equation that includes all k IVs. Thus, B_i is the average or expected change in Y for each unit increase in X_i when the value of each of the k − 1 other IVs is held constant. β_i is the partial regression coefficient when all variables have been standardized. Such standardized coefficients are of interpretive interest when the analysis concerns test scores or indices whose scaling is arbitrary. For example, let us return to the study in which we seek to account for differences in salary in a university department by means of characteristics of the faculty members. The two IVs used thus far were the number of years since each faculty member had received a doctoral degree (X₁) and the number of publications (X₂). We now wish to consider two additional independent variables, the sex of the professor and the number of citations of his or her work in the scientific literature in the previous year. These data are presented in Table 3.5.1, where sex (X₃) is coded (scored) 1 for female and 0 for male faculty.


TABLE 3.5.1
Illustrative Data: Academic Salary and Four Independent Variables

Columns (15 faculty members): Base salary (Y); Years since Ph.D. (X₁); No. of Publications (X₂); Sex (X₃); No. of Citations (X₄); estimated salary (Ŷ). [The individual case values are not legibly reproduced here.]

              Y        X₁      X₂      X₃      X₄        Ŷ
m         $25,406    9.60    7.60    .267    1.27    $25,406
sd         $6,097    7.25    4.96    .442    1.61     $4,271

Correlations:
          X₁       X₂       X₃       X₄
Y       .618     .461    −.262     .507      (r of Y with Ŷ = .701)
X₁     1.000     .683    −.154     .460
X₂      .683    1.000     .049     .297
X₃     −.154     .049    1.000    −.006
X₄      .460     .297    −.006    1.000

The correlation matrix shows us that sex is negatively correlated with salary (r_Y3 = −.262), women (X₃ = 1) having lower salaries on the average than men (X₃ = 0). The number of citations in the current literature (X₄) is positively associated with salary (r_Y4 = .507), as well as with the other IVs except sex. Sex correlates very little with the other IVs, except for a slight tendency for the women to be more recent Ph.D.'s than the men (r₁₃ = −.154). The (raw-score) multiple regression equation for estimating academic salary from these four IVs, which may be obtained from computer output (Appendix 3) or by the matrix inversion method of Appendix 2 (where this problem is used illustratively), is

Ŷ = $293 X₁ + $176 X₂ − $2945 X₃ + $1145 X₄ + $20,590.

These partial B_i coefficients indicate that, for any given values of the other IVs, an increase of one in the number of citations is associated with a salary increase of


$1,145 (= B₄); an increase of one unit in X₃, and hence the average difference in salary between the sexes (holding constant the other IVs), is −$2,945 (favoring men); and the effects of an additional year since degree (X₁) and of an additional publication (X₂) are $293 and $176, respectively. Note also that A = $20,590 is the estimated salary of a hypothetical male professor fresh from his doctorate with neither publications nor citations, that is, with all X_i = 0. In this problem, the salary estimated by the four IVs for the first faculty member (Table 3.5.1) is

Ŷ = $293(1) + $176(2) − $2945(0) + $1145(1) + $20,590
  = $293 + $352 − 0 + $1145 + $20,590
  = $22,380.

The remaining estimated values are given in the last column of Table 3.5.1. (Although we report the B_i and A to at least the usual number of places, if readers check these Ŷ values they will nevertheless find small rounding discrepancies; recall the warning of Section 1.2.2.)
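To make the arithmetic concrete, the following few lines of Python (a sketch added here for illustration, not part of the original text) apply the raw-score equation to the first case:

# Raw-score regression weights and intercept from the four-IV salary equation.
B = {"years": 293, "pubs": 176, "sex": -2945, "cites": 1145}
A = 20590

# First faculty member in Table 3.5.1: 1 year since Ph.D., 2 publications, male (0), 1 citation.
case1 = {"years": 1, "pubs": 2, "sex": 0, "cites": 1}

y_hat = sum(B[name] * value for name, value in case1.items()) + A
print(y_hat)   # 22380, matching the worked example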

The regression equation may be written in terms of standardized variables and β coefficients as

ẑ_Y = .348 z₁ + .143 z₂ − .214 z₃ + .303 z₄.

The β values may always be found from B values by inverting Eq. (3.2.5):

(3.5.2)    β_i = B_i (sd_i / sd_Y) ;

for example, β₄ = (1145)(1.61/6097) = .303.

3.5.3 R and R²

Application of the regression equation to the IVs would yield a set of estimated Ŷ values. The simple correlation of Y with Ŷ equals the multiple correlation; in this example, r_YŶ = R = .701. As with the one- or two-independent-variable case, R² is the proportion of Y variance accounted for, and R² = sd²_Ŷ / sd²_Y.

... sr_i expressing the correlation of Y with X_i from which all other IVs have been partialled, and pr_i expressing the correlation of Y with X_i when all other IVs have been partialled from both X_i and Y. Because neither can equal zero unless the other is also zero, it is not surprising that they must yield the same t value for the statistical significance of their departure from zero. It should also be clear that β_i, which has the same numerator as sr_i and pr_i, also equals zero only when sr_i and

pr_i do and, because B_i is the product of β_i and sd_Y/sd_i, it also can equal zero only when they do. Thus, Eq. (3.6.6) and its equivalent Eq. (3.6.7) provide the appropriate F and t values for the significance of departures of all the partial coefficients of X_i from zero. They either are, or are not, all significantly different from zero, and to exactly the same degree.


For example, let us return to the running example where the obtained R² of .4908 was found to be significant (P < .01) for k = 4 and n = 50. The sr_i for the four IVs were, respectively, .228, .102, −.206, and .268. Determining their t values by Eq. (3.6.7), we find

t₁ = .228 √[(50 − 4 − 1) / (1 − .4908)] = 2.142 ,

t₂ = .102 √[(50 − 4 − 1) / (1 − .4908)] = .959 ,

t₃ = −.206 √[(50 − 4 − 1) / (1 − .4908)] = −1.932 ,

t₄ = .268 √[(50 − 4 − 1) / (1 − .4908)] = 2.517 .

Looking these values up in the t table (Appendix Table A) for 45 df, we find that t₁ and t₄ exceed the value required for significance at α = .05; however, neither the t for number of publications nor that for sex reaches the value required. We conclude that years since Ph.D. and citations both make statistically significant unique (direct) contributions to salary. We may not reject the null hypotheses that sex and number of publications have no unique (direct) relationship to salary in the population once the effects of years since Ph.D. and citations are taken into account. It is quite possible to find examples where R² is statistically significant but none of the tests of significance on the individual X_i reaches the significance criterion for rejecting the null hypothesis. This finding occurs when the variables that correlate with Y are so substantially redundant that none of the unique effects is large enough to be significant. On the other hand, it may also happen that one or more of the t tests on individual variables does reach the criterion for significance although the overall R² is not significant. The variance estimate for the regression based on k IVs is divided by k to form the numerator of the F test for R², making of it an average contribution per IV. Therefore, if most variables do not account for more than a trivial amount of Y variance, they may lower this average (the mean square for the regression) to the point of making the overall F not significant in spite of the apparent significance of the separate contributions of one or more individual IVs. In such circumstances, we recommend that such IVs not be accepted as significant. The reason for this is to avoid spuriously significant results, the probability of whose occurrence is controlled by the requirement that the F for a set of IVs be significant before its constituent IVs are

t tested. This, the "protected t test," is part of a general strategy for statistical inference that is considered in detail in the next chapter (Section 4.6).
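The four t values above are easy to reproduce; the short Python sketch below (added for illustration, not part of the original text) assumes the R², n, k, and sr values quoted in the example:

import math

R2, n, k = .4908, 50, 4
sr = {"years": .228, "pubs": .102, "sex": -.206, "cites": .268}

scale = math.sqrt((n - k - 1) / (1 - R2))      # common factor for all four IVs
t = {name: value * scale for name, value in sr.items()}

for name, t_i in t.items():
    print(name, round(t_i, 2))                 # about 2.14, 0.96, -1.94, 2.52, with df = 45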


3.6.4 Standard Errors and Confidence Intervals for B and β, and Hypothesis Tests

In Chapter 2, we showed how to determine standard errors and confidence intervals for r and B in the two-variable case, provided that certain distributional assumptions are made. Similarly, one may determine standard errors for partial regression coefficients; that is, one may estimate the sampling variability of partial coefficients from one random sample to another, using the data from the single sample at hand. The equation for estimating the standard error of B is particularly useful because it reveals very clearly what conditions lead to large expected sampling variation in the size of B, and hence in the accuracy one can attribute to any given sample B value. A convenient form of the equation for the standard error of B for any X_i is

(3.6.8)    SE_Bi = (sd_Y / sd_i) √[(1 − R²_Y) / (n − k − 1)] √[1 / (1 − R²_i)] ,

where R²_Y is literally R²_Y.12...k and R²_i is literally R²_i.12...(i)...k. The ratio of the sds, as always, simply adjusts for the scaling of the units in which X_i and Y are measured. Aside from this, we see from the second term that the size of the SE_Bi will decrease as the error variance proportion (1 − R²_Y) decreases and its df (= n − k − 1) increase. (On reflection, this should be obvious.) Note that this term will be constant for all variables in a given regression equation. The third term reveals an especially important characteristic of SE_Bi, namely, that it increases as a function of the squared multiple correlation of the remaining IVs with X_i, R²_i. Here we encounter a manifestation of the general problem of multicollinearity, that is, of substantial correlation among IVs. Under conditions of multicollinearity there will be relatively large values for at least some of the SE_Bi, so that any given sample may yield relatively poor estimates of some of the population regression coefficients, that is, of those whose R²_i are large. In order to show this relationship more clearly it is useful to work with variables in standard score form. B_i expressed as a function of standard scores is β_i. The standard error of β_i drops the first term from (3.6.8) because it equals unity, so that

(3.6.9)    SE_βi = √[(1 − R²_Y) / (n − k − 1)] √[1 / (1 − R²_i)] .

To illustrate the effects of differences in the relationships of a given X_i with the remaining IVs, we return to our running example presented in Tables 3.5.1 and 3.5.2. In this example, number of publications and number of citations had very


similar zero-order correlations with salary, .461 and .507, respectively. Their relationships with the other IVs, especially years since Ph.D., differed substantially, however, with publications correlating .683 and number of citations correlating .460 with years. The squared multiple correlation with the other IVs is .4919 for number of publications and .2177 for number of citations. Substituting these values into Eq. (3.6.9), we find

SE_β(PUB) = √[(1 − .4908)/45] √[1/(1 − .4919)] = .1064(1.4030) = .149 ,

SE_β(CIT) = √[(1 − .4908)/45] √[1/(1 − .2177)] = .1064(1.1306) = .120 .

Thus we can see that the redundancy with the other variables has not only reduced the β for publications (.143) as compared to citations (.303) but also has made it a less reliable estimate of the population value. In contrast, the β for sex (−.214), although smaller in absolute size than that for citations, has a smaller SE (.111) because sex shares only .0742 of its variance with the other IVs. The t distribution may be used to test the null hypothesis for a β_i, that is, the hypothesis that its population value is zero. It takes the usual form:

(3.6.10)    t_i = β_i / SE_βi    (df = n − k − 1).

Applying this test to β_PUB and β_CIT, we find

t_PUB = .143 / .149 = .96    and    t_CIT = .303 / .120 = 2.52.

These necessarily are identical with the t's determined for the corresponding sr_i (Section 3.6.3); recall that all partial coefficients for X_i (pr_i, sr_i, β_i, and B_i) must share the same t (or F) value. One may also use the SE to determine the bounds within which we can assert with 95% confidence that the population β falls.

Because the t value for df = n − k − 1 = 45 for a two-tailed probability of .05 is 2.014 (by interpolation in Appendix Table A), we expect the population value to fall within the interval that extends 2.014 SE_β on either side of β. In the given example, the limits of the 95% confidence interval for β_CIT will be from .303 − 2.014(.120) = .061 to .303 + 2.014(.120) = .545. The reader may determine that the 95% confidence interval for β_PUB extends below zero, as indicated by the failure to achieve significance at the .05 criterion.
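The standard errors and the confidence interval for β_CIT can be checked with a few lines of Python; this is a sketch using the quantities quoted above, not part of the original text:

import math

R2_Y, n, k = .4908, 50, 4
common = math.sqrt((1 - R2_Y) / (n - k - 1))          # first term of Eq. (3.6.9)

R2_i = {"pubs": .4919, "cites": .2177, "sex": .0742}  # R^2 of each IV with the other IVs
se_beta = {name: common * math.sqrt(1 / (1 - r2)) for name, r2 in R2_i.items()}
print({name: round(se, 3) for name, se in se_beta.items()})   # about .149, .120, .111

beta_cit, t_crit = .303, 2.014                         # two-tailed .05 critical t for 45 df
lo = beta_cit - t_crit * se_beta["cites"]
hi = beta_cit + t_crit * se_beta["cites"]
print(round(lo, 3), round(hi, 3))                      # about .061 to .545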


Confidence intervals may be similarly determined for B coefficients. For example, we found B_CIT = $1145. Because sd_SAL = $6097 and sd_CIT = 1.61 (Table 3.5.1), we can find the SE for B_CIT by rescaling the SE for β_CIT:



SE_B(CIT) = SE_β(CIT) (sd_SAL / sd_CIT) = (.120)($6097 / 1.61) = $455

(continuing to assume n = 50), or directly by means of Eq. (3.6.8). We can determine the 95% confidence interval for B_CIT to be $1,145 ± 2.014($455), that is, from $227 to $2,061. (Note that these limits are very wide; see Section 3.6.5.) Confidence limits may also be determined for pr_i by extension of the procedure described in Section 2.10, by applying the Fisher z′ transformation to the pr with a standard error of 1/√(n − k − 2). In addition to the determination of confidence limits of pr, such standard errors may be used to test the significance of the difference between pr's from independent samples, exactly as with zero-order r's. It is possible to test the null hypothesis that two independent B_i's (i.e., coming from different samples, 1 and 2) are equal by utilizing their respective standard errors. Determine each by means of Eq. (3.6.8) for its sample R²_Y, R²_i, n, etc., and substitute in

(3.6.11)    z = (B_i1 − B_i2) / √(SE²_Bi1 + SE²_Bi2) .

This is a large-sample (approximate) test; z is referred to the normal curve table (Appendix Table C) in the usual way. The test for the difference between standardized partial β_i's proceeds in exactly the same way, utilizing the two standard errors as found from Eq. (3.6.9). Replace

the B_i's by β_i's in Eq. (3.6.11), substitute, solve for z, and refer to the normal curve table as before. These tests are not appropriate when the two coefficients being compared, either B_i and B_j or β_i and β_j, come from the same sample. This test requires the covariance of the pair of coefficients, found from the inverse of the matrix, and is therefore deferred to Appendix 2, where the computation of the latter is described.

3.6.5 Use of Multiple Regression Equations in Prediction

As we have already noted, the traditional use of MRC in the behavioral sciences has been for prediction, literally forecasting, with only incidental attention to explanation. In this book, the emphasis is reversed, and our almost exclusive concern is with the analytic use of MRC to achieve the scientific goal of explanation.


But in its use for purposes of prediction, MRC plays an important role in several behavioral technologies, for example, personnel (including educational) selection, vocational counseling, and psychodiagnosis. In this section we address ourselves to the accuracy of prediction in multiple regression and some of its problems. The standard error of estimate, sd_Y−Ŷ, as we have seen, provides us with an estimate of the magnitude of error that we can expect in estimating Y values over sets of future X₁, X₂, ..., X_k values that correspond to those of the present sample (that is, the fixed-effects model). Suppose, however, we wish to determine the standard error and confidence limits of a single estimated Y₀ from a new set of observed values X₁ₒ, X₂ₒ, ..., X_kₒ. In Section 2.10.2, we saw that the expected magnitude of error increases as the X_iₒ values depart from their respective means. The reason for this should be clear from the fact that any discrepancy between the sample estimated regression coefficients and the population regression coefficients will result in larger errors in Ŷ₀ when the X_iₒ values are far from their means than when they are close. Estimates of the standard error and confidence limits for Ŷ₀, predicted from known values X₁ₒ, X₂ₒ, ..., X_kₒ, are particularly dependent upon the validity of a rather strong set of assumptions about the nature of the populations from which the sample observations are drawn. The accuracy of the estimate of the standard error of a given Ŷ₀ value depends on whether the population Y values for any given set of X_i are normally distributed, centered on the population regression surface, and have equal variances from set to set of X_i values. Under these circumstances, the standard error of a Ŷ₀ predicted from given values of X₁ₒ,

X₂ₒ, ..., X_kₒ is given by

(3.6.11)    sd_Y0−Ŷ0 = sd_Y−Ŷ √{ 1 + 1/n + (1/n) [ Σ z²_iₒ / (1 − R²_i) − 2 ΣΣ z_iₒ z_jₒ β_ij / (1 − R²_i) ] } ,

where the first summation is over the k IVs, the second over the k(k − 1)/2 pairs of IVs (i.e., i < j), the X_iₒ are expressed as standard scores z_iₒ, β_ij is the β for estimating X_i from X_j holding constant the remaining k − 2 IVs, and R²_i is literally R²_i.12...(i)...k. Although at first glance this formula appears formidable, a closer

examination will make clear what elements affect the size of this error. sd_Y−Ŷ is the standard error of estimate and, as in the case of a single IV, we see that increases in it and/or in the absolute value of the IV (z_iₒ) will be associated with larger error. The terms that appear in the multiple-IV case that did not appear in the single-variable case (β_ij and R²_i) are functions of the relationships among the independent variables.⁸ When all independent variables are uncorrelated (hence, all β_ij and all R²_i equal zero), we see that the formula simplifies and sd_Y0−Ŷ0 is minimized (for constant sd_Y−Ŷ, n, and z_iₒ values). It is worth emphasizing the distinction between the validity of the significance tests performed on partial coefficients and the accuracy of such coefficients when

⁸A version of Eq. (3.6.11) using elements of the inverse matrix may be found in Appendix 1.


used in prediction. In analytic uses of MRC, including formal causal analysis, given the current level of theoretical development in the behavioral and social sciences, the information most typically called upon is the significance of the departure of partial coefficients from zero and the sign of such coefficients. The significance tests are relatively robust to assumption failure, particularly so when n is not small. Using the regression equation for prediction, on the other hand, requires applying these coefficients to particular X_iₒ values, for which the consequence of assumption failure is likely to be much more serious. Further insight may be gained by noting that, regardless of the sign, magnitude, or significance of its partial regression coefficient, the correlation between X_i and the Ŷ determined from the entire regression equation is

(3.6.12)    r_XiŶ = r_Yi / R_Y.12...k .

Thus it is invariably of the same sign and of larger magnitude than its zero-order r with Y. Reflection on this fact may help the researcher to avoid errors in interpreting data analyses in which variables that correlate materially with Y have partial coefficients that approach zero or are of opposite sign. When the partial coefficients of X_i approximate zero, whatever linear relationship exists between X_i and Y is accounted for by the remaining independent variables. Because neither its zero-order correlation with Y nor its (larger) correlation with Ŷ is thereby denied, the interpretation of this finding is highly dependent on the substantive theory being examined. Even without a full causal model, a weak theoretical model may be employed to sort out the probable meaning of such a finding. One theoretical context may lead a researcher to conclude that X_i is only spuriously related to Y, that is, related only because of its relationships with other IVs. Another theoretical context may lead to the conclusion that the true causal effect of X_i on Y operates fully through the other IVs in the equation. Similarly, when the partial coefficients of X_i and r_Yi are of opposite sign, X_i and one or more of the remaining IVs are in a suppressor relationship. Although it is legitimate and useful to interpret the partialled relationship, it is also important to keep in mind the zero-order correlations of X_i with Y (and hence with Ŷ).
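Eq. (3.6.12) is easy to see numerically for the running example. The following Python lines are an illustrative sketch (not part of the original text) using the values quoted from Table 3.5.1:

R = .701                                     # multiple correlation of Y with the four IVs
r_Yi = {"years": .618, "pubs": .461, "sex": -.262, "cites": .507}

# Eq. (3.6.12): the correlation of each IV with Y-hat is its zero-order r divided by R.
r_with_Yhat = {name: r / R for name, r in r_Yi.items()}
print({name: round(val, 3) for name, val in r_with_Yhat.items()})
# roughly .88, .66, -.37, .72: each has the same sign as, and is larger than, the zero-order r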

Cross-Validation and Unit Weighting

Several alternatives to regression coefficients for forming weighted composites in prediction have recently been proposed (Darlington, 1978; Dawes, 1979; Dawes & Corrigan, 1974; Green, 1977; Wainer, 1976, 1978). Although β weights are guaranteed to produce composites that are most highly correlated with z_Y (or Y) in the sample on which they are determined (namely, R_Y), other weights produce composites (call them u_Y) that are almost as highly correlated in that sample. "Unit weighting," the assignment of weights of +1 to positively related, −1 to negatively related, and 0 to poorly related IVs, is a popular candidate: unit weights are simple, require no computation, and are not subject to sampling error (Green, 1977; Mosteller & Tukey, 1977; Wainer, 1976). For


our running example on academic salary, we simply add (i.e., weights of +1) the z scores of each subject for years, publications, and citations and subtract (i.e., a weight of −1) his/her z score for sex to produce the composite u_Y for each subject.⁹ We find that u_Y correlates .990 with the β-weighted ẑ_Y (or Ŷ), and therefore (not surprisingly) .693 with z_Y (or Y), only slightly less than the .701 (= R_Y) of ẑ_Y with z_Y (or Y). However, the real question in prediction is not how well the regression equation determined for a sample works on that sample, but rather how well it works in the population or on other samples from the population. Note that this is not the estimate of the population R²_Y [i.e., the shrunken R² of Eq. (3.6.4)], but rather the "cross-validated" r² for the sample β's, which is even more shrunken and which may be estimated by

(3.6.13)    R̂² = 1 − (1 − R²)(n + k) / (n − k)

(Rozeboom, 1978). R̂² answers the relevant question, "If I were to apply the sample regression weights to the population, or to another sample from the population, for what proportion of the Y variance would my thus-predicted Ŷ values account?" Now, for our running example, even assuming n = 50, our sample regression equation would yield R̂² = 1 − (1 − .4908)(50 + 4)/(50 − 4) = .4022, so R̂ = .634. We found above, however, that the unit-weighted composite for the cases we have yielded an r of .693, greater than R̂. Now this value is subject to sampling error (so is R̂), but not to shrinkage, because it does not depend on unstable regression coefficients. As far as we can tell, unit weights would do as well as or better in prediction for these data than the sample's regression weights, assuming n = 50. (For n = 15, R̂ = .347, and we would certainly be better off with unit weights.) Unit weights have their critics (Pruzek & Frederick, 1978; Rozeboom, 1979). For certain patterns of correlation (suppression is one) or a quite large n:k ratio (say more than 20 or 30), unit weights may not work as well in a new sample as the original regression coefficients will. An investigator who may be in such a circumstance is advised to compute R̂ and compare it with the results of unit weighting in the sample at hand.
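A sketch of the shrinkage computation in Eq. (3.6.13), assuming the quantities used above (added for illustration, not part of the original text):

def cross_validated_r2(R2, n, k):
    """Rozeboom (1978) estimate of the cross-validated squared multiple correlation."""
    return 1 - (1 - R2) * (n + k) / (n - k)

for n in (50, 15):
    r2c = cross_validated_r2(.4908, n, 4)
    print(n, round(r2c, 4), round(max(r2c, 0) ** .5, 3))   # .4022 / .634 for n = 50; .1205 / .347 for n = 15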

Some Recent Weighting Methods for Prediction

Although no weights can compete with unit weights in simplicity, many other procedures for generating weights have recently been proposed that may be superior to regression weights under certain circumstances. [Dempster, Schatzoff, & Wermuth (1977) list some 56 varieties of alternative weighting

⁹For raw scores, we would need to divide our weights of +1 or −1 by the sd_i for the X_i to which the weight is attached (e.g., +1/4.96 for publications and −1/.442 for sex).


methods.] Prominent among these are ridge regression and Stein-type regression, called biased or reduced-variance regression (Darlington, 1978; Darlington & Boyce, 1982), and component regression, in which the largest principal component factors of the IV correlation matrix are used in place of the k IVs (Lawley & Maxwell, 1973). These methods do not seek to produce unbiased estimates of the population coefficients (only MRC regression coefficients do that), but rather estimates of Y in new samples with less error than those of MRC. They are demonstrably superior to regression weights for this purpose when the n:k ratio is "small" (say, less than 10 or 20), but they may be inferior to unit weights (Cattin, 1981). In general, as n increases (other things equal), the reduced-variance methods' weights approach MRC weights. But we stress the fact that, again like unit weights, they are to be used only for purposes of forecasting Y and not to carry the information of functional relationships. We thus agree with Darlington (1978), who writes, "the new techniques are very useful for pure prediction, as in personnel selection, but inappropriate for path analysis, or other forms of causal analysis" (p. 1239). Given the emphasis of this book, this is a thicket we need not enter.

3.6.6 Multicollinearity

The existence of substantial correlation among a set of IVs creates difficulties usually referred to as "the problem of multicollinearity." Actually, there are three distinct problems: the substantive interpretation of partial coefficients, their sampling stability, and computational accuracy.

Interpretation

We have already seen in Section 3.5 that the partial coefficients of highly correlated IVs analyzed simultaneously are reduced. Because the IVs involved lay claim to largely the same portion of the Y variance by definition, they cannot make much by way of unique contributions. Interpretation of the partial coefficients of IVs from the results of a simultaneous regression of such a set of variables that ignores their multicollinearity will necessarily be misleading. Attention to the R²_i of the variables may help, but a superior solution requires that the investigator formulate some causal hypotheses about the origin of the multicollinearity. If it is thought that the shared variance is attributable to a single central property, trait, or latent variable, it may be most appropriate to combine the variables into a single index or drop the more peripheral ones (see Section 4.6.2), or even to turn to a latent variable causal model (see Section 9.5). If, on the other hand, the investigator is truly interested in each of the variables in its own right, analysis by a hierarchical procedure may be employed (see Section 3.8.1). To be sure, the validity of the interpretation depends on the appropriateness of the hierarchical sequence, but this is preferable to the complete anarchy of the simultaneous analysis in which everything is partialled from everything else indiscriminately, including effects from their causes.


Sampling Stability

The structure of the formulas for SE_Bi (Eq. 3.6.8) and SE_βi (Eq. 3.6.9) makes plain that they are directly proportional to √[1/(1 − R²_i)]. A serious consequence of multicollinearity, therefore, is highly unstable partial coefficients for those IVs that are highly multicollinear. Concomitantly, the trustworthiness of individually predicted Y₀ is lessened as the R²_i for a set of IVs increase, as is evident from the structure of Eq. (3.6.11). Large standard errors mean both a lessened probability of rejecting the null hypothesis (see Section 3.7.3) and wide confidence intervals.
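The proportionality is easy to make concrete. The short Python sketch below (an illustration added here, not part of the original text) tabulates the factor √[1/(1 − R²_i)] for a range of hypothetical R²_i values:

import math

def se_inflation(R2_i):
    """Factor by which redundancy with the other IVs inflates the SE of a partial coefficient (cf. Eq. 3.6.9)."""
    return math.sqrt(1 / (1 - R2_i))

for r2 in (.0, .25, .50, .75, .90, .99):
    print(r2, round(se_inflation(r2), 2))   # about 1.0, 1.15, 1.41, 2.0, 3.16, 10.0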

Computation

As the R²_i approach 1.00, errors associated with rounding in the computation of the inverse of the correlation matrix among IVs (Appendix 1) become potentially serious and may result in partial coefficients that are grossly in error. Fortunately, recent improvements in computer capabilities for carrying many digits in their computations and in their computational recipes (algorithms) are likely to cope adequately with all but the most extensive instances of multicollinearity. When in doubt, seek expert advice at the computer center. (Also see Appendix 3.)

3.7 POWER ANALYSIS

3.7.1 Introduction

Section 2.9.1 explained the purpose and desirability of determining the power of a given research plan to reject at the α significance level a false null hypothesis that the population r equals zero. Thus, given a plan for determining the existence of nonzero correlation between two variables, including n and α, the investigator may enter the table for the selected α with n and the expected (alternate-hypothetical) value of the population r, and read off the power, the probability of finding the sample r to be significant. Alternatively, one may proceed in planning a research by deciding on the significance criterion α and the desired power. Then, having specified the expected population r, a table for the given α is entered with this r and the desired power. The tabled values provide the number of cases necessary (n*) to have the specified probability of rejecting the null hypothesis (the desired power) at the α level of significance when the population r is as posited. In this section, we extend power analysis beyond simple correlation to the more general MRC for k IVs.

3.7.2 Power Analysis for R²

Power and n* can be conveniently tabled for the single-IV case. However, the several different coefficients that may be tested in MRC analysis, as well as provision for many possible values of k, make the direct tabling of power and n* unwieldy. Instead, we provide a table of constants with which one can perform


power analysis of tests for the different null hypotheses in MRC with k IVs. These constants are then employed in a simple formula to determine the necessary number of cases (n*). To determine n* for the F test of the significance of R², the researcher proceeds with the following steps:

1. Set the significance criterion to be used, α. Provision is made in the Appendix for α = .01 and α = .05 in the L tables (Appendix Tables E.1 and E.2).
2. Set the desired power for the F test. The L tables provide for power values of .10, .30, .50, .60, .70, .75, .80, .85, .90, .95, and .99. (The use of the lower values is illustrated in Chapter 4.)
3. In the L tables (Appendix Tables E.1 and E.2), k_B is used to represent the number of df associated with the source of Y variance being tested. For R²_Y.12...k, k_B is simply k, the number of IVs. The L tables provide for k_B = 1 (1) 16 (2) 24 (4) 40 (10) 100, that is, for 30 values of k_B between 1 and 100.
4. Look up in the appropriate table (α = .01 or .05) the value of L for the given k_B (row) and specified power (column).
5. Determine the population effect size, ES (= f², see following) of interest, that is, the expected or alternate-hypothetical value. As was the case for the single IV (where ES = r), the ES may represent a probable population value as indicated by previous work, a minimum value that would be of theoretical or practical significance, or some conventional value as discussed in Section 4.5.4. The population ES for R² is given by

(3.7.1)    f² = R² / (1 − R²).

6. Substitute L (from step 4) and f² in

(3.7.2)    n* = L / f² + k + 1.

The result is the number of cases necessary to have the specified probability of rejecting the null hypothesis (power) at the α level of significance when f² in the population is as posited. For example, let us return to the research on academic salaries. As part of the planning preceding the research, the investigator performs a power analysis for

R² in order to determine the n* to be used. It is planned to use the α = .05 significance criterion (Step 1), to have a .90 probability of rejecting the null hypothesis (Step 2), and to use four independent variables (Step 3). Checking Appendix Table E.2 (α = .05) for k_B = 4 (row) and power = .90 (column), the L value is found to be 15.41. It is decided that a population R² as small as .10 would be of interest, and thus the ES is determined to be f² = .10/(1 − .10) = .1111.


Substituting L and f² in Eq. (3.7.2),

n* = 15.41 / .1111 + 4 + 1 = 144.

Thus, 144 cases are needed to detect (using α = .05) a population R² as small as .10 with a 90% probability. If the researcher were content to be able to detect a population R² of .20 with the same power and α, only 67 cases would be necessary. Similarly, suppose another investigator feels that .40 is a more realistic value for the population R², is content with 80% power, but plans to use the more stringent .01 significance criterion. In this case, the Table E.1 value for L is 16.75 and f² = .40/.60 = .6667; thus, from Eq. (3.7.2), only [(16.75/.6667) + 4 + 1 =] 30 cases are needed. (Note how grossly insufficient the original n = 15 is in the light of the power analysis.)
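A sketch of the n* computation in Eqs. (3.7.1) and (3.7.2) follows (added for illustration, not part of the original text); the L values are those read from the appendix tables as described above:

def f2_for_R2(R2):
    """Effect size for the test of R^2, Eq. (3.7.1)."""
    return R2 / (1 - R2)

def n_star(L, f2, k):
    """Necessary sample size, Eq. (3.7.2)."""
    return L / f2 + k + 1

print(round(n_star(15.41, f2_for_R2(.10), 4)))   # 144: alpha = .05, power = .90, k = 4
print(round(n_star(15.41, f2_for_R2(.20), 4)))   # 67
print(round(n_star(16.75, f2_for_R2(.40), 4)))   # 30: alpha = .01, power = .80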

3.7.3 Power Analysis for Partial Correlation and Regression Coefficients

A determination of the number of cases necessary for testing the null hypothesis that any partial correlation or regression coefficient for a given X_i (in a set of k IVs) is zero may proceed in the same manner as for R². Having set α and the desired power, the appropriate table is entered, now for k_B = 1 (because the source of variance is a single X_i), and the L value is read off. The f² value for the partial coefficients of a single IV X_i is determined by

(3.7.3)    f² = sr²_i / (1 − R²).

The L and f² values are substituted in Eq. (3.7.2) to determine n*. For example, in planning the study on academic salaries a researcher may expect the population R² to be about .40 and decide to determine n* for the case in which each of the 4 IVs makes a unique contribution of sr² = .04. (Note that to the extent that IVs are expected to be somewhat redundant in accounting for Y, the sum of the sr²_i will typically be smaller than R².) The researcher then proceeds by deciding that the significance criterion is to be α = .05 and that the power desired is .80. Checking Appendix Table E.2 for k_B = 1 and power = .80, he finds the L value to be 7.85. Determining from Eq. (3.7.3) that f² = .04/(1 − .40) = .0667, the L and f² values are substituted in Eq. (3.7.2), and

.0667

+

4

+

1

= 123.

Thus, according to this plan it will take a sample size of 123 to provide an 80% probability of rejecting the null hypothesis at α = .05 if the population f² is as posited. It is useful to note here the substantial effects of redundancy among the IVs in reducing power (or increasing n*). If the four IVs were each to account


uniquely for one-fourth of R², that is, if they were uncorrelated, the f² for each would equal .1667 (= .10/.60), and n* = 52. (Again note the insufficiency of n = 15.) Although the power analysis of partial coefficients proceeds most conveniently by determining f² by means of sr², the analysis provides the appropriate power to reject the null hypotheses that β, B, and pr are zero as well. Because, as we have seen, these coefficients for a given X_i must have identical significance test results, the power to reject the null hypothesis for any one of them will be the same as for any other for analogous alternative-hypothetical values. It sometimes happens that an investigator finds it more convenient to think in terms of units of change in the dependent variable that would be significant than in terms of proportions of variance. For example, the fictitious research described as our running example may have been motivated by a desire to determine whether salary discrimination on the basis of sex existed in the population. It might be decided that any discrepancy as large as $1000 in annual salary (net of the effects attributable to other causes) would be material to the people involved. The researcher may know that about 30% of faculty members are women; thus the sd of sex will be about √(.30 × .70) = .458 in the sample. The sd of faculty salaries may be determined from administration records to be about $6000. Thus, if B = $1000, β = $1000(.458/$6000) = .0763. Recognizing that correlation with other variables will reduce the sr relative to the β [see Eq. (A2.4), Appendix 2], the researcher decides that an appropriate sr² to use for the power analysis would be .071² = .005. By Eq. (3.7.3) and assuming R² = .40, f² = .005/(1 − .40) = .00833. Using the given L value of 7.85 for α = .05, power = .80, k_B = 1, we find (to our dismay)

n* = 7.85 / .00833 + 4 + 1 = 947!

If this n seems very demanding, it is instructive to note that in the example as given, assuming n = 50, with a net difference of nearly $3000 (B_SEX = −$2945) the researcher is in the embarrassing position of concluding that what is surely a personally significant difference is not statistically significant, that is, does not reliably indicate a nonzero difference in the population. The preceding procedure may be employed for power analysis whenever an ES expressed as a B or β can be specified more readily than a desired proportion of unique variance (sr²). Several other topics in power analysis are presented in Chapter 4, following the exposition of power analysis in the most general form of MRC, where multiple sets of IVs are used. Among the issues discussed there are determination of power for a given n (Section 4.5.8), reconciling different n*'s for different hypotheses in a single analysis (Section 4.5.6), and the considerations involved in setting f² and power values (Sections 4.5.4 and 4.5.5). Section 4.5.9 discusses some general tactical issues in power analyses.
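The sex-discrimination planning computation above can be reproduced the same way. This Python sketch (added for illustration, not part of the original text) assumes the conversion from B to β and the guessed sr described in the example:

import math

# Planned detectable raw effect: a $1000 salary difference by sex.
B, sd_sex, sd_salary = 1000, math.sqrt(.30 * .70), 6000
beta = B * sd_sex / sd_salary                 # about .076
sr2 = .005                                    # the text rounds .071**2 to .005, allowing for redundancy

R2, L, k = .40, 7.85, 4                       # alpha = .05, power = .80, k_B = 1
f2 = sr2 / (1 - R2)                           # Eq. (3.7.3), about .0083
print(round(beta, 3), round(L / f2 + k + 1))  # beta about .076; n* = 947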


3.8 ANALYTIC STRATEGIES

Until this point we have presented regression/correlation analysis as if the typical investigation proceeded by selecting a single set of IVs and producing a single regression equation that is then used to summarize the findings. Life, however, is seldom so simple for the researcher. Nor should it be. There is a wealth of information about the interrelationships among the variables not extractable from the single equation. It is, perhaps, the skill with which other pertinent information can be ferreted out that distinguishes the expert data analyst from the novice.

3.8.1 Hierarchical Analysis

One of the most useful tools for extracting information from a data set is hierarchical analysis. In its simplest form, the k IVs are entered cumulatively in a prespecified sequence and the R² and partial coefficients are determined as each IV joins the others. A full hierarchical procedure for k IVs consists of a series of k MRC analyses, each with one more variable than its predecessor. (In Chapter 4 we see that one may proceed hierarchically with sets of variables rather than single variables.) Because with each new variable the R² increases, the ordered series of R²'s in hierarchical analysis is called the cumulative R² series. The choice of a particular sequence (hierarchy) of IVs is made in advance (in contrast with the "stepwise regression" of Section 3.8.2), dictated by the purpose and logic of the research. Some of the basic principles underlying the hierarchical order for entry are causal priority and the removal of confounding or spurious relationships, research relevance, and structural properties of the research factors being studied.

Causal Priority and the Removal of Confounding Variables

As seen (Section 3.4), the relationship between any variable and Y may be spurious (i.e., due to one or more variables that are a cause of both). Thus, each variable should be entered only after other variables that may be a source of spurious relationship have been entered. This leads to an ordering of the variables that reflects their presumed causal priority: ideally, no IV entering later should be a presumptive cause of an IV that has been entered earlier. A major advantage of the hierarchical MRC analysis of data is that once the order of the IVs has been specified, a unique partitioning of the total Y variance accounted for by the k IVs, R²_Y.12...k, may be made. Indeed, this is the only basis on which variance partitioning can proceed with correlated IVs. Because the sr²_i at each stage is the increase in R² associated with X_i when all (and only) previously entered variables have been partialled, an ordered variance partitioning procedure is made possible by

(3.8.1)

$$R^2_{Y \cdot 12 \ldots k} = r^2_{Y1} + r^2_{Y(2 \cdot 1)} + r^2_{Y(3 \cdot 12)} + r^2_{Y(4 \cdot 123)} + \cdots + r^2_{Y(k \cdot 123 \ldots k-1)} = r^2_{Y1} + sr^2_{2 \cdot 1} + sr^2_{3 \cdot 12} + sr^2_{4 \cdot 123} + \cdots + sr^2_{k \cdot 123 \ldots k-1}.$$


Each of the k terms is found from a simultaneous analysis of the IVs in the equation at that point in the hierarchy; each gives the increase in Y variance accounted for by the IV entering at that point beyond what has been accounted for by the previously entered IVs. r²_Y1 may be thought of as the increment from zero due to the first variable in the hierarchy, an sr² with nothing partialled. Summing the terms up to a given stage in the hierarchy gives the cumulative R² at that stage, for example, r²_Y1 + sr²_2·1 + sr²_3·12 = R²_Y·123.

The reader is reminded that the increment attributable to any IV may change considerably if one changes its position in the hierarchy, because this will change what has and what has not been partialled from it. This is indeed why one wishes the IVs to be ordered in terms of causal priority: otherwise part of the variance in Y due to some cause is instead attributed to an IV that is an effect of this cause. This stolen (spurious) variance will then mislead the investigator about the relative importance to Y of the cause and its effect.

Generally speaking, one is likely to have one or a small subset of IVs that are the focus of the investigation. For these variables an appropriate conservative sequencing would include all variables that may contribute to them causally before adding these focal variables to the equation. Likely candidates for causal priority in behavioral studies are status variables (age, sex, ethnicity, education, and socioeconomic status), because these are temporally prior and unlikely to be affected by more transitory states or traits. Of course, it will frequently not be possible to posit a single sequence that is uncontroversially in exact order of causal priority. In such circumstances more than one order may be entertained and the results then considered together. They may not differ with regard to the issue under investigation, but if they do, the resulting ambiguity must be acknowledged.

When the variables can be fully sequenced, that is, when a full causal model can be specified that does not include any reciprocal causation, feedback loops, or unmeasured common causes, the hierarchical procedure becomes a tool for estimating the effects associated with each cause. Indeed, this type of causal model is sometimes called a hierarchical causal model. Of course, formal causal models use regression coefficients rather than variance proportions to indicate the magnitude of causal effects. Because Chapter 9 is devoted to an exposition of the techniques associated with this and other types of causal models, we do not describe them here. However, it should be noted that even without a fully specified model, the hierarchical procedure is useful for extracting as much causal inference as the data allow (see also Section 9.2.3).

To illustrate a hierarchical analysis organized in terms of causal priority, we turn again to the academic salary data. The order of assumed causal priority is sex (X₃), years since Ph.D. (X₁), publications (X₂), and citations (X₄). Note that no variable can be causally affected by one that appears after it: whatever causality occurs among the IVs is from earlier to later in the sequence. We enter these variables in the specified order and determine the R² after each addition. We found r_Y3 = −.262, and therefore R²_Y·3 (= r²_Y3) = .0688 (i.e., some 7% of


the academic salary variance is accounted for by sex). When years since Ph.D. (X₁) is added to sex, we find that R²_Y·31 = .4111 and may say that the increment in Y variance of years since Ph.D. over sex, or for years partialling sex, is sr²_1·3 = R²_Y·31 − R²_Y·3 = .4111 − .0688 = .3423. Next we add publications (X₂) and find R²_Y·312 = .4191, a very small increment: sr²_2·31 = R²_Y·312 − R²_Y·31 = .4191 − .4111 = .0080. Finally, when citations (X₄) is added, we have the R² we found in Section 3.5, R²_Y·3124 = .4908, so the increment for X₄, or sr²_4·312, = .4908 − .4191 = .0717. The final R² for the four IVs is necessarily the sum of these increments, by Eq. (3.8.1):



$$.4908 = \underbrace{.0688}_{\text{sex } (r^2_{Y3})} + \underbrace{.3423}_{\text{years } (sr^2_{1 \cdot 3})} + \underbrace{.0080}_{\text{pub. } (sr^2_{2 \cdot 31})} + \underbrace{.0717}_{\text{cit. } (sr^2_{4 \cdot 312})}$$

Of course, a different ordering would result in different increments (which would also sum to .4908), but to the extent that they violated the direction of causal flow, the implications would be spurious. For example, if entered first, publications would have .2122 of the salary variance credited to it, but only on the unlikely premise that years since Ph.D. (essentially age) did not cause number of publications. The causal priority ordering makes it clear that the strong relationship between salary and publications merely reflects the operation of the passage of time.

The increments here are sr² values, but they are different from those determined previously (Section 3.5.4) and given in Table 3.5.2. For the latter, all the other k − 1 (= 3) IVs were partialled from each, whereas here only those preceding each IV in the hierarchy are partialled. (They therefore agree only for the variable entering last.) When the significance test of Eq. (3.6.6) is employed for these cumulative sr² values (for n = 50), it is found that all are significant except the sr² for number of publications. (A test using a different error model is described in Section 4.4.1.)

A special case of the hierarchical model is employed in the analysis of change. Under circumstances in which pre and post values are available on some variable and the researcher wishes to determine whether and to what extent treatment or other variables are associated with change, the postscore may be used as the dependent variable, with prescore entered as the first IV in the hierarchy. Unlike the alternative method involving difference (post- minus pre-) scores, when subsequent IVs are entered into the equation their partial correlations will reflect their relationship with postscores from which prescore influence has been removed. (This is, in fact, an ACV accomplished by means of MRC. Chapter 10 provides a full discussion of ACV, including its use in the study of change in Section 10.6.)
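The cumulative R² series of Eq. (3.8.1) is easy to compute with any regression routine. The Python sketch below uses synthetic data standing in for the academic salary example (the variable names and generating values are invented, not the book's data) and prints the cumulative R² and the increment (sr²) as each IV enters in the assumed causal order.

```python
# A sketch of the hierarchical (cumulative R^2) procedure of Eq. (3.8.1),
# using synthetic data in place of the academic salary example.
import numpy as np

rng = np.random.default_rng(0)
n = 50
sex   = rng.integers(0, 2, n).astype(float)       # hypothetical IVs
years = rng.gamma(4.0, 2.0, n)
pubs  = 2.0 * years + rng.normal(0, 3, n)
cites = 0.5 * pubs + rng.normal(0, 5, n)
salary = 1000 * years + 200 * cites + 2000 * sex + rng.normal(0, 4000, n)

def r2(y, *ivs):
    """R^2 of y regressed on the given IVs (intercept included)."""
    X = np.column_stack([np.ones(len(y)), *ivs])
    yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1 - ((y - yhat) ** 2).sum() / ((y - y.mean()) ** 2).sum()

order = [("sex", sex), ("years", years), ("pub.", pubs), ("cit.", cites)]
entered, r2_prev = [], 0.0
for name, iv in order:                  # enter IVs in causal-priority order
    entered.append(iv)
    r2_now = r2(salary, *entered)
    print(f"{name:6s} cumulative R^2 = {r2_now:.4f}   increment (sr^2) = {r2_now - r2_prev:.4f}")
    r2_prev = r2_now
```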

Research Relevance

Not infrequently an investigator gathers data on a number of variables in addition to those IVs that reflect the major goals of the research. Thus, X₁ and X₂


may carry the primary focus of the study, but X₃, X₄, and X₅ are also available. The additional IVs may be secondary because they are viewed as having lesser relevance to the dependent variable than do X₁ and X₂, or because hypotheses about their relationships are weak or exploratory. Under these circumstances, X₁ and X₂ may be entered into the equation first (perhaps ordered on the basis of a causal model) and then X₃, X₄, and X₅ may follow, ordered on the basis of their presumed relevance and/or priority. Aside from the clarity in interpretation of the influence of X₁ and X₂ that is likely to result from this approach (because the secondary X₃, X₄, and X₅ variables are not partialled from X₁ and X₂), the statistical power of the test of the major hypothesis is likely to be maximal when the appropriate error model is used (see Sections 4.4.1 and 4.6.3).

Hierarchical Analysis Required by Structural Properties

Several types of variables that may be used as IVs in MRC have characteristics that make assessment of their contribution to R² meaningful only after related variables have been partialled, thus mandating a specific order. This occurs in the representation of interactions (Chapter 8), curvilinear relationships (Chapter 6), and missing data (Chapter 7). Because the exposition of these methods requires entire chapters, they will not be illustrated here; however, they will be found to entail important applications of the hierarchical procedure.

In general, the moral of hierarchical analysis is that the contribution to R² associated with any variable may depend critically upon what else is in the equation. The story told by a single simultaneous analysis for all k variables may, for many purposes, be incomplete. Hierarchical analysis of the variables typically adds to the researcher's understanding of the phenomena being studied, because it requires thoughtful input by the researcher in determining the order of entry of IVs and yields successive tests of the validity of the hypotheses that define that order.

3.8.2 Stepwise Regression

Although stepwise regression has certain surface similarities with hierarchical MRC, it is considered separately, primarily because it differs in its underlying philosophy, and also because special computer programs and options are available for its computation. As discussed here, these programs are designed to select from a group of IVs the one variable at each stage that has the largest sr², and hence makes the largest contribution to R². Such programs typically stop admitting IVs into the equation when no IV makes a contribution that is statistically significant at a level specified by the program user.¹⁰ Thus, the stepwise procedure defines an a posteriori order based solely on the relative uniqueness of the variables in the sample at hand.

¹⁰Some stepwise programs operate backwards, that is, by elimination. All k IVs are entered simultaneously and the one making the smallest contribution is dropped. Then the k − 1 remaining variables are regressed on Y, and again the one making the smallest contribution is dropped, and so on. The output is given in reverse order of elimination. This order need not agree with that of the forward or accretion method described here.


When an investigator has a large pool of potential IVs and very little theory to guide selection among them, these programs are a sore temptation. If the computer selects the variables, the investigator is relieved of the responsibility of making decisions about their logical or causal priority or relevance before the analysis, although interpretation of the findings may not be made easier. We take a dim view of the routine use of stepwise regression in explanatory research for various reasons (see following), but mostly because we feel that more orderly advance in the behavioral sciences is likely to occur when researchers armed with theories provide a priori hierarchical orderings that reflect causal hypotheses rather than when computers order IVs post hoc and ad hoc for a given sample.

An option that is available in some computer programs allows for an a priori specification of a hierarchy among groups of IVs, called "forced" stepwise regression. An investigator may be clear that some groups of variables are logically, causally, or structurally prior to others, and yet not have a basis for ordering variables within such groups. Under such conditions, variables may be labeled for entering into the equation as one of the first, second, or up to the hth group of variables, and the sequence of variables within each group is determined by the computer in the usual stepwise manner. This type of analysis is likely to be primarily hierarchical (between classes of IVs) and only incidentally stepwise (within classes), and computer programs so organized may be effectively used to accomplish hierarchical MRC analysis by sets of IVs as described in Section 4.2.2.

Probably the most serious problem in the use of stepwise regression programs arises when a relatively large number of IVs is used. Because the significance test of an IV's contribution to R² proceeds in ignorance of the large number of other such tests being performed at the same time for the other competing IVs, there can be very serious capitalization on chance. The result is that neither the statistical significance tests for each variable nor the overall tests on the multiple R² at each step are valid.¹¹ A related problem with the free use of stepwise regression is that in many research problems the ad hoc order produced from a set of IVs in one sample is likely not to be found in other samples from the same population. When among the variables competing for entry at any given step there are trivial differences among their partial relationships with Y, the computer will dutifully choose the largest for addition at that step. In other samples and, more important, in the population, such differences may well be reversed. When the competing IVs are substantially correlated with each other, the problem is likely to be compounded, because the losers in the competition may not make a sufficiently large unique contribution to be entered at any subsequent step before the procedure is terminated by nonsignificance.

¹¹The computer program manuals conscientiously note this fact, with little apparent effect on the behavior of users. However, one can validly appraise the stepwise R² (but not the stepwise sr²s) for significance using tables provided by Wilkinson (1979).

Although, in general, stepwise programs are designed to approach the maximum R² with a minimum number of IVs for the sample at hand, they may not


succeed very well in practice. Sometimes, with a large number of IVs, variables that were entered into the equation early no longer have nontrivial (or significant) relationships after other variables have been added. Some programs provide for the removal of such variables, but others do not. Also, although it is admittedly not a common phenomenon in practice, when there is suppression between two variables neither may reach the criterion for entrance into the equation, although if both were entered they would make a useful contribution to R². However, our distrust of stepwise regression is not absolute, and decreases to the extent that the following conditions obtain:

1. The research goal is entirely or primarily predictive (technological), and not at all, or only secondarily, explanatory (scientific). The substantive interpretation of stepwise results is made particularly difficult by the problems described above.

2. n is very large, and the original k (that is, before stepwise selection) is not too large; a k/n ratio of 1 to at least 40 is prudent.

3. Particularly if the results are to be substantively interpreted, a cross-validation of the stepwise analysis in a new sample should be undertaken, and only those conclusions that hold for both samples should be drawn. Alternatively, the original sample may be randomly divided in half, and the two half-samples treated in this manner.
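For concreteness, a bare-bones forward selection loop of the kind such programs implement is sketched below in Python; it is illustrative only (the function name and stopping rule are this sketch's choices, not any particular package's), and the repeated significance tests it performs are subject to exactly the capitalization on chance discussed above.

```python
# A bare-bones forward ("stepwise") selection loop: at each step the candidate
# IV giving the largest increment to R^2 enters, until no candidate adds a
# nominally significant increment. Illustrative only; see the cautions above.
import numpy as np
from scipy.stats import f as f_dist

def forward_stepwise(y, X, alpha=0.05):
    n, k = X.shape
    remaining, chosen, r2_prev = list(range(k)), [], 0.0

    def r2(cols):
        Z = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        yhat = Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
        return 1 - ((y - yhat) ** 2).sum() / ((y - y.mean()) ** 2).sum()

    while remaining:
        best = max(remaining, key=lambda c: r2(chosen + [c]))   # largest sr^2
        r2_new = r2(chosen + [best])
        df_err = n - (len(chosen) + 1) - 1
        F = (r2_new - r2_prev) / ((1 - r2_new) / df_err)        # test of the increment
        if F < f_dist.ppf(1 - alpha, 1, df_err):
            break                                               # no "significant" gain
        chosen.append(best)
        remaining.remove(best)
        r2_prev = r2_new
    return chosen, r2_prev
```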

3.9 ADEQUACY OF THE REGRESSION MODEL AND ANALYSIS OF RESIDUALS

Whether a multiple regression analysis has been employed in the service of a complex causal analysis or for a prosaic piece of psychotechnological forecasting, the analysis is not really complete until the analyst has addressed the question, "How adequate is the actual regression model I employed to the task of describing my data and projecting this description to the population?" This question is not easily answered, in general, nor are there mechanical procedures that can be applied in a routine manner to assure a correct answer. As in data analysis generally, insight based on experience, sometimes even inspiration, is required to address the issue adequately. The purpose of this section is to suggest some tools that may help. (A more detailed exposition of this topic is available in Anscombe & Tukey, 1963.)

A model is not adequate simply because R² is high nor inadequate because R² is low. It is inadequate, rather, when it can be substantially improved within the conditions set by the available data. The major sources of inadequacy are failure to provide for curvilinearity, the existence of outliers, heteroscedasticity, and the omission of important independent variables. Although some of these problems have been briefly discussed for the bivariate case (Sections 2.8.1, 2.10.1), they require attention for the more general case of k IVs.


3.9.1 The Analysis of Residuals

The residuals ("errors") of the estimated values of the regression (i.e., the n values of Y − Ŷ) provide the basis for assessing the adequacy of the model. Recall that the mean of the residuals equals zero, their correlation with Ŷ equals zero, and their correlation with Y equals √(1 − R²). Their variance is what is minimized by the least-squares solution for the regression constants, an absolute measure of total error of estimation that, when divided by the variance of Y, equals 1 − R². Certain features of the residuals' distribution, as related to the estimated Ŷ values or to the values of other variables by graphic and other techniques, are diagnostic of flaws in the particular regression model that are often capable of correction.

Graphic methods are very useful (and much used) in the analysis of the adequacy of a regression model, and most computer software packages provide not only for the output of the n residuals but also for graphic plotting of the residuals against Ŷ and other variables (Appendix 3). What the study of graphs may lack in objectivity, it makes up in the clarity with which it may identify the problem. More objective analytic techniques may also be applied to the residuals to assess specific sources of model inadequacy.

Because of the physical limitations imposed by the size of printout paper and type, computer-generated graphs may lack clarity when sample sizes are large (say, n > 200). Under these circumstances, a few nonoverlapping random samples of 100-150 cases may be plotted and should give clear and consistent results. Alternatively, the total sample may be partitioned into subsamples, separately plotted and then combined by hand into two-way frequency tables. Once the analyst understands how graphs are used, the means for generating them (or their tabular equivalents) will readily suggest themselves.

The four graphs of Figure 3.9.1 provide rather stylized illustrations, respectively, of the flaws of curvilinearity, outliers, heteroscedasticity, and omission of an important variable. Each is discussed below from both a graphic and a formal analytic perspective.
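A residual-versus-fitted plot of the kind described here takes only a few lines of code. The sketch below (assuming y and a predictor matrix X, without the intercept column, are already available as NumPy arrays) is a minimal modern stand-in for the printer plots the text refers to.

```python
# A minimal residual-vs-fitted plot of the kind discussed in this section.
import numpy as np
import matplotlib.pyplot as plt

def residual_plot(y, X):
    Z = np.column_stack([np.ones(len(y)), X])
    b = np.linalg.lstsq(Z, y, rcond=None)[0]
    fitted = Z @ b
    resid = y - fitted            # mean 0, correlation with fitted = 0 by construction
    plt.scatter(fitted, resid, s=10)
    plt.axhline(0.0, linewidth=1)
    plt.xlabel("estimated Y (fitted values)")
    plt.ylabel("residual, Y - Y-hat")
    plt.show()
```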

Curvilinearity

Even the experienced investigator may fail to anticipate and provide for the curvilinearity of the regression of Y on one or more of the IVs in an analysis. When the Y − Ŷ residuals are plotted against the estimated ("fitted") values Ŷ, a graph like that of Fig. 3.9.1a may result. The mean of the residuals and their r with Ŷ are (as they must be) zero, but instead of a more or less uniform band of positive and negative residuals running from low to high values of Ŷ, we observe that the negative residuals predominate at low and high values of Ŷ, and the positive residuals mostly occur at middle values. This clearly curvilinear relationship implies that an IV making a major contribution to Ŷ is curvilinearly related to Y, and revising the regression model to make provision for this curvilinearity would result in a substantial reduction in the variance of the residuals, thus an increase in R² and a far better model.

[Figure 3.9.1. Illustration of four types of model inadequacy revealed by plotting the residuals (Y − Ŷ) against Ŷ: (a) curvilinearity, (b) outliers, (c) heteroscedasticity, (d) omission of time (or order).]

Such a finding can be followed up


by plotting the residuals against each of the quantitative IVs to pinpoint the source of the curvilinearity. Formal procedures for testing for curvilinearity, as well as various methods for incorporating curvilinearity in the regression model, are the topic of Chapter 6 and are not pursued here.

Outliers

Outliers are "far out" observations; in the present context, extreme residuals, either positive or negative. The circled points in Fig. 3.9.1b are outliers: there is considerable white space between the circled points and the body of the residuals. A formal definition of an outlier is necessarily arbitrary. When residuals are standardized by dividing them by their standard deviation (the square root of the residual MS of Eq. 3.7.2), a residual that is as much as three (or, certainly, four) of these units in absolute size is reasonably considered an outlier. Because the regression surface as defined by the regression equation minimizes the squared residuals, an outlier not only makes a relatively large contribution to their variance (thus reducing R²) but also exerts a disproportionately strong pull on the regression. Outliers are, therefore, particularly bothersome when they are all or predominantly of the same sign.

Deciding what to do with outliers is no easy matter. They incur suspicion that they arose from some causal process different from that operating on the bulk of the data, usually an error in an instrument or in observation or recording. If there is evidence that such is the case, the outliers may be dropped and the data reanalyzed. The decision to do so should not be taken lightly. It pays to think through carefully the question of how an outlier may have come about without the presumption of error, because such an exercise may produce insight into the phenomena under study. Even when error can be assumed, if outliers are few (less than 1% or 2% of n) and not very extreme, they are probably best left alone.

Heteroscedasticity

When the scatter of the residuals is not constant over the range of Ŷ values, that is, under conditions of heteroscedasticity, two kinds of consideration arise. The first is that the validity of significance tests, which assume constant error variance (and normality), is adversely affected. The overall variance of the residuals (sd²_{Y−Ŷ}) is an average value that overstates errors of estimation for some values of Ŷ and understates them for others, thus playing hob with confidence intervals for Y₀ computed in the usual way. The second consideration is that this total error variance is probably larger than it need be. If the variance is stabilized, the new model will likely describe the data more adequately, as evidenced by an increase in R². A frequently observed form of heteroscedasticity is the fan shape illustrated in Fig. 3.9.1c, where the residual variance increases regularly with Ŷ. Note how


much greater the expected error is in estimating large values of Y than in estimating small ones. When the residual variance varies regularly (as in Fig. 3.9.1c or its mirror image), it can be assessed formally by stripping the residuals of their signs and computing the r or rank correlation (Eq. 2.3.9) between Ŷ and these absolute residuals. When an irregular variance pattern is observed in a residual plot, differences in the residual variances between different ranges of Ŷ can be assessed by the standard F test for comparing independent variances (Hays, 1981).

When the circumstances depicted in Fig. 3.9.1c obtain, the model is often greatly improved by simply rescaling Y by a nonlinear transformation, of which Y′ = √Y or log Y are good candidates; the data are reanalyzed with Y′ replacing Y (Section 6.5). Sometimes a nonlinear transformation of one or more IVs will work well. For irregular heteroscedasticity, or when more exact control is desired, the method of weighted least squares is available (Draper & Smith, 1981), but at the cost of some complexity.

Although it is convenient to discuss types of model inadequacy separately, in real data they frequently occur together. Most particularly, curvilinearity frequently accompanies heteroscedasticity and, fortunately, when one of these flaws is corrected the other tends also to be (Section 6.5.1).
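The sign-stripping check just described translates directly into code. The sketch below computes the rank correlation between the fitted values and the absolute residuals; the function name and the use of Spearman's rho are this sketch's choices, not prescriptions from the text.

```python
# An informal check for fan-shaped heteroscedasticity: the rank correlation
# between the fitted values and the absolute residuals.
import numpy as np
from scipy.stats import spearmanr

def fan_check(y, X):
    Z = np.column_stack([np.ones(len(y)), X])
    fitted = Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    abs_resid = np.abs(y - fitted)            # residuals stripped of their signs
    rho, p = spearmanr(fitted, abs_resid)
    return rho, p

# If the check points to regularly increasing spread, re-expressing Y
# (e.g., np.sqrt(y) or np.log(y)) and refitting is the simple first remedy.
```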

Omission of an Important Independent Variable

Any IV that has been included in a regression must have a linear correlation of zero with the residuals. The other side of this coin is that a variable V that has not been included in the equation estimating Y may have any degree of linear correlation with the residuals. If this correlation r_{(Y−Ŷ)V} is substantial, the addition of V to the other IVs will then substantially reduce the variance of the residuals and increase R². (The amount of the increase, r²_{(Y−Ŷ)V}, is the semipartial r² of V with Y from which the original IVs have been partialled.)

In exploratory studies, if the possibility of curvilinearity with an omitted variable V is entertained, one can plot the residuals against V. The linear relationship is easily enough ascertained by computing r_{(Y−Ŷ)V} as above. Easier still, V can be added to the prior set of variables and the increment in R² determined and tested in the usual way (Section 3.6.4).

Figure 3.9.1d illustrates a plot of a set of residuals against time or order of data acquisition. The latter was chosen because of the frequency with which it turns out to be related to Y, so that when it has been overlooked and thus omitted from the regression, it proves to be related (linearly or otherwise) to the residuals. Such a relationship is worth uncovering if for no other reason than that the subsequent addition of order to the regression reduces the error variance and hence increases the precision of estimates and the power of statistical tests. To facilitate such analysis it is good research practice to assign the identifying case numbers in the order of case acquisition. Case number can then simply be used as a variable for graphing or computing.
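Both versions of the omitted-variable check (the correlation of the residuals with V, and the increment in R² when V is added) can be sketched as follows. The code is a minimal illustration with hypothetical names, using case order as the candidate V.

```python
# A sketch of the omitted-variable check: correlate the residuals with a
# candidate variable V, then add V and note the gain in R^2.
import numpy as np

def omitted_variable_check(y, X, v):
    def fit(Z):
        Zc = np.column_stack([np.ones(len(y)), Z])
        yhat = Zc @ np.linalg.lstsq(Zc, y, rcond=None)[0]
        return yhat, 1 - ((y - yhat) ** 2).sum() / ((y - y.mean()) ** 2).sum()

    yhat, r2_without = fit(X)
    resid = y - yhat
    r_resid_v = np.corrcoef(resid, v)[0, 1]            # r between residuals and V
    _, r2_with = fit(np.column_stack([X, v]))
    return r_resid_v, r2_with - r2_without             # second value: increment in R^2

# v = np.arange(len(y))   # e.g., order of case acquisition as the candidate V
```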


A potentially serious problem exists when time or order is also related to one or more independent variables. Under these circumstances, with time omitted, the observed relationships may be partly or wholly spurious (i.e., a causal model in which time plays its proper role may yield a substantially different picture of the effects of the IVs in the analysis).

In the specialized applications of MRC to the analysis of time series, problems of nonindependence of residuals due to autocorrelation (the similarity of each residual to the one next in order) arise. The diagnosis and treatment of these problems is beyond the scope of this presentation. A relatively accessible treatment is provided by Neter and Wasserman (1974, pp. 352-366), and by Glass, Willson, and Gottman (1975).

3.10 SUMMARY

This chapter begins with the representation of the theoretical rationale for the analysis of multiple independent variables by means of causal models. The employment of an explicit theoretical model as a working hypothesis is advocated for all investigations except those intended for simple prediction. After the meaning of the term cause is briefly discussed (Section 3.1.1), rules for diagrammatic representation of a causal model are presented (Section 3.1.2).

Bivariate linear regression analysis is extended to the case in which two or more independent variables (IVs), designated Xᵢ (i = 1, 2, . . . , k), are linearly related to a dependent variable Y. As with a single IV, the multiple regression equation that produces the estimated Ŷ is that linear function of the k IVs for which the sum over the n cases of the squared discrepancies of Y from Ŷ, Σ(Y − Ŷ)², is a minimum. The regression equation in both raw and standardized form for two IVs is presented and interpreted. The standardized partial regression coefficients, βᵢ, are shown to be a function of the correlations among the variables; βᵢ may be converted to the raw-score Bᵢ by multiplying each by sd_Y/sdᵢ. (Section 3.2)

The measures of correlation in MRC analysis include:

1. R, which expresses the correlation between Y and the best (least-squares) linear function of the k IVs (Ŷ), and R², which is interpretable as the proportion of Y variance accounted for by this function. (Section 3.3.1)

2. Semipartial correlations, srᵢ, which express the correlation with Y of Xᵢ from which the other IVs have been partialled. sr²ᵢ is thus the proportion of variance in Y uniquely associated with Xᵢ, that is, the increase in R² when Xᵢ is added to the other IVs. The ballantine is introduced to provide a visual representation of the overlapping of variance with Y of X₁ and X₂. (Section 3.3.2)

3. Partial correlations, prᵢ, which give the correlation between that portion of Y not linearly associated with the other IVs and that portion of Xᵢ that is not


linearly associated with the other IVs; in contrast with srᵢ, it partials the other IVs from both Xᵢ and Y. pr²ᵢ is the proportion of that part of the Y variance not associated with the other IVs that is associated with Xᵢ. (Section 3.3.3)

Each of these coefficients is exemplified and shown to be a function of the zero-order correlation coefficients. The reader is cautioned that none of these coefficients provides a basis for a satisfactory Y variance partitioning scheme when the IVs are mutually correlated.

The alternative causal models possible for Y and two IVs are discussed, exemplified, and illustrated. The distinction between direct and indirect effects is explained, and models consistent with partial redundancy between the IVs are illustrated. Mutual suppression of causal effects will occur when any of the three zero-order correlations is less than the product of the other two (Section 3.4.1). Spurious effects and entirely indirect effects can be distinguished when the causal sequence of the IVs is known. (Section 3.4.2)

The case of two IVs is generalized to the case of k IVs in Section 3.5. The use of the various coefficients in the interpretation of research findings is discussed and illustrated with concrete examples. The relationships among the coefficients are given.

R may be tested for statistical significance by means of an F test (Section 3.6.1). Since the R² obtained on any sample uses the optimal linear function of the sample Xᵢ values, it follows that it tends to overestimate the population value. The "shrunken" R² provides a more realistic estimate of this value. Overestimation of the population R² by the sample R² is larger as the ratio of k to n increases. When IVs have been selected post hoc from a larger potential set, as in stepwise regression, even this shrunken estimate is too large. (Section 3.6.2)

All the partial coefficients of a given IV share a single t value for the statistical significance of their departures from zero. With two or more IVs, it is possible for R to be significant when no IV yields significant partial coefficients, and it is also possible for the partial coefficients of one or more IVs to be significant when R is not. (Section 3.6.3)

Standard errors may be determined for Bᵢ and βᵢ and used for significance testing and to set confidence limits for population values. These standard errors are shown to increase as a function of the multiple correlation of Xᵢ with the other IVs. They may also be used to test for the significance of the difference between two independent Bs or βs. (Section 3.6.4)

When the regression equation is to be used in estimating a single Y₀ value from a new set of observed values X₁₀, X₂₀, . . . , X_k0, the standard error of the Ŷ₀ value may be determined assuming homoscedasticity and normality. This standard error also increases as a function of the correlations among the IVs, as well as of the departure of the Xᵢ₀ values from their respective means. The general issue of the use of MRC in prediction is discussed. Regression equations used for prediction require cross-validation. In many real problems equal weights may


perform better than sample regression weights in a new sample. Other alternatives to regression weights are available that may produce less error, although more bias, than those of MRC. (Section 3.6.5)

Large correlations among IVs (multicollinearity) may create problems in interpretation, sampling stability, and computational accuracy. These are discussed, and means of coping with them are suggested. (Section 3.6.6)

In planning a study using MRC it is highly desirable to determine the sample size (n*) necessary to attain some given level of statistical power. Such power analyses may be carried out for R² and for the partial coefficients of each of the IVs. Methods of determining n* and worked examples are provided. (Section 3.7)

The choice of the analytic model will determine the amount of information extracted from a data set. Hierarchical analysis allows appropriate consideration of causal priorities and removal of confounding variables. It may also be used to reflect the research relevance or structural properties of variables (Section 3.8.1). An alternative strategy, "stepwise" MRC, in which IVs are entered in a sequence determined by the size of their increment to R², is also discussed. Use of this strategy is generally discouraged because of the necessarily post hoc nature of interpretation of the findings and the substantial probability of capitalizing on chance. (Section 3.8.2)

Methods for examining the adequacy of the regression by the analysis of residuals are presented in Section 3.9. Graphic methods are often helpful in identifying curvilinearity, outliers, heteroscedasticity, or omission of an important variable.

4 Sets of Independent Variables

4.1 INTRODUCTION

In Chapter 3, we have taken the basic ideas of MRC about as far as they go in the standard textbook treatments. For (in principle) any number k of IVs, we have discussed their joint relationship to Y (R, R², the generation of the regression equation for Ŷ), the various conceptions of the separate relationship of each Xᵢ to Y (r_Yi, srᵢ, prᵢ, Bᵢ, βᵢ, r²_Yi, sr²ᵢ, pr²ᵢ), and significance testing and power analysis for these statistics. In the present chapter, we offer an expansion of these ideas from k single IVs to h sets of IVs. It turns out that the basic concepts of proportion of variance accounted for and of correlation (simple, partial, semipartial, multiple) developed in Chapter 3 for single IVs hold as well for sets of IVs. We shall see that this generalization, that is, the use of sets as units of analysis in MRC, proves to be most powerful for the exploitation of data, and is at the core of our expansion of MRC from its limited past role in psychotechnology (for example, predicting freshman grade point average) to a truly general data-analytic system.

What is a set? As used here, its meaning is essentially that in common discourse: a group classified as belonging together for some reason. Although we tend to think of a group or set as having plural membership, and this will usually be the case in our applications, the mathematical concept, being general, permits a set to have any number of constituent members, including one or zero. Sets containing only one IV, and sets containing none (empty sets), will be seen to specialize the concepts to cases described in Chapter 3. We represent sets by capital letters (e.g., A, B), the number of variables in a set by k subscripted with the letter (e.g., k_A, k_B), and the number of sets by h.

But why organize IVs into sets? There are two quite different kinds of reasons for doing so, the first structural or formal and the second functional to the substance or logic of the research.


4.1.1 Structural Sets

We use the term research factor to identify an influence operating on Y, or more generally an entity whose relationship to Y is under study. The word factor is used here in the sense of the AV "factorial design," not that of factor analysis. Thus, sources of Y variance like treatment group, age, IQ, geographic area, diagnostic group, fiscal year, socioeconomic status, kinship system, strength of stimulus, and birth order are all research factors. Note that the examples are quite general and of varied character. Some are inherent properties or characteristics of the research material, whereas others are the consequences of experimental manipulation. Some are quantitative whereas others are qualitative or nominal. Although these are important distinctions, the concept of research factor spans them.

Now, when a research factor can be represented by a single IV, no necessity for structural sets arises. This was taken to be the case for the research factors in Chapter 3. But such is not the general case. Three formal circumstances make it often necessary to represent a single research factor by a set of multiple IVs, so that its several aspects are represented:

1. Nominal or qualitative scales. When observations are classified by a research factor G into g mutually exclusive and exhaustive qualitative categories, G is defined as a nominal scale (Stevens, 1951) and can be understood as having g − 1 aspects; therefore its complete representation requires a set made up of k_G = g − 1 IVs. Some examples are experimental treatment, religion, diagnosis, ethnicity, kinship system, and geographic area. Thus, the research factor "Religion," a nominal scale made up of the g = 4 categories Protestant, Catholic, Jewish, and Other, cannot be represented by a single IV, but requires k_G = g − 1 = 3 IVs to represent it. This can be understood as being due to the fact that any scheme which will fully represent G (i.e., the distinctions among the g groups) must have exactly g − 1 aspects. Similarly, in a laboratory experiment in which subjects are randomly assigned to three different experimental groups and two different control groups (hence, g = 5), the research factor G of treatment group requires exactly g − 1 = 4 IVs to fully represent the aspects of G (that is, the distinctions among the 5 treatment groups). The several different methods for accomplishing this representation (the subject of Chapter 5) can be understood as different systems of aspect representation, but each requires a set of k_G = g − 1 IVs to fully represent G. For example, in one of these systems (Section 5.5, "Contrast Coding"), one of these aspects, represented by one IV, might be the distinction between the three experimental groups on the one hand and the two control groups on the other. But the full G information requires a set¹ to represent it, and does so for purely structural reasons, that is, the nominal scale form of the research factor.

¹The limiting case here occurs when G is made up of only 2 (= g) groups, for example, male-female, schizophrenic-nonschizophrenic, experimental-control. The G set would then contain only 1 (= g − 1) IV, because there is only one aspect of G, the distinction between its two constituent groups. The reader may be familiar with the long-standing practice of representing such binary


2. Quantitative scales. This term is used in this book to convey collectively scales of the kind called ordinal, interval, and ratio by Stevens (1951). For example, interval scales are those whose units of measurement are treated as (more or less) equal, for example, scores of psychological tests and rating scales or sociological indices. These are the conventional variables long used in MRC, and the idea that a quantitative research factor has more than one aspect may seem strange. However, when one wishes to take into account the possibility that such a research factor as age may be related nonlinearly to Y (or to other research factors), other aspects of age must be considered. Age as such represents only one aspect of the A research factor, its linear aspect. Other aspects, which provide for various kinds of nonlinearity in the relationship of A to Y, may be represented by other IVs such as age-squared and age-cubed (see Section 6.2, "Power Polynomials"). Thus, age, broadly conceived as a research factor A, may require a set of k_A IVs to represent it for purely formal or structural reasons. Chapter 6 is devoted to several methods of representing aspects of quantitative research factors that require a set of IVs.²

3. Missing data. It frequently occurs in research in the social and behavioral sciences that some of the data for a research factor are not available. For example, in a market research survey, some respondents do not answer an item on family income. Thus, the research factor "Income" has not only the obvious aspect of this information when it is given, but also the aspect of whether it is given or not, its "missingness." The latter is often relevant to Y (or to other research factors) and can be represented as information. Details of this procedure are the subject of Chapter 7. It is sufficient to point out here simply that the structure of such a research factor defines (at least) two aspects, and thus its representation requires a set of IVs.

The preceding implies that if we are determining the proportion of variance in a dependent variable Y due to a single research factor, we will (in general) be finding a squared multiple correlation, because the latter will require a set of two or more IVs. These are necessary when the research factor has multiple aspects due to any of the three structural reasons just described.

4.1.2 Functional Sets

Quite apart from structural considerations, IVs are grouped into sets for reasons of their substantive content and the function they play in the logic of the research. Thus, if you are studying the relationship between the psychological variable field dependence (Y) and personality (P) and ability (A) characteristics, P may

distinctions by assigning as scores zeros to one group and ones to the other for purposes of correlation, already illustrated in Section 2.3.3. This is seen in Chapter 5 to be an instance of the dummy-variable coding method (Section 5.3).

²Here the limiting case is that in which the research factor A is represented only by age as such, that is, its linear aspect. This is, of course, the traditional procedure. Another set containing only one IV would result if the investigator was prepared to represent A by only its logarithmic aspect, that is, log age. (See Section 6.5, "Nonlinear Transformations.")


contain a set of k_P scales from a personality questionnaire and A a set of k_A subtests from an intelligence scale. The question of the relative importance of personality and ability (as represented by these variables) in accounting for variance in Y would be assessed by determining R²_Y·P, the squared multiple correlation of Y with the k_P IVs of P, and R²_Y·A (ditto for the k_A IVs of A), and then comparing them. Similarly, a sociological research that is investigating (among other things) the socioeconomic status (S) of school children might represent S by occupational index of head of household, family income, mother's education, and father's education, a substantive set of 4 (= k_S) IVs. For simplicity, these illustrations have been of sets of single-IV research factors, but a functional set can be made up of research factors that are themselves sets, for example, a demographic set (D) made up of structural sets to represent ethnicity, marital status, and age. A group of sets is itself a set and requires no special treatment.

It is often the nature of research that in order to determine the effect of some research factor(s) of interest (a set B), it is necessary to statistically control for (or partial out) the Y variance due to causally antecedent variables in the cases under study. A special case of this is represented by the analysis of covariance (ACV; see Chapter 10). A group of variables deemed antecedent either temporally or logically in terms of the purpose of the research could be treated as a functional set for the purpose of partialling (covarying) out of Y's total variance the portion of the variance due to these antecedent conditions. Thus, in a comparative evaluation of compensatory early education programs (B), with school achievement as Y, the set to be partialled might include such factors as family socioeconomic status, ethnicity, number of older siblings, and preexperimental reading readiness. This large and diverse group of IVs functions as a single covariate set A in the research described. In another research they might have different functions and be treated separately or in other combinations.

It is worth noting here that the organization of IVs into sets of whatever kind bears on the interpretation of MRC results, but has no effect on the basic computation. For any Y and k IVs (X₁, . . . , X_k) in a given analysis, whatever the set makeup of the IVs, R²_Y·12...k and the array of partial statistics for each Xᵢ (srᵢ, prᵢ, βᵢ, Bᵢ) and all relevant F and t values are determined as described in the previous chapter. We shall soon see that sets of IVs may be progressively added in a hierarchical MRC analysis, but for the given group of IVs present at any stage of this analysis, the computation is as described previously.

Before leaving the topic of functional sets, an admonitory word is in order. Because it is possible to do so, the temptation exists to assure coverage of a theoretical construct by measuring it in many ways, with the resulting large number of IVs then constituted as a set. Such practice is to be strongly discouraged, because it tends to result in reduced statistical power for the sets (see Section 4.5) and an increase in spuriously "significant" single-IV results (see Section 4.6), and generally bespeaks muddy thinking. It is far better to sharply reduce the size of such a set, and by almost any means. One way is through a tightened conceptualization of the construct, a priori. In other situations, the


large array of measures is understood to cover only a few (or even one) behavioral dimensions, in which case their reduction to scores on a few (or even one) factors by means of factor or cluster analysis is likely to be most salutary for the investigation, with little risk of losing Y-relevant information (see Section 4.6.2). Note that such analyses are performed completely independently of the values of the r_Yi correlations.

4.2 SIMULTANEOUS AND HIERARCHICAL ANALYSES OF SETS

4.2.1 The Simultaneous Analysis of Sets

We saw in the last chapter that, given k IVs, we would regress Y on all of them simultaneously and obtain R²_Y·12...k as well as partial statistics for each Xᵢ. Now, these partial statistics are written in shorthand notation (i.e., Bᵢ, srᵢ, prᵢ), but it is understood that all the IVs (other than Xᵢ) are being partialled. This immediately generalizes to sets of IVs: when sets U, V, and W are simultaneously regressed on Y, there are a total of k_U + k_V + k_W = k IVs that together determine Ŷ, and the partial statistics for each IV in set U have all the remaining k − 1 IVs partialled: those from V and W (numbering k_V + k_W) and also the remaining (k_U − 1) IVs from its own set. It will be shown that, for example, the adjusted Y means of the ACV are functions of the regression coefficients when a covariate set and a set (or sets) of research factors of interest are simultaneously regressed on Y (Chapter 10). We will also see that sets as such may be partialled, so that U·VW may be related to Y by means of a partial or semipartial R.

4.2.2 Hierarchical Analysis of Sets

In the preceding chapter (Section 3.6.2), we saw that the k IVs could be entered cumulatively in some specified hierarchy, at each stage of which an R² is determined. The R² for all k variables could thus be analyzed into cumulative increments in the proportion of Y variance due to the addition of each IV to those higher in the hierarchy. These increments in R² were noted to be squared semipartial correlation coefficients, and the formula for the hierarchical procedure for single IVs was given as

(3.6.1)

$$R^2_{Y \cdot 12 \ldots k} = r^2_{Y1} + r^2_{Y(2 \cdot 1)} + r^2_{Y(3 \cdot 12)} + r^2_{Y(4 \cdot 123)} + \cdots + r^2_{Y(k \cdot 123 \ldots k-1)}.$$

The hierarchical procedure is directly generalizable from single IVs to sets of IVs. Replacing k single IVs by h sets of IVs, we can state that these h sets can be entered cumulatively in a specified hierarchical order, and upon the addition of each new set an R² is determined. The R² for all h sets can thus be analyzed into increments in the proportion of Y variance due to the addition of each new set of IVs to those higher in the hierarchy. These increments in R² are, in fact, squared multiple semipartial correlation coefficients, and a general hierarchical equation for sets analogous to Eq. (3.6.1) may be written. To avoid awkwardness of notation, we write it for 4 (= h) sets in alphabetical hierarchical order and use the full dot notation; its generalization to any number of sets is intuitively obvious:

(4.2.1)

$$R^2_{Y \cdot TUVW} = R^2_{Y \cdot T} + R^2_{Y \cdot (U \cdot T)} + R^2_{Y \cdot (V \cdot TU)} + R^2_{Y \cdot (W \cdot TUV)}.$$



We defer a detailed discussion of the multiple semipartial R² to the next section. Here it is sufficient to note merely that it is an increment to the proportion of Y variance accounted for by a given set of IVs (of whatever nature) beyond what has already been accounted for by prior sets, that is, sets higher up in the hierarchy. Further, the amount of the increment in Y variance accounted for by that set cannot be influenced by Y variance associated with subsequent sets, that is, those which are lower in the hierarchy.

Consider an investigation of length of hospital stay (Y) of n = 500 randomly selected psychiatric admissions to eight mental hospitals in a state system for a given period. Assume that data are gathered and organized to make up the following sets of IVs:

1. Set D: demographic characteristics, namely age, sex, socioeconomic status, and ethnicity. Note incidentally that this is a substantive set, itself made up of sets. Assume k_D = 9.

2. Set M: nine of the scales of the Minnesota Multiphasic Personality Inventory (MMPI). This set is also substantive, but note the necessity for making provision for missing data (a structural feature) and assume k_M = 10.

3. Set H: hospitals. The hospital to which each patient is admitted is a nominally scaled research factor. With 8 (= g) hospitals contributing data, we will require a (structural) set of k_H = 7 (= g − 1) IVs to represent fully the hospital group membership of the patients.

Although there is a total of 26 (= k_D + k_M + k_H = k) IVs, our analysis may proceed in terms of the 3 (= h) sets, hierarchically ordered in the assumed causal priority of accounting for variance in length of hospital stay as D, M, H. Suppose that we find that R²_Y·D = .20, i.e., the demographic set, made up of 9 IVs, accounts for 20% of the Y variance. Note that this ignores any association with MMPI scores (M) or effects of hospital differences (H). When we add the IVs of the MMPI set, we find that R²_Y·DM = .22; hence, the increment due to M over and above D, or with D partialled, is R²_Y·(M·D) = .02. Thus, an additional 2% of the Y variance is accounted for by the MMPI beyond the demographic set. Finally, the addition of the 7 IVs for hospitals (set H) produces R²_Y·DMH = .33, an increment over R²_Y·DM of .11, which equals R²_Y·(H·DM). Thus, we can say that which hospital patients enter accounts for 11% of the variance in length of stay, after we partial out (or statistically control, or adjust for, or hold constant) the effect of differences in patients' demographic and MMPI characteristics. We have, in fact, performed by MRC an ACV for the research factor "hospitals," using sets D and M as covariates³ (see Chapter 10). (At the second stage of the analysis, we did the equivalent for the MMPI as the research factor, with the demographic characteristics as covariates, although, because D is not a quantitative scale, this is not within the usual purview of ACV; see Section 10.4.)

³Omitting the significance test, a detail that will be attended to in Section 4.4. Also, a valid ACV requires that there be no interaction between H and the aggregate M, D covariate set (see Chapters 8 and 10).
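Computationally, a hierarchical analysis by sets of this kind reduces to a short loop over cumulative R²s. The Python sketch below is a hypothetical illustration, not the study described: the data frame and column names are invented, and the nominal hospital factor is turned into its g − 1 = 7 dummy IVs before entry.

```python
# A sketch of hierarchical MRC by sets (D, then M, then H) on a mock data frame.
import numpy as np
import pandas as pd

def r2(y, X):
    Z = np.column_stack([np.ones(len(y)), X])
    yhat = Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return 1 - ((y - yhat) ** 2).sum() / ((y - y.mean()) ** 2).sum()

def set_increments(df, y_col, sets):
    """sets: list of (label, list_of_column_names) in hierarchical order."""
    y = df[y_col].to_numpy(float)
    cols, prev = [], 0.0
    for label, block in sets:
        cols = cols + list(block)
        cur = r2(y, df[cols].to_numpy(float))
        print(f"set {label}: cumulative R^2 = {cur:.3f}, increment = {cur - prev:.3f}")
        prev = cur

# The nominal factor "hospital" (g = 8 hospitals) becomes g - 1 = 7 dummy IVs:
# hosp = pd.get_dummies(df["hospital"], prefix="H", drop_first=True).astype(float)
# df = pd.concat([df, hosp], axis=1)
# set_increments(df, "stay",
#                [("D", demo_cols), ("M", mmpi_cols), ("H", list(hosp.columns))])
```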


There is much to be said about the hierarchical procedure and, indeed, it is said in the next section and throughout the book. For example, as pointed out in regard to single variables (Section 3.6.2), the increment due to a set may depend critically upon where it appears in the hierarchy, that is, on what has been partialled from it, which, in turn, depends on the causal theory underlying the research. The chief point we wish to emphasize here is that hierarchical analysis can proceed quite generally with sets of any kind, structural or functional, and this includes sets made up of sets.

4.3 VARIANCE PROPORTIONS FOR SETS

4.3.1 The Ballantine Again

We know of no better way to present the structure of relationships of IVs to a dependent variable Y than the ballantine. It was presented in Fig. 3.3.1 for single IVs X₁ and X₂, and we present it as Fig. 4.3.1 here for sets A and B. It is changed in no essential regard, and we show how the relationships of sets of IVs to Y, expressed as proportions of Y variance, are directly analogous to similarly expressed relationships of single IVs.

A circle in a ballantine represents the total variance of a variable, and the overlap of two such circles represents shared variance or squared correlation. This seems reasonable enough for single variables, but what does it mean when we attach the set designation A to such a circle? What does the variance of a set of multiple variables mean? Although each of the k_A variables has its own variance, remember that a multiple R²_Y·12...k is in fact a simple r² between Y and Ŷ, the latter optimally estimated from the regression equation for variables X₁,

X₂, . . . , X_k (Eq. 3.3.3). Thus, what is presented by the circle for set A is the variance of a single variable, namely that of Ŷ_A, the estimated Y from the regression equation of the k_A IVs that make up set A (similarly for set B, i.e., Ŷ_B, or any other set of IVs). Thus, by treating a set in terms of how it bears on Y, we effectively reduce it to a single variable. This lies at the core of the generalizability of the ballantine from single IVs to sets of IVs.

The ballantine in Fig. 4.3.1 presents the general case: A and B share variance with Y, but also with each other.⁴ This is, of course, the critical distinction between MRC and the standard orthogonal AV. In an A × B factorial design AV,

⁴It can be proved that the correlation between A and B, where each is regression weighted to optimally estimate Y, is given by

(4.3.1)

$$r_{\hat{Y}_A \hat{Y}_B} = \frac{\sum \beta_i \beta_j r_{ij}}{R_{Y \cdot A}\, R_{Y \cdot B}},$$

where i indexes an Xᵢ in set A, j indexes an Xⱼ in set B, and the summation is taken over all i, j pairs (of which there are k_A k_B).


[Figure 4.3.1. The ballantine for sets A and B. The legend gives the proportions of Y variance as areas of the Y circle:
R²_Y·A = a + c
R²_Y·B = b + c
R²_Y·AB = a + b + c
sR²_A = R²_Y·(A·B) = R²_Y·AB − R²_Y·B = a
sR²_B = R²_Y·(B·A) = R²_Y·AB − R²_Y·A = b
pR²_A = R²_Y·A·B = (R²_Y·AB − R²_Y·B) / (1 − R²_Y·B) = a/(a + e)
pR²_B = R²_Y·B·A = (R²_Y·AB − R²_Y·A) / (1 − R²_Y·A) = b/(b + e)]

the requirement of proportional cell frequencies makes A and B (specifically Ŷ_A and Ŷ_B) uncorrelated with each other; therefore the A and B circles do not overlap each other, and each accounts for a separate and distinguishable (that is, additive) portion of the Y variance. (This is incidentally what makes the computation simpler in AV than in MRC.) Although not fundamental to the distinction between laboratory experiments and field research (which is randomization), it is nevertheless true that field research is characterized by the overlap of research factors, whereas experiments at least make possible their independence or nonoverlap.

The ballantine makes it possible to put into direct correspondence proportions of variance (i.e., squared correlations of various kinds) and ratios of areas of the circle for Y, as we saw in Section 3.3. The total variance of Y is taken to equal unity (or 100%) and the Y circle is divided into four distinct areas identified by


the letters a, b, c, and e. Because overlap represents shared variance or squared correlation, we can see immediately from Fig. 4.3.1 that set A overlaps Y in areas a and c; hence

(4.3.2)

$$R^2_{Y \cdot A} = \frac{Y,\hat{Y}_A \text{ overlap}}{sd^2_Y} = a + c.$$

The c area arises inevitably from the AB overlap, just as it did in the single-IV ballantine in Section 3.3, and is conceptually identical with it. It designates the part of the Y circle jointly overlapped by A and B, because

* ^

=

^

^

sd\

E

= ! L ^ = b + c. 1

Because the c area is part of both A's and B's overlap with Y, for sets, as for single IVs, it is clear that (for the general case, where Ŷ_A and Ŷ_B are correlated) the proportion of Y variance accounted for by sets A and B together is not simply the sum of their separate contributions, because area c would then be counted twice, but rather

(4.3.4)    R²_{Y·AB} = (Y, Ŷ_AB overlap) / sd²_Y = a + b + c.

Thus, the areas a and b represent the proportions of Y variance uniquely accounted for respectively by set A and set B. By uniquely we mean relative to the other set; thus area b is Y variance not accounted for by set A, but only by set B; the reverse is true for area a. This idea of unique variance in Y for a set is of great importance in MRC and particularly so in the hierarchical procedure. It is directly analogous to the unique variance of a single IV discussed in Chapter 3. There we saw that for X_i, the unique variance in Y is the squared semipartial correlation of Y with X_i, which in abbreviated notation we called sr²_i. It was shown literally to be the r² between that part of X_i that could not be estimated from the other IVs and all of Y, the complete cumbersome notation for which is r²_{Y(i·12...(i)...k)}, the inner parentheses signifying omission. For a set B, we similarly define its unique variance in Y to be the squared multiple semipartial correlation of Y with that part of B that is not estimable from A, or literally that part of Ŷ_B that can not be estimated by Ŷ_A. Its literal notation would be R²_{Y·(Ŷ_B·Ŷ_A)} or, somewhat more simply, R²_{Y·(B·A)}, or, even more simply, sR²_B. In the latter notation, Y is understood, as is the other set (or sets) being partialled. (Obviously, all the above holds when A and B are interchanged.)

4.3.2 The Semipartial R²

The ballantine is worth a thousand words. "That part of B which is not estimable from A" is represented by the part of the B circle not overlapped by the A circle, that is, the combined area made up of b and f. That area overlaps with the


(complete) Y circle only in area b; therefore the proportion of the total Y variance accounted for uniquely by set B is

(4.3.5)    sR²_B = R²_{Y·(B·A)} = b/1 = b,

and, by symmetry, the proportion of Y variance accounted for uniquely by set A is

(4.3.6)    sR²_A = R²_{Y·(A·B)} = a/1 = a.

The ballantine shows how these quantities can be computed. If R²_{Y·AB} is area a + b + c (Eq. 4.3.4) and R²_{Y·A} is area a + c (Eq. 4.3.2), then patently b = (a + b + c) - (a + c),

(4.3.7)    sR²_B = R²_{Y·AB} - R²_{Y·A},

and, symmetrically, a = (a + b + c) - (b + c),

(4.3.8)    sR²_A = R²_{Y·AB} - R²_{Y·B}.

The sR² can readily be found by subtracting from the R² for both sets the R² for the set to be partialled. It is not necessary to provide for the case of more than two sets of IVs in the ballantine,⁵ or, indeed, in the preceding equations. Because the result of the aggregation of any number of sets is itself a set, these equations are self-generalizing. Thus, if the unique variance in Y for set B among a group of sets is of interest, we can simply designate the sets other than B collectively as set A, and find sR²_B from Eq. (4.3.7). This principle is applied successively as each set is added in the hierarchical analysis, each added set being designated B relative to the aggregate of prior sets, designated A. We shall see that this framework neatly accommodates both significance testing and power analysis. We offer one more bit of notation, which, although not strictly necessary, will be found convenient later on in various applications of hierarchical analysis. In the latter, the addition of a new set B (or single IV X_i) results in an increase in R² (strictly, a nondecrease). These increases are, of course, the sR²_B's (or sr²_i's), as already noted. It is a nuisance in presenting such statistics, particularly in tables, to always specify all the prior sets or single IVs that are partialled. Because in hierarchical MRC the hierarchy of sets (or single IVs) is explicit, we will have occasion to identify such sR²_B (or sr²_i) as increments to Y variance at the stage of the hierarchy where B (or X_i) enters and represent them as I_B (or I_i).

⁵A fortunate circumstance, because the complete representation of three sets would require a three-dimensional ballantine and, generally, the representation of h sets, an h-dimensional ballantine.
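The subtraction in Eqs. (4.3.7) and (4.3.8) is trivial to carry out once the two R² values are in hand. The short sketch below is ours, not part of the original text: it builds a small synthetic data set (the variable names and coefficients are illustrative assumptions only) and uses numpy's least-squares routine to obtain R²_{Y·A} and R²_{Y·AB}, from which sR²_B follows by subtraction.

```python
import numpy as np

def r_squared(y, X):
    """Proportion of Y variance accounted for by the columns of X
    (an intercept column is appended before fitting)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

rng = np.random.default_rng(0)
n = 500
A = rng.normal(size=(n, 3))            # set A: three illustrative IVs
B = rng.normal(size=(n, 2))            # set B: two illustrative IVs
y = A @ [0.4, 0.2, 0.1] + B @ [0.3, 0.0] + rng.normal(size=n)

r2_A  = r_squared(y, A)                     # area a + c
r2_AB = r_squared(y, np.hstack([A, B]))     # area a + b + c
sr2_B = r2_AB - r2_A                        # Eq. (4.3.7): area b
print(f"R2(A) = {r2_A:.3f}, R2(A,B) = {r2_AB:.3f}, sR2(B) = {sr2_B:.3f}")
```

Any regression program that reports R² for a specified subset of IVs can be used in exactly the same way; nothing in the method depends on this particular implementation.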


4.3.3 The Partial R²

We have already identified the overlap of that part of a set circle that is unique (e.g., areas b + f of set B) with the total Y circle as a squared multiple semipartial correlation (e.g., sR²_B = R²_{Y·(B·A)} = area b). With sets as with single IVs, it is a semipartial because we have related the partialled B·A with all of Y. We wrote it as b/1 in Eq. (4.3.5) to make it explicit that we were assessing the unique variance b as a proportion of the total Y variance of 1. We can, however, also relate the partialled B·A to the partialled Y·A, that is, we can assess the unique b variance as a proportion not of the total Y variance, but of that part of the Y variance not estimable by set A, actually Y - Ŷ_A. The result is that we have defined the squared multiple partial correlation as

(4.3.9)    pR²_B = R²_{YB·A} = b / (b + e),

and symmetrically for set A as

(4.3.10)    pR²_A = R²_{YA·B} = a / (a + e).

Thus, sR² and pR² (as sr² and pr²) differ in the base to which they relate the unique variance as a proportion: sR² takes as its base the total Y variance whereas pR² takes as its base that proportion of the Y variance not accounted for by the other set(s). Inevitably, with its base smaller than (or at most equal to) 1, pR² will be larger than (or at least equal to) sR² for any given set. It is easy enough to compute the pR². We have seen how, for example, the b area is found [Eq. (4.3.7)]; the combined areas b + e constitute the Y variance not accounted for by set A, hence 1 - R²_{Y·A}. Substituting in Eq. (4.3.9),

(4.3.11)    pR²_B = b / (b + e) = (R²_{Y·AB} - R²_{Y·A}) / (1 - R²_{Y·A}),

and, symmetrically,

(4.3.12)    pR²_A = a / (a + e) = (R²_{Y·AB} - R²_{Y·B}) / (1 - R²_{Y·B}).

To illustrate the distinction between sR² and pR², we refer to the example of the hierarchy of sets of demographics (D), MMPI (M), and hospitals (H) in relationship to length of hospital stay (Y) of Section 4.2.2. R²_{Y·D} = .20, and when M is added, R²_{Y·DM} = .22. The increment was .02; hence sR²_M = .02 (or I_M = .02), that is, 2% of the total Y variance is uniquely (relative to D) accounted for by M. But if we ask "what proportion of the variance of Y not accounted for by D is uniquely accounted for by M?" our base is not the total Y variance, but only 1 - R²_{Y·D} = 1 - .20 = .80 of it, and the answer is pR²_M = .02/.80 = .025. Letting D = A and M = B, we have simply substituted in Eqs. (4.3.7) and (4.3.11).


It was also found that the further addition of H resulted in R²_{Y·DMH} = .33. Thus, H accounted for an additional .11 of the total Y variance, hence sR²_H = .11 (= I_H). If we shift our base from total Y variance to Y variance not already accounted for by D and M, the relevant proportion is .11/(1 - .22) = .141 (i.e., pR²_H). Now letting sets D + M = A, and H = B, we again have simply substituted in Eqs. (4.3.7) and (4.3.11). Any desired combination of sets can be effected; if we wished to combine M and H into set B, with D = A, we could determine that sR²_{MH} = .13, and pR²_{MH} = .13/(1 - .20) = .162, by the same equations. It is worth noting that the pR² is rather in the spirit of ACV. In ACV, the variance due to the covariates is removed from Y, and the effects of research factors are assessed with regard to this adjusted (partialled) Y variance. Thus, in the latter example, D and M may be considered to be covariates whose function is to "equate" the hospitals, so that they may be compared for length of stay, free of any possible hospital differences in the D and M of their patients. In that spirit, we are interested only in the 1 - R²_{Y·DM} portion of the Y variance, and the pR²_H takes as its base the .78 of the Y variance not associated with D and M; hence, pR²_H = .11/.78 = .141 of this adjusted (or partialled, or residual) variance is the quantity of interest. Partial correlations are sometimes called net correlations (as in net profit), because they represent the correlation that remains after the effect of other variable(s) have been removed (debited) from both Y and the set (or single IV) being correlated.
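Because sR² and pR² are simple functions of the R² values already reported, the arithmetic of this example can be checked in a few lines. The sketch below is ours; it merely restates Eqs. (4.3.7) and (4.3.11) and applies them to the R² figures quoted above for the running example.

```python
def semipartial_and_partial(r2_ab, r2_a):
    """Eq. (4.3.7): sR2_B = R2(Y.AB) - R2(Y.A);
    Eq. (4.3.11): pR2_B = sR2_B / (1 - R2(Y.A))."""
    sr2 = r2_ab - r2_a
    pr2 = sr2 / (1 - r2_a)
    return sr2, pr2

# M over D:        R2(Y.DM)  = .22, R2(Y.D)  = .20  ->  .02, .025
# H over D and M:  R2(Y.DMH) = .33, R2(Y.DM) = .22  ->  .11, .141
# M, H over D:     R2(Y.DMH) = .33, R2(Y.D)  = .20  ->  .13, .162
for label, r2_ab, r2_a in [("M.D", .22, .20), ("H.DM", .33, .22), ("MH.D", .33, .20)]:
    sr2, pr2 = semipartial_and_partial(r2_ab, r2_a)
    print(f"{label}: sR2 = {sr2:.3f}, pR2 = {pr2:.3f}")
```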

Semipartial R and Partial R

The previous development and formulas have been cast in terms of squared correlations, that is, sR² and pR². For these, as indeed for all correlations, squared values are distinctly lower than unsquared values; for example, a correlation of .20 becomes, when squared, .04. In the behavioral and social sciences, where relationships are not strong, correlations (of whatever kind) only infrequently are as large as .50. Squared correlations are then only infrequently as large as .25. In an effort to keep up morale (or hold his head high among his colleagues in the older sciences), it is understandable when a behavioral scientist prefers to report unsquared rather than squared correlations, for example, sR and pR rather than sR² and pR². We feel tolerant toward this practice, indeed, even compassionate. Keep in mind, however, the advantage of working with proportions of variance, particularly the additivity of sR² in the hierarchical procedure. There is no inherent advantage in the larger values of sR and pR relative, respectively, to sR² and pR², nor of partial relative to semipartial R² (or R), because all yield identical statistical significance test and power analysis results, as we see later. The important considerations in selecting a descriptive statistic here, as always, are that it correctly represent the research issue and that its meaning be clear to the research consumer.


4.3.4 Area c

Finally, returning once more to the ballantine for sets (Fig. 4.3.1), we call the reader's attention to area c, the double overlap of sets A and B in Y. It is conceptually the same as the area c in the ballantine for single IVs (Fig. 3.3.1) and shares its problems. Although in the ballantine it occupies an area, unlike the areas a, b, and e it cannot be understood to be a proportion of Y variance, because, unlike these other areas, it may take on a negative value, as discussed in Section 3.3 on p. 90. Note that it is never properly interpreted as a proportion of variance, whether or not in any given application it is found to be positive, because we cannot alter the fundamental conception of what a statistic means as a function of its algebraic sign in a particular problem. Because variance is sd², a negative quantity leads to sd being an imaginary number, for example, √-.10, a circumstance we simply cannot tolerate. Better to let area c stand as a useful metaphor that reflects the fact that R²_{Y·AB} is not equal in general to R²_{Y·A} + R²_{Y·B}, but may be either smaller (positive c) or larger (negative c) than the latter. When area c is negative for sets A and B, we have exactly the same relationship of suppression between the two sets as was described for pairs of single IVs in Section 3.4.

4.4 SIGNIFICANCE TESTING FOR SETS

4.4.1 A General F Test for an Increment (Model I Error)

We have seen that the addition of a set of variables B to a set A results in an increment in the Y variance accounted for of R²_{Y·AB} - R²_{Y·A} (= sR²_B = I_B). This is represented by the area b in the ballantine (Fig. 4.3.1). This quantity is properly called an increment because it is not mathematically possible for it to be negative, because R²_{Y·AB} cannot be smaller than R²_{Y·A}.⁶ Our interest, of course, is not in the characteristics of the sample for which these values are determined as such but rather in those of the population from which it comes. Our mechanism of statistical inference posits the null hypothesis to the effect that in the population, there is literally no increment in Y variance accounted for when B is added to A, that is, that R²_{Y·AB} - R²_{Y·A} = sR²_B (= I_B) = 0, or that area b in the ballantine for the population is zero. When this null hypothesis is rejected, we conclude that set B does account for Y variance beyond that accounted for by set A in the population (i.e., B·A does, indeed, relate to Y). This null hypothesis may be tested by means of

(4.4.1)    F = [(R²_{Y·AB} - R²_{Y·A}) / k_B] / [(1 - R²_{Y·AB}) / (n - k_A - k_B - 1)]

⁶This proposition does not hold for R² corrected for shrinkage; that is, the difference between the shrunken values of R²_{Y·AB} and R²_{Y·A} may be negative. This will occur whenever the F of Eq. (4.4.1) is less than one. See Section 6.2.4.


for the source (numerator) df = k_B, the error (denominator) df = n - k_A - k_B - 1, and referred to the F tables in the appendices (Appendix Tables D.1 and D.2). This formula is applied repeatedly in varying contexts throughout this book, and its structure is worth some comment. Both the numerator and denominator are proportions of Y variance divided by their respective df; thus both are "normalized" mean squares. The numerator is the normalized mean square for unique B variance (area b of the ballantine) and the denominator is the normalized mean square for a particular estimate of error (i.e., 1 - R²_{Y·AB}), which represents Y variation accounted for by neither A nor B (area e of the ballantine). This is designated as Model I error. F is the ratio of these mean squares that, when the null hypothesis is true, has an expected value of about one. When F is sufficiently large to meet the significance criterion, as determined by reference to Appendix Tables D.1 and D.2, the null hypothesis is rejected.⁷ For computational purposes Eq. (4.4.1) can be somewhat simplified:

(4.4.2)    F = [(R²_{Y·AB} - R²_{Y·A}) / (1 - R²_{Y·AB})] × [(n - k_A - k_B - 1) / k_B]

(for, of course, the same df).
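Equation (4.4.2) is easily evaluated directly. The following sketch is ours; it assumes scipy is available for the tail probability of the F distribution, and it uses the figures of the hospital-stay example discussed in the next subsection (R²_{Y·D} = .20, R²_{Y·DM} = .22, n = 500) purely for illustration.

```python
from scipy import stats  # assumed available, used only for the tail probability

def model_I_F(r2_ab, r2_a, n, k_a, k_b):
    """Eq. (4.4.2): F for the increment of set B over set A, Model I error."""
    df_error = n - k_a - k_b - 1
    F = (r2_ab - r2_a) / (1 - r2_ab) * df_error / k_b
    p = stats.f.sf(F, k_b, df_error)   # upper-tail probability
    return F, df_error, p

# M over D in the hospital-stay example: R2(Y.DM) = .22, R2(Y.D) = .20, n = 500
F, df_err, p = model_I_F(.22, .20, 500, k_a=9, k_b=10)
print(f"F = {F:.3f} on (10, {df_err}) df, p = {p:.3f}")   # F = 1.231
```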

Application in Hierarchical Analysis

To illustrate the application of this formula, let us return to the study presented in the previous sections on the length of stay of 500 hospital admissions, using demographic (D, k_D = 9), MMPI (M, k_M = 10), and hospital (H, k_H = 7) sets in that hierarchical order. We let A be the set(s) to be partialled and B the set(s) whose unique variance in Y is posited as zero by the null hypothesis. Table 4.4.1 organizes the ingredients of the computation to facilitate the exposition. The null hypothesis that with D partialled (holding demographic characteristics constant) M accounts for no Y variance in the population is appraised as follows (Table 4.4.1, Example 1). It was given that R²_{Y·D} = .20 and R²_{Y·DM} = .22, an increase of .02. To use Eq. (4.4.2), call M set B and D set A. For n = 500, k_B = 10, k_A = 9, we find

F = (.22 - .20)/(1 - .22) × (500 - 9 - 10 - 1)/10 = 1.231,

which for df of 10 (= k_B) and 480 (= n - k_A - k_B - 1) fails to be significant at the α = .05 level (the criterion value for df = 10, 400 is 1.85, Appendix Table D.2). The increase of .02 of the Y variance accounted for by M over D in the sample is thus consistent with there being no increase in the population. In Example 2 of Table 4.4.1, we test the null hypothesis that the addition of H

⁷Readers who know AV will find this all familiar. But the reasoning and structure are not merely analogous but rather mathematically identical, because AV and MRC are applications of the "general linear model." This proposition is pursued in various contexts in this book (cf. Sections 5.3, 5.5, and 6.3 and Chapters 8 to 11). Also see Section 4.4.


(which we will now call set B, so k_B = 7) to the sets D and M (which together we will call set A, so k_A = 9 + 10 = 19) results in no increase in Y variance in the population. Because R²_{Y·DMH} = .33 and R²_{Y·DM} = .22 (and hence sR²_H = I_H = .11), substituting the values for sets A and B as redefined we find

F = (.33 - .22)/(1 - .33) × (500 - 19 - 7 - 1)/7 = 11.094,

which for df = 7, 473 is highly significant, because the criterion F at α = .01 for df = 7, 400 is 2.69 (Appendix Table D.1). It was pointed out in Section 4.3.3 that our appraisal of this .11 increment of H over D and M constitutes the equivalent of an ACV with the 19 IVs of the combined D and M sets as covariates. Indeed, hierarchical MRC may be viewed as equivalent to a series of ACVs, at each stage of which all prior sets are covariates (because they are partialled) whereas the set just entered is the research factor whose effects are under scrutiny (see Chapter 10). The set just entered may itself be an aggregate of sets. Although it would not likely be of substantive interest in this research, Example 3 of Table 4.4.1 illustrates the F test for the aggregate of M and H (as set B) with D (as set A) partialled.

TABLE 4.4.1
Illustrative F Tests Using Model I Error (Eq. 4.4.2)

R²_{Y·D} = .20    R²_{Y·M} = .03    R²_{Y·H} = .17    R²_{Y·DM} = .22
R²_{Y·DH} = .32   R²_{Y·MH} = .18   R²_{Y·DMH} = .33

Example  Set B   k_B   Set A   k_A   R²_{Y·AB}   R²_{Y·A}   sR²_B   Error I   df         F
 1       M       10    D        9    .22         .20        .02     .78       10, 480    1.231
 2       H        7    D, M    19    .33         .22        .11     .67        7, 473    11.094**
 3       M, H    17    D        9    .33         .20        .13     .67       17, 473    5.399**
 4       D        9    M, H    17    .33         .18        .15     .67        9, 473    11.766**
 5       M       10    D, H    16    .33         .32        .01     .67       10, 473    .706
 6       D        9    M       10    .22         .03        .19     .78        9, 480    12.991**
 7       D        9    H        7    .32         .17        .15     .68        9, 483    11.838**
 8       H        7    D        9    .32         .20        .12     .68        7, 483    12.176**
 9       M       10    H        7    .18         .17        .01     .82       10, 482    .588
10       H        7    M       10    .18         .03        .15     .82        7, 482    12.596**
11       D        9    empty    0    .20         0          .20     .80        9, 490    13.611**
12       M       10    empty    0    .03         0          .03     .97       10, 489    1.512
13       H        7    empty    0    .17         0          .17     .83        7, 492    14.396**

Note: sR²_B = R²_{Y·(B·A)} = R²_{Y·AB} - R²_{Y·A}; Error I = 1 - R²_{Y·AB}; F = (R²_{Y·AB} - R²_{Y·A})/(1 - R²_{Y·AB}) × (n - k_A - k_B - 1)/k_B, with source (numerator) df = k_B and Error I (denominator) df = n - k_A - k_B - 1.
**p < .01.


Application in Simultaneous Analysis

The F test of Eqs. (4.4.1) and (4.4.2) is also applicable in simultaneous analysis. The latter simply means that, given h sets, we are interested in appraising the variance of one of them with all the remaining h - 1 sets partialled. Whereas in the hierarchical model only higher-order (prior) sets are partialled, in the absence of a concept of hierarchy it is all other sets that are partialled. For this application of the F test we designate B as the unique source of variance under scrutiny and aggregate the remaining sets that are to be partialled into set A. Let us reconsider the running example of length of stay (Y) as a function of D, M, and H, but now propose that our interest is one of appraising the unique Y variance accounted for by each set. No hierarchy is intended, so by "unique" to a set we mean relative to all (here, both) other sets (i.e., D relative to M and H,

M relative to D and H, and H relative to D and M). To proceed, we need some additional R² values not previously given in this problem: R²_{Y·MH} = .18 and R²_{Y·DH} = .32. To determine the unique contribution of D relative to M and H (i.e., of D·MH) one simply finds R²_{Y·DMH} - R²_{Y·MH} = .33 - .18 = .15 = R²_{Y·(D·MH)}, the sR²_D with both M and H partialled. This quantity might be of focal interest to a sociologist in that it represents the proportion of variance in length of stay of patients associated with differences in their demographic (D) characteristics, the latter freed of any personality differences (M) and differences in admitting hospitals (H) associated with D. This .15 is a sample quantity, and Example 4 (Table 4.4.1) treats D as set B and aggregates M and H as set A for substitution in Eq. (4.4.2):

F = (.33 - .18)/(1 - .33) × (500 - 17 - 9 - 1)/9 = 11.766,

which is highly significant because the criterion F at α = .01 for df = 9, 400 is 2.45 (Appendix Table D.1). Note, incidentally, that this example simply reverses the roles of D and M, H of Example 3. The unique variance contribution of M relative to D and H is tested without further elaboration as Example 5. This might be of particular interest to a clinical psychologist or personality measurement specialist interested in controlling demographic variables and systematic differences between hospitals in assessing the relationship of the MMPI to length of stay. The last of this series, the Y variance associated with H·DM, has already been presented and discussed as Example 2.
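The three simultaneous tests can be run with the same formula; the sketch below is ours and simply re-evaluates Eq. (4.4.2) for Examples 4, 5, and 2 of Table 4.4.1, using the R² values given in the text.

```python
def model_I_F(r2_ab, r2_a, n, k_a, k_b):
    # Eq. (4.4.2), Model I error
    df_error = n - k_a - k_b - 1
    return (r2_ab - r2_a) / (1 - r2_ab) * df_error / k_b, df_error

n = 500
# each set tested with both other sets partialled (R2 values as given in the text)
tests = [
    ("D.MH", .33, .18, 17, 9),    # set B = D (9 IVs),  set A = M, H (17 IVs)
    ("M.DH", .33, .32, 16, 10),   # set B = M (10 IVs), set A = D, H (16 IVs)
    ("H.DM", .33, .22, 19, 7),    # set B = H (7 IVs),  set A = D, M (19 IVs)
]
for label, r2_ab, r2_a, k_a, k_b in tests:
    F, df_err = model_I_F(r2_ab, r2_a, n, k_a, k_b)
    print(f"{label}: sR2 = {r2_ab - r2_a:.2f}, F = {F:.3f} on ({k_b}, {df_err}) df")
```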

Intermediate Analyses

There is clearly no necessary requirement, given h sets, that all h - 1 sets be partialled from each in turn. By defining set B as whatever set(s) is under scrutiny, we can designate as set A any of the remaining sets up to h - 1 of them for partialling and apply the F test using Model I error given as Eqs. (4.4.1) and


(4.4.2). The investigator's choice of what to partial from what is determined by the logic and purpose of his inquiry. For specificity, assume that the h sets are partitioned into three groups of sets, as follows: the group whose unique source is under scrutiny is, as before, designated set B; the group to be partialled from B (again as before) constitutes set A; but now the remaining set(s) constitute a group to be ignored, which we designate set C. All we are doing with this scheme is making explicit the obvious fact that not all sets of IVs on which there are data in an investigation need to be active participants in each phase of the analysis. There is no set C in the Model I Eqs. (4.4.1) and (4.4.2), which makes it evident that it is here being completely ignored.⁸ The running example has only three sets so our groups, including C, will each contain a single set, which is enough for illustration. There are a total of six different ways of designating three sets as A, B, and C to test the null hypothesis that B·A ignoring C accounts for no Y variance in the population. One of these, M·D (ignoring H), was given as Example 1 in Table 4.4.1. To determine and F test the remaining five, we require two values not previously given, R²_{Y·M} = .03 and R²_{Y·H} = .17. The five tests are given as Examples 6 through 10 in the table. For example, Example 7 gives for D·H (ignoring M)

F = (.32 - .17)/(1 - .32) × (500 - 7 - 9 - 1)/9 = 11.838,

which is highly significant (F at .01 for df = 9, 400 is 2.46; Appendix Table D.1). In the (fully) simultaneous analysis, all sets other than the one whose (fully) unique Y variance is under scrutiny are partialled. We have now just seen that we can ignore some sets in some phases of an inquiry. Indeed, the (fully) hierarchical analysis with h sets is simply a predefined sequence of simultaneous analyses, in the first of which a prespecified h - 1 sets are ignored, in the second a prespecified h - 2 sets are ignored, and, generally, in the jth of which a prespecified h - j sets are ignored, until finally, in the last of which, none is ignored. The analysis at each stage is simultaneous: all IVs in the equation at that stage are being partialled from each other. Thus, a single simultaneous analysis with all other sets partialled and a strictly hierarchical progression of analyses may be viewed as endpoints of a continuum of analytic possibilities. A flexible application of MRC permits the selection of some intermediate possibilities when they are dictated by the causal theory and logic of the given research investigation.

Partialling an Empty Set

The generality of Eqs. (4.4.1) and (4.4.2) can be seen when they are applied in circumstances in which the set A is empty (i.e., a null class containing no IVs).

⁸This elaboration into a set C which is ignored is also setting the scene for a consideration in Section 4.4.2 of Model II error, in which C is also removed from the error term (i.e., 1 - R²_{Y·ABC}).


Partialling no variables is simply not partialling. With A empty, R²_{Y·A} = 0, R²_{Y·AB} is simply R²_{Y·B}, and k_A = 0. Substituting for this special case, Eq. (4.4.2) becomes

(4.4.3)    F = [R²_{Y·B} / (1 - R²_{Y·B})] × [(n - k_B - 1) / k_B],

with df = k_B, n - k_B - 1. This is patently the F test of the null hypothesis for any R², and, except for a difference in notation, is identical with the equation given for that purpose in Chapter 3 (Eq. 3.6.1). Incidentally, a further specialization to a single IV in set B renders k_B = 1 and R²_{Y·B} = r²_{YX}, and becomes the equivalent of the standard significance test for a zero-order r, Eq. (2.8.1), because

(4.4.4)    F = [r²_{YX} / (1 - r²_{YX})] × [(n - 2) / 1] = r²_{YX}(n - 2) / (1 - r²_{YX}),

and F for df = 1, n - 2 is identically distributed as t² for df = n - 2. To apply Eq. (4.4.2) specialized as Eq. (4.4.3) or Eq. (3.7.3) to our running example, we need R²_{Y·D} = .20, R²_{Y·M} = .03, and R²_{Y·H} = .17. Examples 11, 12, and 13 give the ingredients and results of testing the null hypothesis using Model I error, respectively, for D, M, and H. Table 4.4.1 contains, for each set, the Y variance it accounts for and the Model I F test of its statistical significance when no other sets are partialled, when each of the other sets is partialled, and when both are partialled. Taking H, for example, we have: H, .17; H·D, .12; H·M, .15; and H·DM, .11, all significant at p < .01. Such an accounting may also be organized for D and M. Although this type of exhaustive analysis in which all possible combinations of the remaining h - 1 sets (including none) are partialled from each set may sometimes be profitable, we do not recommend it in general. With h sets, it produces a total of h·2^(h-1) distinct sR² values, which is a formidable number, even for a few sets: 3 sets yield 12 values in an exhaustive analysis, while 5 sets yield 80, 6 sets 192, and 7 sets 448! Not only do large numbers of significance tests increase the probability of spuriously significant findings and thus subvert the credibility of the results of a statistical analysis (see Section 4.6), but it is likely that many (probably most) of the possible partials do not make substantive sense. This will generally be the case when a causally consequent set is partialled from an antecedent set. The strict ordering of h sets as in the hierarchical model selects only h of these h·2^(h-1) possibilities, a most desirable procedure when possible. Short of this, careful selection from these many possibilities produces the relatively small number of pertinent hypotheses of a convincing investigation.
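The h·2^(h-1) count grows quickly, as a trivial check confirms (the loop below is ours and only reproduces the figures cited above).

```python
# number of distinct sR2 values in an exhaustive analysis with h sets
for h in (3, 4, 5, 6, 7):
    print(h, "sets ->", h * 2 ** (h - 1), "distinct sR2 values")
```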

Significance of pR²

In Chapter 3 we saw that the partialled statistics of a single IV, X_i (i.e., sr_i, pr_i, B_i, and β_i), all share equivalent null hypotheses and hence the same t test for the same df = n - k - 1. Conceptually, this can be explained as due to the fact that


when any one of these coefficients equals zero, they all must necessarily equal zero. For any set B, the same identity in significance tests holds for sR²_B and pR²_B (hence for sR_B and pR_B). Recall that these are both unique proportions of variance, the first to the base unity and the second to the base 1 - R²_{Y·A}. In terms of areas of the ballantine for sets (Fig. 4.3.1), sR²_B = b, and pR²_B = b/(b + e). But the null hypothesis posits that area b is zero, hence pR²_B = 0. Whether one reports sR²_B as was done in Table 4.4.1, or divides it by 1 - R²_{Y·A} and reports pR²_B, or reports both, because the null hypothesis is the same, the F test of Eq. (4.4.2) tests the statistical significance of both sR²_B and pR²_B.

4.4.2 An Alternative F Test (Model II Error)

An F test is a ratio of two mean squares (or variance estimates), the numerator associated with a source of Y variance being tested, and the denominator providing a reference amount in the form of an estimate of error or residual variance. In the previous section, identifying A and B as sets or set aggregates, the numerator source was B·A, and the denominator contained 1 - R²_{Y·AB} [area e of the ballantine (Fig. 4.3.1)], thus treating all Y variance not accounted for by A and B as error in the F test of Eqs. (4.4.1) and (4.4.2). We later introduced the idea of a third set (or set aggregate) C, whose modest purpose was "to be ignored." Not only was it ignored in that it was not partialled from B in defining B·A as the source for the numerator, but it was ignored in that whatever Y variance it might uniquely contribute was not included in R²_{Y·AB} and therefore was part of the error, 1 - R²_{Y·AB}.

These two ways of ignoring C are conceptually quite distinct and may be considered independently. We obviously have the option of not partialling whatever we do not wish to partial from B so that the source of variance in the numerator is precisely what the theory and logic of the investigation dictate it to be (i.e., B·A and not B·AC). We may either choose or not choose to ignore C in defining the error term, that is, either use 1 - R²_{Y·AB} in the F test of Eqs. (4.4.1) and (4.4.2) and thus ignore C, or define an F ratio for B·A that removes whatever additional unique Y variance can be accounted for by C from the error term, resulting in Model II error, that is,

(4.4.5)    F = [(R²_{Y·AB} - R²_{Y·A}) / k_B] / [(1 - R²_{Y·ABC}) / (n - k - 1)],

where k is the total number of IVs in all sets, that is, k = k_A + k_B + k_C, or, equivalently,

(4.4.6)    F = [(R²_{Y·AB} - R²_{Y·A}) / (1 - R²_{Y·ABC})] × [(n - k - 1) / k_B],

with source (numerator) df = k_B, and Error II (denominator) df = n - k - 1. Note that, as with Model I error, this tests both sR²_B and pR²_B. The standard F


tables (Appendix Tables D.1 and D.2) are used. This is a generalization to sets of the error employed in the hierarchical analysis of the academic salary example (Section 3.8.1). Which model to choose? It is the prevailing view that, because the removal of additional Y variance associated uniquely with C can only serve to produce a smaller and "purer" error term, one should always prefer Model II error. But this is not necessarily the case: although 1 - R²_{Y·ABC} will always be smaller (strictly, not larger) than 1 - R²_{Y·AB} and hence operate so as to increase F, one must pay the price of the reduction of the error df by k_C, that is, from n - k_A - k_B - 1 of Eq. (4.4.2) to n - k - 1 = n - k_A - k_B - k_C - 1 of Eq. (4.4.6), which clearly operates to decrease F. In addition, as error df diminish, the criterion F ratio for significance increases and sample estimates become less stable, seriously so when the diminished error df are absolutely small (see Appendix Tables D.1 and D.2). The competing factors of reducing the proportion of error variance and reducing error df, depending on their magnitudes, may either increase or decrease the F using Model II error relative to the F using Model I error (see Section 4.6.3). We can illustrate both possibilities with the running example (Table 4.4.1), comparing Model I F (Eq. 4.4.2) with Model II F (Eq. 4.4.6). If, in testing M·D in Example 1, instead of using Model I error, 1 - R²_{Y·DM} = .78 with 480 (= 500 - 9 - 10 - 1) df, we use Model II error, 1 - R²_{Y·DMH} = .67 with 473 (= 500 - 9 - 10 - 7 - 1) df, F increases to 1.412 from 1.231 (neither significant). On the other hand, shifting to Model II error in testing D·H in Example 7 brings F down from 11.838 to 11.766 (both significant at p < .01). In those instances in Table 4.4.1 where set C is not empty, and hence where Models I and II are not identical (Examples 1, 6 through 13), the F ratios of the two models differ little and nowhere lead to different decisions about the null hypothesis. (The reader may check these as an exercise.) But before one jumps to the conclusion that the choice makes little or no difference in general, certain

= .67. I f n were

100, the drop in error d f from Model I to Model II would be from 83 to 73. Example 7, which tests D-H would yield a significant Model I F = 2.034 (d f = 9, 83, P < .05), but a nonsignificant Model II F = 1.816 (df — 9, 73). Further, consider the consequence o f Model II error when the number of sets, and therefore the number o f sets in C and (particularly) kc is large. M any behavioral science investigations, particularly (but by no means solely) of the survey kind, can easily involve upward of a dozen sets with C containing as


many as 10 sets collectively including many IVs; hence, k_C can be quite large.⁹ The optimal strategy in such circumstances is to order the sets from those judged a priori to contribute most to those judged to contribute least to accounting for Y variance (or most to least confidently judged to account for any Y variance), and use Model I error. Using the latter successively at each level of the hierarchy, the lower-order sets are ignored and, although their (likely small) contribution to reducing the proportion of error variance is given up, their large contribution to error df is retained. (Section 4.6.3 enlarges on this issue.) Although Model II error may be used with the hierarchical model and has the small advantage of using at all stages the same denominator in Eq. (4.4.5) and a minimum 1 - R² error proportion, it has the disadvantage of using a minimum df = n - k - 1, that is, all k IVs in the analysis are debited from n, not only the k_A + k_B involved in that stage. Thus, the significance tests of higher-order sets where the investigator has "placed his bets" suffer, perhaps fatally, with those of the lower-order sets in their use of a weakly determined, low df, error term. On the other hand, if sets are few and powerful in accounting uniquely for Y variance, Model I error, 1 - R²_{Y·AB}, contains important sources of variance due to the ignored C, and may well sharply negatively bias (reduce) F at a relatively small gain in error df. The error model of traditional AV generally presented in textbooks for the behavioral and social sciences is Model II error, on the rationale that any potential source of variance provided for by research factors and their interactions should be excluded from the error term.¹⁰ The within-groups or within-cells error term of AV is a special case of Model II error, because the group or cell membership in multifactor designs is carried by all k IVs. No simple advice can be offered on the choice between error models in hierarchical or intermediate analysis of MRC.¹¹ In general, large n, small h, small k, and sets whose sR² are large move us toward a preference for Model II error. One is understandably uneasy with the prospect of not removing from the Model I error the variability due to a set suspected a priori of having a large sR² (e.g., Examples 9 through 13 in Table 4.4.1). The overriding consideration in the decision is the maximization of the power of the F test. In the next section we see

error. ! 'Note that (he issue can not arise in fully simultaneous M RC , because sets A and 0 of fiqs, (4.4.1) and (4.4.2) are exhaustive of all /; sets and the set C of ILqs. (4.4.5) and (4.4.6) is empty; thus, the two pairs of equations rcduce to equality, and the two models arc indistinguishable.

154

4.

S E T S OF IN D EPEN D EN T V A R IA BLES

how one goes about making this decision during the planning of an investigation, which ideally is when it should be made. At several points later in this book (particularly in Section 4.6.3 and Chapters 6 and 8), the issue of choice between Model I and Model II error comes up again, and its discussion in specific contexts will provide additional guidance.
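The two error models differ only in the error proportion and the error df. The sketch below is ours; it re-does the Example 1 comparison described above (Model I F = 1.231 versus Model II F = 1.412 for the M·D test), with the R² values and set sizes of the running example taken as given.

```python
def F_increment(r2_ab, r2_a, r2_error_base, n, k_b, k_error):
    """F for the increment of B over A.
    Model I:  r2_error_base = R2(Y.AB),  k_error = k_A + k_B.
    Model II: r2_error_base = R2(Y.ABC), k_error = k (all IVs)."""
    df_error = n - k_error - 1
    return (r2_ab - r2_a) / (1 - r2_error_base) * df_error / k_b, df_error

# M over D: R2(Y.D) = .20, R2(Y.DM) = .22, R2(Y.DMH) = .33; k_D = 9, k_M = 10, k_H = 7
f1, df1 = F_increment(.22, .20, .22, 500, 10, 9 + 10)         # Model I
f2, df2 = F_increment(.22, .20, .33, 500, 10, 9 + 10 + 7)     # Model II
print(f"Model I:  F = {f1:.3f} on (10, {df1}) df")   # 1.231, error df = 480
print(f"Model II: F = {f2:.3f} on (10, {df2}) df")   # 1.412, error df = 473
```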

4.5 POWER ANALYSIS FOR SETS

4.5.1 Introduction

In previous chapters we have presented the basic notions of power analysis (Sections 2.9 and 3.8), and methods for determining:

1. The power of the t test of significance for r_YX or B_YX, given the sample size (n), the significance criterion (α), and effect size (ES) [i.e., an alternate hypothetical value of the population r (Section 2.9.2)].
2. The necessary sample size (n*) for the significance test of a partial coefficient of a single X_i in an MRC involving k IVs [that is, of its net contribution to R² (whether indexed as sr²_i, pr²_i, B_i, or β_i), given the desired power, α, and the ES, an alternate hypothetical value for the population sr²_i (Section 3.8.3)].
3. n* for the significance test of R², given the desired power, α, and an alternate hypothetical value for the population R² (Section 3.8.2).

The present section generalizes the previous methods of power analysis to partialled sets, paralleling the significance tests of the preceding Section 4.4. Assume that an investigation is being planned in which at some point the proportion of Y variance accounted for by a set B, over and above that accounted for by a set A, will be determined. We have seen that this critically important quantity is R²_{Y·AB} - R²_{Y·A} and has variously and equivalently been identified as the increment due to B (I_B), the squared multiple semipartial correlation for B (sR²_B or R²_{Y·(B·A)}), and as area b in the ballantine for sets (Fig. 4.3.1). This sample quantity will then be tested for significance, that is, the status of the null hypothesis that its analogous value in the population is zero will be determined by means of an F test. In planning the investigation, it is highly desirable (if not absolutely necessary) to determine how large a sample is necessary (n*) to achieve some desired level of probability of rejecting that null hypothesis. As we have seen, this will depend not only on α (usually set at .01 or .05) but also on the actual size of the effect in the population (ES) that must be posited as an alternate to the null hypothesis. Whatever the difficulties of setting a priori specifications for these parameters (particularly ES), there is no alternative to rational research planning; in their absence, one does not know whether one needs to study 10 cases or 10,000 cases in order to achieve the purposes of the research.


4.5.2 Determining n* for an F Test of sR²_B with Model I Error

As was the case for determining n* for an F test on R²_{Y·12...k} (Section 3.8.2), the procedure for determining n* for an F test on sR²_B = R²_{Y·AB} - R²_{Y·A} proceeds with the following steps:

1. Set the significance criterion to be used, α.
2. Set desired power for the F test.
3. Determine the number of source df, i.e., the number of IVs in set B, k_B.
4. Look up the value of L for the given k_B (row) and power (column) in Appendix Tables E.1 (for α = .01) or E.2 (for α = .05). [Cohen (1977) provides a table for α = .10.]

Power analysis of significance tests on the F distribution uses as its alternate-hypothetical ES index f², which is, in general, a ratio of variances¹² (Cohen, 1977, pp. 412ff). The particular test determines which variances are relevant and therefore defines the appropriate formula for f². For determining n* for an R², f² was given as R²/(1 - R²) (Eq. 3.8.1). In determining n* to test R²_{Y·AB} - R²_{Y·A} using Model I error,

(4.5.1)    f² = (R²_{Y·AB} - R²_{Y·A}) / (1 - R²_{Y·AB}).

f² = .03 / (1 - .25) = .03/.75 = .04

from Eq. (4.5.3) (which, of course, cannot be smaller than the f² for Model I, which was .03659). Because α (= .05), desired power (= .90), and k_M = k_B (= 10) remain as originally specified, L remains 20.53 (Appendix Table E.2). The total number of IVs, k = 9 + 10 + 7 = 26, so from Eq. (4.5.4) we find

n* = 20.53/.04 + 26 + 1 = 540


for Model II, compared with 581 for Model I. For changed specifications of α and power the values of L change, with results for Model II n* as follows (with Model I n* in parentheses for comparison):

α = .05, power = .80; L = 16.24; n* = 433 (464),
α = .01, power = .80; L = 22.18; n* = 582 (626),
α = .01, power = .90; L = 26.98; n* = 702 (757).

The second illustrative example was the n* for the test of R²_{Y·DH} - R²_{Y·D}, which was posited to equal .05. Again, Model II error is 1 - R²_{Y·DMH} = 1 - .25 = .75. Therefore, for Model II Eq. (4.5.3), f² = .05/.75 = .06667 (compared to Model I f² = .0625). L for α = .01, power = .80, at k_H (= k_B) = 7 was found from Appendix Table E.1 to be 19.79. Substituting in Eq. (4.5.4) with k = 26, we find

n* = 19.79/.06667 + 26 + 1 = 324

for Model II (compared to the Model I n* = 334). For the other specifications and the resulting change in L, the n* values for Model II are as follows (with Model I n* in parentheses):

α = .01, power = .90; L = 24.24; n* = 391 (405),
α = .05, power = .80; L = 14.35; n* = 242 (247),
α = .05, power = .90; L = 18.28; n* = 301 (309).
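All of the n* figures quoted in this section have the same form: L is divided by f², and the IVs debited from n, plus 1, are added back (Eqs. 4.5.2 and 4.5.4). The sketch below is ours; it reproduces the Model I and Model II n* values for the second illustrative example, rounding to the nearest case as the quoted figures do.

```python
def n_star(L, f2, iv_debit):
    """n* = L / f2 + iv_debit + 1, rounded to the nearest case.
    Model I  (Eq. 4.5.2): iv_debit = k_A + k_B.
    Model II (Eq. 4.5.4): iv_debit = k, the total number of IVs in all sets."""
    return round(L / f2 + iv_debit + 1)

# Second example: test of R2(Y.DH) - R2(Y.D), posited population increment = .05.
# Model I  f2 = .05/.80 = .0625  with iv_debit = k_D + k_H = 16;
# Model II f2 = .05/.75 = .06667 with iv_debit = k = 26.
for alpha, power, L in [(.01, .80, 19.79), (.01, .90, 24.24),
                        (.05, .80, 14.35), (.05, .90, 18.28)]:
    print(f"alpha = {alpha}, power = {power}: "
          f"Model I n* = {n_star(L, .0625, 16)}, "
          f"Model II n* = {n_star(L, .06667, 26)}")
```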

The third illustrative example used previously was for a test of R²_{Y·DMH} - R²_{Y·DM}. Because the numerator contains all the sets that figure in the error term, Model II error is here the same as Model I error. Viewed as Model I, H is set B and the aggregate D + M is set A, so the error term is 1 - R²_{Y·AB} = 1 - R²_{Y·DMH}. Viewed as Model II, because no set is ignored, C is empty and the error term 1 - R²_{Y·ABC} = 1 - R²_{Y·AB} (the same). Both for power analysis and significance testing, for tests where no sets are ignored in either numerator or denominator, Model II reduces to Model I. In the two examples where the error models differed, we found that n* was smaller for Model II than for Model I for the same specifications. These two instances should not be overgeneralized, as we have already argued. The relative size of the n* of the two models depends on how much reduction in the (alternate-hypothetical) proportion of error variance occurs relative to the cost of df due to the addition of the k_C IVs of set C. The Model II error denominator of f² can be written as a function of the Model I error and the additional reduction due to Y variance unique to set C:

(4.5.5)    1 - R²_{Y·ABC} = 1 - R²_{Y·AB} - sR²_C,

so that

Model II error = Model I error - sR²_C.


The change in df is k_C. Therefore, Model II will require smaller n* than Model I when sR²_C is large relative to k_C, but larger n* than Model I when sR²_C is small relative to k_C. Concretely, in the test of H·D of the second example, R²_{Y·D} was posited to be .15, R²_{Y·DH} = .20 (hence, R²_{Y·DH} - R²_{Y·D} = .05), and R²_{Y·DMH} = .25. The additional reduction of error by Model II over Model I is sR²_M = .05 (= sR²_C), and k_M = 10 (= k_C). If instead we had posited R²_{Y·DMH} to be .22, sR²_M would be .02 (= sR²_C). The change results in a reduction in Model II relative to Model I error by .02 (instead of .05); hence, Model II error by Eq. (4.5.5) is .78 (instead of .75); f² would then be .05/.78 = .06410 instead of .05/.75 = .06667. The result for L = 19.79 (for k_B = 7, α = .01, power = .80) is that the Model II test of H·D for the revised R²_{Y·DMH} yields [Eq. (4.5.4)]

n* = 19.79/.06410 + 26 + 1 = 336,

which is larger than the Model I [Eq. (4.5.2)]

n* = 19.79/.06667 + 9 + 7 + 1 = 314.

The basis for the advice given in Section 4.4.2 should now be clear. When there are many IVs in set C (which may be an aggregate of several sets) and there is no expectation that sR²_C is substantial, Model I may require smaller n* (or, for the same n, have greater power) than Model II. When k_C is small and/or sR²_C is substantial, the reverse holds. The methods of this section, applied to realistically posited values for R²_{Y·A}, R²_{Y·AB}, and R²_{Y·ABC} in the planning of an investigation, will provide a basis for the a priori choice of error model.

4.5.4 Setting f²

The key decision required in the power analysis necessary for research planning in MRC, and generally the most difficult, is producing an f² for substitution in the equation for n*. One obviously cannot know the various population R² values that make up f². Unless some estimates are made in advance, there is no rational basis for planning. Furthermore, unless the f² bears some reasonable resemblance to the true state of affairs in the population, sample sizes will be too large or (more often) too small, or, when sample sizes are not under the control of the researcher, the power of the research will be under- or (more often) overestimated. The best way to proceed is to muster all one's resources of empirical knowledge, both hard and soft, about the substantive field of study and apply them, together with some insight into how magnitudes of phenomena are translated into proportions of variance, in order to make the estimates of the population R² values that are the ingredients of f². Some guidance may be obtained from a handbook of power analysis (Cohen, 1977), which proposes operational definitions or conventions that link qualitative adjectives to amounts of correlation broadly appropriate to the behavioral sciences. Translated into proportion of


variance terms (r² or sr²), these are "small," .01; "medium," .09; and "large," .25. The rationale for these quantities and cautions about their use are given by Cohen (1962, 1977). Because f² is made up of two (Model I) or three (Model II) different R² values, it may facilitate the thinking of a research planner to have operational definitions for f² itself. With some hesitation, we offer the following: "small," f² = .02; "medium," f² = .15; and "large," f² = .35. Our hesitation arises from the following considerations. First, there is the general consideration of the obvious diversity of the areas of study covered by the rubric "behavioral and social sciences." For example, what is large for a personality psychologist may well be small for a sociologist. The conventional values offered can only strike a rough average. Secondly, because f² is made up of two or three distinct quantities, their confection into a single quantity offers opportunities for judgment to go astray. Thus, what might be thought of as a medium-sized expected sR² (numerator) may well result in a large or quite modest f², depending on whether the expected 1 - R² error (denominator) is small or large. Furthermore, an f² = .15 may be appropriately thought of as a "medium" ES in the context of 5 or 10 IVs in a set, but seems too small when k = 15 or more, indicating that, on the average, these variables account for, at most, (.15/15 =) .01 of the Y variance. Nevertheless, conventions have their uses, and the ones modestly offered here should serve to give the reader some sense of the f² quantity to attach to his verbal formulations, particularly when he cannot cope with estimating the population values themselves. The latter is, as we have said, the preferred route to setting f². For further discussion of f², see Cohen (1977, Chapters 8 and 9).

4.5.5 Setting Power for n*

In the form of power analysis discussed thus far, we find the necessary sample size n* for a given desired power (given also α and f²). What power do we desire? If we follow our natural inclinations and set it quite large (say at .95 or .99), we quickly discover that except for very large f², n* gets to be very large, often beyond our resources. (For example, in the first example of Section 4.5.2, the test of M·D, for α = .05 and power = .99, n* works out to be 905, about double what is required at power = .80.) If we set power at a low value (say at .50 or .60), n* is relatively small (for the example, at power = .50, n* = 271), but we are not likely to be content to have only a 50-50 chance of rejecting the null hypothesis. The decision as to what power to set is a complex one. It depends upon the result of weighing the costs of failing to reject the null hypothesis (Type II error in statistical inference) against the costs of gathering and processing research data. The latter are usually not hard to estimate objectively, whereas the former include the costs of such imponderables as "failing to advance knowledge," "losing face," and editorial rejections, and of such painful ponderables as not getting continued research support from funding agencies. This weighing of costs is obviously unique to each investigation or even to each null hypothesis to


be tested. This having been carefully done, the investigator can then formulate the power value he desires. Although there will be exceptions in special circumstances, he is likely to choose some value in the .70-.90 range. He may choose a value in the lower part of this range when the dollar cost per case is large and/or when the more intangible cost of a Type II error in inference is not great, i.e., when rejecting the null hypothesis in question is of relatively small importance. Conversely, a value at or near the upper end of this range would be chosen when the additional cost of collecting and processing cases is not large and/or when the hypothesis is an important one. It has been proposed, in the absence of some preference to the contrary, that power be set at .80 (Cohen, 1965, 1977). This value falls in the middle of the .70-.90 range and is a reasonable one to use as a convention when such is needed.

4.5.6 Reconciling Different n*'s

When more than one hypothesis is to be tested in a given investigation, the application of the methods described above will result in multiple n*'s. Because a single investigation will have a single n, these different n*'s will require reconciliation. For concreteness, assume plans to test three null hypotheses (H_j) whose specifications have resulted in n*_1 = 100, n*_2 = 300, and n*_3 = 400. If we decide to use n = 400 in the study, we will meet the specifications of H_3 and have much more power than specified for H_1 and more for H_2. This is fine if, in assessing our resources and weighing them against the importance of H_3, we deem it worthwhile. Alternatively, if we proceed with n = 100 we will meet the specifications of H_1 but fall short of the power desired for H_2 and H_3. Finally, if we strike an average of these n*'s and proceed with n = 267, we shall have more power than specified for H_1, slightly less for H_2, and much less for H_3. There is of course no way to have a single n that will simultaneously meet the n* specifications of multiple hypotheses. No problem arises when resources are sufficient to proceed with the largest n*; obviously there is no harm in exceeding the desired power for the other hypotheses. But such is not the usual case, and difficult choices may be posed for the investigator. Some help is afforded if one can determine exactly how much power drops from the desired value when n is to be less than n* for some given hypothesis. Stated more generally, it is useful to be able to estimate the power of a test given some specified n, the inverse of the problem of determining n given some specified desired power. The next section is devoted to the solution of this problem.

4.5.7 Power as a Function of n

Thus far, we have been pursuing that particular form of statistical power analysis wherein n* is determined for a specified desired power value (for given α and f²). Although this is probably the most frequently useful form of power analysis,


we have just seen the utility of inverting n and power, that is, determining the power that would result for some specified n (for given α and f²). The latter is not only useful in the reconciliation of different n*'s (Section 4.5.6) but in other circumstances, for example, when the n available for study is fixed or when a power analysis is done on a hypothesis post hoc, as in a power survey (Cohen, 1962). To find power as a function of n, we rewrite Eq. (4.5.2) for Model I and Eq. (4.5.4) for Model II to yield, respectively,

(4.5.6)    L* = f²(n - k_A - k_B - 1),

(4.5.7)    L* = f²(n - k - 1).
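Computing L* is a one-line operation; the power determination itself remains a lookup in the Appendix E tables, which are not reproduced here. The sketch below is ours and simply evaluates Eqs. (4.5.6) and (4.5.7) for the worked cases that follow.

```python
def L_star_model_I(f2, n, k_a, k_b):
    # Eq. (4.5.6): Model I error df = n - k_A - k_B - 1
    return f2 * (n - k_a - k_b - 1)

def L_star_model_II(f2, n, k):
    # Eq. (4.5.7): Model II error df = n - k - 1
    return f2 * (n - k - 1)

# worked cases discussed below (n = 350 throughout)
print(round(L_star_model_I(.03659, 350, 9, 10), 2))   # 12.07: M over D, Model I
print(round(L_star_model_II(.06667, 350, 26), 2))     # 21.53: H over D, Model II
print(round(L_star_model_I(.09333, 350, 19, 7), 2))   # 30.15: H over D and M
```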

Recall that L is itself a function of k_B, α, and power. To find power, one simply uses the L tables (Appendix Tables E.1 and E.2) backwards. Enter the table for the significance criterion α to be used in the row for k_B, and read across to find where the obtained L* falls. Then read off at the column headings the power values which bracket it.¹³ To illustrate: In Section 4.5.2, we considered a test of R²_{Y·DM} - R²_{Y·D} using Model I error at α = .05, where k_M = k_B = 10, k_D = k_A = 9, and f² = .03659. Instead of positing desired power (e.g., .80) and determining n* (= 464), let us instead assume that (for whatever reason) we will be using n = 350 cases. To determine the power, find, using Eq. (4.5.6), L* = .03659(350 - 9 - 10 - 1) = 12.07; entering Appendix Table E.2 (for α = .05) at row k_B = 10, we find that L* = 12.07 falls between L = 11.15 at power = .60 and L = 13.40 at power = .70. Thus, with n = 350 for these specifications, power is between .60 and .70. (Although usually not necessary, if a specific value is desired, linear interpolation is sufficient for an approximation; here it yields .64.) For further illustration, what is the power of a test of R²_{Y·DH} - R²_{Y·D} using Model II error at α = .05, where k_H = k_B = 7, k = 26, and f²

= .06667 (the second example in Section 4.5.3), for n = 350? From Eq. (4.5.7) we find

L* = .06667(350 - 26 - 1) = 21.53.

Entering Appendix Table E.2 (for α = .05) at row k_B = 7, we find that L* = 21.53 falls just below the L = 21.84 for power = .95. (If we wish to consider the alternative α = .01, we find in Appendix Table E.1 that L* = 21.53 falls slightly below L = 21.71 for power = .85.) What power does n = 350 bring to the test of R²_{Y·DMH} - R²_{Y·DM} at α = .01, where k_H = k_B = 7, k_D + k_M = k_A = 19, for f² = .09333 (the third example in

¹³It is for such applications that the tables provide for low power values (.10 to .60). When a specified n results in low power, it is useful to have some idea of what the power actually is. See the last example in this section.


Section 4.5.2)? Because in this case Model II and Model I are indistinguishable, we use Eq. (4.5.6) to find

L* = .09333(350 - 19 - 7 - 1) = 30.15,

which in the L table for α = .01 (Appendix Table E.1), row k_B = 7, falls above the L value for power of .95, i.e., 28.21. (For α = .05, power is of course even higher: in Appendix Table E.2, L* = 30.15 falls above the L = 29.25 value for power = .99.) Finally, let us posit for this preceding test that the n is to be 100 (instead of 350). Equation (4.5.6) yields

L* = .09333(100 - 19 - 7 - 1) = 6.81.

For α = .01 (for k_B = 7), Appendix Table E.1 shows this L* falling between 4.08 for power = .10 and 8.57 for power = .30; for a specific (but approximate) value, linear interpolation yields power of about .22. Things are somewhat better for α = .05, where Appendix Table E.2 shows L* = 6.81 bounded by 4.77 for power = .30 and 7.97 for power = .50, with linear interpolation giving approximate power of .43. As an alternative to this procedure, the analyst may prefer to use the tables provided in Cohen (1977, pp. 416-418). These give power as a function of L and

k_B at α = .01, .05, and .10. These necessitate interpolation between power values for tabled L values, rather than the reverse, a generally simpler procedure.

4.5.8 Power as a Function of n: The Special Cases of R² and sr²

We have already seen that, as for F tests of statistical significance, the determination of n* for partialled sets is a general case that can be specialized. In Chapter 3, we considered the determination of n* as a function of power (as well as α and f²) for tests on R² and sr²_i. Treated as special cases, the method of the preceding section, Eq. (4.5.7), can be used to determine the power of these tests as a function of n. For R², f² is given in Eq. (3.8.1) as R²/(1 - R²). Simply substitute this f² in Eq. (4.5.7) to find L* and proceed as before, entering the L table at row k_B = k, the total number of IVs. For sr²_i, where X_i is one of k IVs, f² is given in Eq. (3.8.3) as sr²_i/(1 - R²). Again, Eq. (4.5.7) is used to find L*, and the L table is entered at row k_B = 1.

4.5.9 Tactics of Power Analysis

We noted earlier that power analysis concerns relationships among four parameters: power, n, α, and ES (indexed by f² in these applications). Mathematically, any one of these parameters is determined by the other three. We have considered the cases where n and power are each functions of the three others. It may also

4.5 PO W ER A N A LY S IS FOR S E T S

165

be useful to exploit the other two possibilities. For example, if one specifics desired power and a for a hypothesis, L is determined. Then, for a specified n, Eqs. (4.5.6) or (4.5.7) can be solved for/-. This is the detectable B S , that is, the population/2 one can expect to dctect using the significance criterion a , with probability given by the specified power desired in a sample o f n eases. One can also (at least, crudcly) determine what a one should use, g iven/2, desired power, and a sample o f size n. Although these are useful forms of power analysis, their detailed consideration is precluded here. The interested reader is referred to Cohen (1965, 1973b, 1977). It must be understood that these mathematical relationships among the four parameters should serve as tools in the service of the behavioral scientist turned applied statistician, not as formalisms for their own sake. W e have in the interest o f expository simplicity im plicitly assumed that when we seek to determine n * , there is only one possible a , one possible value for desired power, and one possible/2. Sim ilarly, in seeking to determine power, we have largely operated as if only one value each f o r a , / 2, and n are to be considered. But the realities of research planning often are such that more than one value for one of these parameters can and indeed must be entertained. Thus, if one finds that for a hypothesis for which a = .01, power = .80, a n d /2 = .04, the resulting n* is 600, and this number far exceeds our resources, it is sensible to see what n * results when we change a to .05. I f that is also too large, wc can invert the problem, specify the largest n wc can manage, and see what power results for this n at a = .05. If this is too low , we might examine our conscience and see if it is reasonable to entertain the possibility that/2 is larger, perhaps .05 instead of .04. If so, what docs that do for power at that given n"! At the end of the line of such reasoning, the investigator either has found a combination of parameters that makes sense in his substantive context, or has decidcd to ahandon the research, at least as originally planned. M any examples of such reasoning among a priori alternatives are given in Cohen (1977). W ith the multiple hypotheses that generally characterize M R C analysis, the need for exploring such alternatives among combinations of parameters is likely to increase. I f //, requires n ' — 300 for desired power o f .80, and 300 cases give power of .50 for H 2 and .60 for //3, etc., only a consideration of alternate parameters for one or more o f these hypotheses may result in a research plan that is worth undertaking. To concludc this section with an optimistic note, we should point out that we do not always work in an economy o f scarcity. It sometimes occurs that an initial set of specifications results in n* much smaller than our resources permit. Then wc may find that when the parameters arc made quite conservative (for example, 4, B) and with partiallcd sets (.4 R, B A ) analogous with those for single IVs. The increments referred to above are squared multiple .Mm/partial correlations (e.g., - R y tn A ,) and represent the proportion of total Y variance associated with B-A. Similarly, we define the squared multiple partial correlations (e.g., p R \ = R \ A ) as the proportion of the Y variance not accounted for by A (i.e ., of 1 — R \.A) which is associated with B-A. These two statistics arc compared and exemplified. 
As with single IVs, the troublesome area of overlap of sets A and B with Y, area c, cannot be interpreted as a proportion of variance because it may be negative, in which case we have an instance of suppression between sets (Section 4.3).

4.7 SUM M ARY

177

A general test of statistical significance for the Y variance due to a partialled set B-A (that is, of sR„ and, perforce, of pR#) is presented. Two error models for this test are described. Model I error is 1 — Ry.Al), with sets other than ,4 or B (collectively, set C ) ignored. Model 11 error is 1 — Ry.Ailc ., the magnitude o f a correlation with a dummy variable w ill change with the relative size of that group in the total sample, reaching its maximum when the group is exactly half the total sample and declining toward zero as the group’s proportion in the sample declines toward zero or increases toward one. Therefore, the interpretation of any given r Yj (or r £,) depends upon the sampling meaning of p t. (A s we w ill see later, this is true of

kind o f correlation— multiple, partial, or semipartial.)

Consider two sampling circumstances involving religion and A T A . In the first, we draw a random sample of n — 36 cases1 from some natural population, say a Midwestern college, and determine each case's religion and A T A score. Assume that the data of Table 5.3.2 were so obtained. The p t for each group reflects, subject to sampling error, its proportion in that population, and so does, indi­ rectly, the magnitude of the r yj. Thus, for example, the Jew'ish group ( G 3) has = . 167 in Table 5.3.3, and

= .1260. W ere the Jewish group more numerous

in this sample (up to 1 — /;3) as might be the case for some other natural population,2 then, all things equal,

would be larger.

Yet another circumstance resulting in different r yis is a sampling plan where equal numbers of Protestants, Catholics, Jew s, and Others are sampled from

'This is an unnealistically small sample size for an aciu.il study ll is used here so that the complete data tan be economically presented and. if desired, analyzed by desk or pocket calculator. long as its proportion falls between p ; I 167) and 1 p■ ,i .833). its sd will increase: outside these limits it will decrease. See Hq. (5.3.2).

188

5.

NOMINAL OR QUALITATIVE SC A LES

their respective populations and their A T A scores determined. Here these re­ ligions are considered as abstract properties, and their differing numbers in natural populations are ignored. The equal p,s in the resulting data w iil yield different r Yls than those of the previous plan: W e would expect a smaller correla­ tion for Protestants and a larger correlation for Jew s because of the associated respective decrease and increase in the .sc/s o f their dummy variables. Thus we see that values for rYi and r \ t arc not solely properties of means on the dependent variable of group samples, but depend also upon the relative sizes of the groups. In the interpretation of such correlations, p, must be taken into account. Thus, the interpretation from the illustrative data that “ ‘Jewishness’ accounts for 12.6% of the variance in A T A scores at Midwest College” is valid if one understands “ Jewishness” to include the property of relatively low fre­ quency. Sim ilarly, in an equal ni sampling plan that same statement is valid if one understands “ Jew ishness" to be an abstraction, with the low frequency of Jew s in most natural populations ironed out by the fixed equal sample sizes. The latter kind of sampling plan characterizes most manipulative laboratory situa­ tions, where no natural population exists. Groups are then intended to represent abstractions (for example, Treatment 1: animals reared from birth in social isolation) and typically are of equal size. The preceding is an effort to mollify statistical purists who would restrict the use of correlations to the first sampling plan, where a single population is randomly sampled and each case’s standing on the two variables (for example, A T A and religion) determined, which is sometimes called the random model. W e believe, however, that correlations and particularly squared correlations as proportions of Y variance are valuable analytic tools in the sccond kind of sampling circumstance ( “ fixed model” ), provided only that care is taken in the interpretation of correlational results to keep in mind the sampling plan and its implications. Keeping this in mind, we can test any rYi for significance (or set up confidence intervals for it) as we can for any r, as described in Chapter 2 (although this test that uses Model I error is not necessarily the most powerful one). Thus, we use Eq. (2.8.1),

For X 2 in the illustrative example, we find 2.874. V

1 - (-.4427

which, with d f = n — 2 = 34, is significant at the 1% level (Table 5.3.3). This means we can conclude that in the natural population sampled, the correlation for Catholicness is nonzero (and negative). It is also exactly the t value we would obtain if we tested for the same data the difference between the Y means of

5.3 DllMMY-VARiABLE CODING

189

Catholics and non-CathoIics, becausc the two i tests are algebraically identical. W c can also set up confidence limits for such r Yls, and here it is particularly important to keep in mind whether the sampling was of the first or second kind described previously, because the two kinds of sampling imply different popula­ tion distributions of group proportions and therefore, in general, different cor­ relations, as wc have seen. Let us return now to the set of rYt in Table 5.3.3. W e have them for all but the gth group. W e could compute rYf, from the raw data, but it is implicit in the other r Yts and the «,s or ps:

fc 0 ^

(5.3.3)

- rYisdi X rYiV p ,{ i - p,} rY = -------- = ----- ------- --- ' sd, - Pg)

where the summation is taken over the g — 1 dummy variables.3 Thus, the correlation of Others versus non-Others with A T A for the illustrative data is (becausc p K = n^/n — 8/36 = .222) .318{.480) + (-.442)(.433) + .355(373)

\ f 2 2 2 ( l - .222 ) -09367— .4157

.225.

It is worth noting, incidentally, that although Y in the present context is a dependent variable in an M R C analysis, Hq. (5.3.3) is valid whatever the nature of Y; it need not even be a real variable— the formula obtains even if Y is a factor in the factor-analytic sense, unrotated or rotated, with the r yi being factor load­ ings for a set of dummy variables, which may quite profitably be used in factor analysis. 5.3.2 Correlations Among Dummy Variables We have seen that we can determine the proportion of variance that each of our g — 1 dummy variables accounts for separately in J'; they arc .1011, .1954, and .1260 (Table 5.3.3). Our primary interest lies not in these three separate aspects of the research factor G, but rather in G taken as a whole. How much variance in A T A does religion aecount for? Is it simply .1011 -I- . 1954 + . 1260 = .4225? This would be the case if , X 2, were independent of each other, that is, if r !2 = r i3 = r23 = 0. That is not the case, however, as can be seen from Table 5.3.3, nor can it ever be true for dummy variables. These correlations |which arc incidentally coefficients (Section 2.3.4) because dummy variables are dichotomies] give the relationships, in the sample, between such properties as Protestantness and Catholicness. For mutually exclusive categories, such as wc are considering, such relationships are necessarily inverse, that is, negative. If a person is Protestant, he is necessarily non-Catholic, and if Catholic, neces'Wiih equal ;i, in the g groups, this equation simplifies to r YK ~ - 1 r y,

190

5.

NOMINAL OR QUALITATIVE SC A LES

sarily non-Protestant. The correlation is however never —1.00 becausc if a person is non-Protestant, he may either be Catholic or non-Catholic, becausc there are other groups into which he may fall. The size of the correlation between two dummy variables CY,, Xp depends on the number of cases in the groups they represent («,, ny) and the total sample n, as follows: (5 3 4 )4

{

r

,J

=■ — \ i______n'ni_____ = V {„ -

« .)(« - n j)

\ i____ -'^L_____ V (l

so that the correlation between “ Protestantness” running example (sec Table 5.3.3) is

- p .)( 1 - p . )

and ' ‘Catholicness” in our

(.361 )(.250)____ _ 434

- A

(1 - .361)(1 - .250)

Equation (5.3.4), as such, is of only incidental interest. The point worth noting is that the dummy variables, that is, the separate aspects of G, are partly redun­ dant (correlated with each other). Bccause this is necessarily always true, we cannot find the proportion of variance in Y due to G by simply summing their separate r\;. It is at this point that multiple correlation enters the scene. It is designed for just this circumstance: to determine, via R 2, the proportion of variance in Y accounted for by a set of k IV s, taking fully into account whatever correlation (redundancy) there may be among them. 5.3.3 M ultiple Regression/Correlation and Partial Relationships for Dum m y Variables Let us review our strategy: we have taken a research factor that is a nominal scale, G, represented by a set of kG = g — 1 dummy variables (A ,, X 2, . . . . X . , ) and can study the relationship of this research factor to Y by running an M R C analysis of Y on the set of X f representing G. For the illustrative data (Tables 5.3.2 and 5.3.3), we find/?£.c = /?y. , 23 = .3549 and therefore R r o = 123 = -596. W e thus can state that 35.5% of the variance in A T A scorcs is associated with religion in the sample, or that theV? of A T A and religion is .596. Note that R depends on the distribution of the «, of the four groups; a change in

their relative sizes holding their P, constant would, in general, change the R. This dependence on the p, is a characteristic of R, as it is of any kind of correlation, and must be kept in mind in interpretation. W c can test the R for significance by means of the standard formula

( I - R 2)k where k is the number of independent variables. For the present application, where k = kG = g ~ i , ’When all f> groups arc of the same size, Hq. (5.3.4) simplifies to r,j = - l/(# - I), so that if our four groups were of equal size, their dummy variables would correlate - 1/(4 - l| -.333.

5.3 DUMM Y-VARIABLE CODING

(5.3.5)

191

F ~

_ .354944(36 ~ 4) (1 - .354944X3) For d f — 3, 32, the F required for significance at the .01 level is 4.46 (Appendix Tahlc D. 1), hcncc our obtained F is significant. The null hypothesis that religion accounts for no variance in A T A scores in the population sampled can be re­ jected. Further discussion o f the meaning of this test w ill be given in Section 5.3.4. In Chapter 3 it was pointed out that R 2 gives the proportion o f Y variance accounted for in the sample, but overestimates that proportion in the population. For a better estimate of Y variance accounted for by the IV s in the population, R 2 must be shrunk to the R 2 of Bq. (3.7,4):

R>

— l — (l

- R 2)



~ T ~ r ■ n -k - 1

W hen applied to these data,

R 2 = 1 —Cl - .3549) —

32

= .2945,

so that our best estimate of the proportion o f A T A variance accounted for by religion in the population is 29.4%, Here again it is important to keep in mind how the sampling was carried out, because the population being projected is the one implicit in the sampling procedure. W e turn now to a consideration of the rest of the yield of an M R C analysis with dummy variables, using the illustrative data (Table 5.3.4). In considering the partial coefficients (pr,, xr/t p,, B t) associated with each X t, we must first under­ stand the unique role played by the group that is not explicitly coded, the one whose Xj scores are 0, 0, 0. This £th group, G g, is not only not being slighted, it is on the contrary a reference group, and all the partial coefficients in fact turn upon it.

The P a rtia l Correlation o f a D u m m y Variable (pr,) In M R C generally, a p rt is the correlation o f Y with ,V1 with all the other IV s held constant, that is, the correlation between Y and A'( for a suhsct of the data in which the other IV s do not vary. In the specific context of dummy variables, holding the other IV s constant means retaining only the distinction between the ith group and the £th or reference group. Concretely, p r t (= .363) is the correla­ tion between A T A and Protestant versus non-Protestant, bolding Catholic versus non-Catholie and Jewish versus non-Jewish constant (Table 5.3.4). But the subset o f the data for which the latter do not vary includes only Protestants ( G t) and Others (G 4), so that the p a r t i a l l e d X 23 variahlc represents a new dichotomy o f Protestants versus Others, and p r x = .363 expresses the relationship of the

192

5.

NOMINAL OR QUALITATIVE SCALES

A T A stores to this new dichotomy created by the partialling process. In other words, p r l is an expression, in correlational terms, of the difference between the Protestant group and the Other group in A T A scores. Similarly, from Tabic 5.3.4, pr2 = —.145 relates A T A to Catholic versus Other (ignoring Protestants and Jew s) and pry = .423 relates A T A to Jewish versus Other (ignoring Protes­ tants and Catholics). The interpretation of a given pr,, as was true for rYi for dummy variables, must take into account the sampling plan (random or fixed n), because it also depends on the proportions of cach group in the sample. Signifi­ cance testing of pr, is discussed later.

The Semipartia / Correlation o f a D u m m y Variable (srf) The basic general definition of an sr, as the relationship of X, from which ail the other IV s have been partialled with an unpartialled Y holds here but is not particularly helpful. A more useful general property of an sr, is that srf is the amount by which Ry \n...k would be reduced if X, were omitted from the IVs, thus: srf = R y.]23 ...k ~ r y m ... Becausc the gth group has no X

variable and hence no t test for its partial

coefficients, one cannot determine from the usual M R C computer output whether its Y departs significantly from Y. Such a test is, however, available from the other results of the M R C analysis;

*

where i runs from 1 to g — I and the summation is over that range. For the illustrative example, -4(11.41 - 21.23 4- 21.60)

\J548.41 (n

+

-47 12 ------= -1.654, 28.49

+ = 0. For the a ’s of X , and X 3: (J)(0 ) + (|)(0 ) + ( —i)( 1) + ( —i)( — 0 = 0. Also for the coding of X 2 and X 3: ( 1)(0) + ( — 1)(0) + (())(1) + ( ( ) ) ( - ! ) = 0. (A check w ill show that Set il also has this property.) Thus, the strategy of contrast coding is to express the hypotheses of interest in the form of # — 1 different (orthogonal) contrasts, using means of means coding. The M R C analysis then directly yields functions of contrast values and also their significance tests. Although we shall use the Mu, — 1/v, 0 coding throughout, it is worth noting that their multiplication by some constant w for any X, w ill leave their correla­ tions (simple, partial, and semipartial and also its (3,) unchanged, as well as the / value for any of these coefficients. What changes is simply the B ,— it w ill now be 1lw as large as before. One might, for example, want to multiply an X, by a tv that makes the coefficients for that contrast integers, say by 2 fo rX , in Set I or by 4 fo rX 3 in Set II (in these, the resulting integers would all be 1’s). The advantage (slight) of this revision is in the simplification of working with integers (fre­ quently l ’s, although it makes little difference to a computer). A possibly more advantageous value for the multiplier is >vr = uv/(u + v), which convert the I /u, -1/v, 0 codes, respectively, to v/(u + v), ~u/(u + v), 0. The latter result in fl, values that equal the contrast values C of Eq. (5.5.1) (i.e., the differences between means of means of Y). Now assume that we use the coding of Set I (Table 5.5.1) for the illustrative data of Table 5.3.2), replacing the dummy variable coding given there. Thus, all cases in G , are coded i, 1 ,0 fo r X ,. X 2, and X , (instead of 1, 0, 0), all G 2 cases are codcd i, - 1 , 0 , etc. W c present the results of a full M R C analysis using the Set 1 contrast codes in Tables 5.5.2 and 5.5.3. 5.5.2 The R 2 and r's W e note from Table 5.5.2 that R2 = .3549 (and R 2 = .2945) and its F = 5.869, exactly as with any other coding of the nominal scale of group membership for these data; changes in coding have effects on results from individual X,. but the

208

5.

NO M IN AL OR QUALITATIVE SC A L ES

T ABLE 5.5.2 C o r r e la tio n s , M e a n s , a n d S t a n d a r d D e v ia t io n s o f th e

Y Maj Min P-C J -O

X, X2 X,

-.079 .444 .363

1.000 .114 .112

m 81.69

.111

27.49

.488

sd

X,

*

I llu s t r a t iv e D a ta f o r C o n tr a s t C o d in g o f S e t I ( T a b le 5 .5 .1 1

.114

.112

1.000

.013

013

1.000

.111 .774

.056 .621

R 1 = .354944.

'V i

/, ( d f = 34)

.0062 .1971 .1321

-.461 2.889** 2.275*

F = 5.869**

3, 32)

= .294470.

••PC .01. set as a set represents the group membership information and w ill therefore account for the same amount of Y variance as any other set for the same data. Now note the correlations among the contrast-coded IV s , X t , X 2, and .V,, .114, .112, and .013 (Table 5.5.2). None o f them is zero, despite the fact that the a coefficients with which the 36 observations were contrast-coded for group membership arc orthogonal. The rit would all be zero if (and only if) the groups were o f equal size, and they are not. (It may be o f interest to note that when the «, are equal and hence all

= 0, R \, 123 would simply be the sum o f the r ^ . ) W c

stress that the requirement is that the contrast coefficients be mutually orthogo­ nal, not that the X t be mutually orthogonal (or uncorrelated). The condition of “ unequal cell frequencies” is a nuisance in the analysis o f contrasts in A V , but is automatically handled in M R C . Some ambiguity may accompany the interpretation o f the r

None occurs

when only two a values are used in coding an X t. It is then a simple dichotomy, and its

is d e a rly the proportion of Y variance associated with the dichotomous

distinction. X , is such a variable, distinguishing G , and G 2 (coded a) from G ? and G4 (coded —£). Note that no distinction is made between G, and G2 or between G 3 and G 4, they arc treated as groups o f

+ n2 — 22 and n3 + «4 = 14

eases. Thus, X , as such is a function o f the weighted means, so that, for example, the Protestants figure more strongly than the Catholics in the pooled majority religion group in the ratio o f their >i(s, 13:9. The unweighted means o f means contrast, where all groups are treated as equal, is carried by the partialled con­ trast variable, order

an^ w ill he discussed later. W ith this in mind, the zero-

= .0062 indicates that less than 1% of A T A variance is accountcd for

by the pooled majority versus pooled minority distinction, and is not significant

(i = -.4 6 1 ). In X 2 and X 3, we have a trichotomy: 1 , - 1 , and 0 coded values are used. In the partialled results (see following), the 0-coded groups are omitted as we intend.

5.5 C O NTRAST CODING

209

but they play a role in determining the zero-ordcr r Yi (as was also the ease in effects coding). W c would therefore, in genera!, not wish to interpret the unpartialled correlations with Y of such trichotomous contrast variables. 5.5.3 The Partial Coefficients in Contrast Coding

The Regression Coefficients a n d (he Regression Equation It is the partialling process that makes our contrast-coded variables yield unam­ biguously interpretablc results about the contrasts or issues that have been coded into group membership. Specifically, the 5, coefficients in contrast coding are functions o f the values of the contrast between unweighted means of means, and when groups arc codcd 0 they arc in fact omitted. W ith the u means of subset U cach coded I /«, the v means o f subset V each coded — 1/v, and irrelevant groups coded 0, the value of C, the contrast in the sample as defined by Eq. (5.5.1), is given byK (5.5.3) Applying this to the Zt, o f Table 5.5.3, we obtain the contrast values

Thus, C, indicates that the A T A mean o f means o f the two majority groups is 9.82 points below the mean of means of the two minority religious groups. Note that here it is unweighted means of means that are being compared; all groups regardless of n, count equally. Im plicitly, such contrasts treat each o f the groups as characterized by an abstract property, and ignore differences in the group sizes. Thus, “ majority religion”

here is conceived as the direct average of

Protestant and Catholic means, not as the overall average of Protestants and Catholics combined in which Protestants figure more heavily, which is as the unpartialled X , represents them. C, might be thought o f as the net effect of the abstract property “ being in a majority rather than a minority religious group” (although, o f coursc, causality would need to be supported on other than statistislf the original codes have been multiplied by

as described above, !:q. (5.5.3) is unnecessary

bccause C = B for cach .V, so treated. For the Set 1contrasts, multiplication by «v/(« + i’) leaves X, unchanged but changes to i . —1,0. 0 and X 3toO, 0, 1. -1 and results in = 32.64 (=■ C2) and fl.i = 33.38 (= C j). as promised. The Y intercept A remains unattccted by the multiplication Quite generally, the effect of multiplying any X, by a constant results in fl, bcintr divided by that constant.

210

5.

NO M IN AL OR QUALITATIVE SC A L ES

cal grounds). Sim ilarly, C 2 = 32.64 is the net cffect on A T A of being Protestant rather than Catholic, and C , = 33.38 is the net effect o f being a Je w rather than an Other.9 Unlike the results with the unpartialled X 2 and X 3, the groups coded 0 are effectively omitted from the contrast. In keeping with the theme o f ignoring differences in group size, the / intercept

A represents the same reference value as in effccts coding— the unweighted mean of all the group means as in Eqs. (5.4.3) and (5.4.4). Perhaps of even greater interest than the size of the contrast as given by Eq. (5.5.3) is its statistical significance. The formal null hypothesis for a contrast is that when population means are substituted, the population contrast is zero (i.e.. the difference between population means or means of population means is zero). The

value associated with the B , provides exactly the significance test of this

null hypothesis. From Table 5.5.3, we note that the majority— minority contrast is not significant, but that both of the others are. When these B t values and A are combined into the regression equation and the contrast-coded values for each group substituted, they yield, as for al! other nonredundant coding, the group means as shown in Table 5.5.3. The contrast model can be viewed as yielding the Ys o f a group by adding to the mean of the means Y (= A) the effect provided by the group’s role in each contrast. Thus, the Protestant Y, comes about, by adding to the mean of means, 81.90. one-half of the value o f the majority-minority contrast £( —9 .S 2 )(i) = ] —4.91, and the value of the Protestant versus Catholic contrast [(16.32){1) = ] 16.32, but none of the irrelevant value o f the Jew ish versus Other contrast L{16.69)(0) = 01, thus: 81.90 - 4 . 9 1 + 16.32 to = 93.31 = K ,. B, is an ingredient in all the means, bccause all groups figure in that contrast, whereas B 2 and 5 , each figures in only the two means that it compares. As was the case for dummy variable and effects coding, it is noted again that the contrast-coded B , values are not affected by varying sample sizes. Because they are a function o f means or unweighted means o f means, the expected value of a contrast is invariant over changes in relative group size. This is generally (although not universally) a desirable property of B t values in nominal coding, and it is the lack of this property that renders the standardized (3; coefficients of generally little use. as already noted.

The Sem ipartiaf a n d Partial Correlations o f a Contrast-Coded Variable Whereas the jB, values express contrasts in units o f Y, the semipartial and partial correlation coefficients and their squares are “ pure” or dimensionless numbers. The use o f such measures o f effect size has the advantage o f working

^Again, wc disavow the causa! implication of the term effect. The reader is invited to take it as a mathematical metaphor and may prefer to substitute the more neutral word difference. Causa! interpretations are never warranted by statistical results but require logical and substantive bases (See Section 3.8 and Chapter 9.)

5.5 C O NTRAST CODING

211

TABLE 5.5.3 Partial and Semipartial Correlations and Regression Coefficients for the Set I Contrast Coding of the Illustrative Data Xi

Pr ,

pr]

srj

sr]

Vi

Bj

ti (df= 32)

X, X, X3

-.209 .494 .423

.0437 .2440 .1789

-.172 .452 .375

.0296 .2045 .1404

-.1742 .4593 .3770

-9.82 16.32 16.69

-1.212 3.213** 2.639*

A = 81.90 = Yj Y = B ,X ,

+ B ,X ,

+ B ,X 3

+

A

= -9.82 X , + 16.32 X , + 16,69 X 3 + 81.90. + 16.32(1) + 16.69(0) + 81.90 =

Y , = -9.82(f)

+ 16.32 (—1) + 16.69(0) + 81.90 = 60.67 = Y t .

Y, = -9.82(- f-) + 16.32(0) + 16,69(1)

II

y, = -9.8 2 (f)

+ 81.90 = 103.50 =

Y t = -9.82(- \-)+ 16.32(0) + 16.69(-1) + 81.90 = 70.12 = * P C . 05.

**/■ ?2. The Jewish-Other contrast gives s r | = .1404, s r = .375. Note that the s r 2 s do not sum to R 2y m , because, given that the «,s are not equal, the ru -f

0

.

212

5.

NOMINAL OR QUALITATIVE SCALES

This leaves for consideration the partial correlation for contrast-coded Xf. The prj is the proportion of that part of the Y variance not accounted for by the other contrasts that is accountcd for by contrast i. Thus, pr\ = .2440 indicates that contrast 2 (the Protestant-Catholic distinction) accounts for 24.4% of the Y variance remaining after the variance due to contrasts 1 and 3 has been removed. This is necessarily larger than sr\ = .2045, because the base of the proportion of the latter is larger, namely all the Y variance. (Recall that in all M R C , pr} & srf, the equality holding when the other IV s account for no variance.) The choice between sr and pr depends, as always, on what seems to be the more appropriate interpretive framework, the total Y variance or the residual Y variance after the effects of the other variables have been removed. In the absence of other consid­ erations, it seems more reasonable to use the former base when group member­ ship is defined by a naturally varying ( “ organismic” ) variable (i.e., other sources of variance arc always present) and the latter base when it is defined by experimental manipulation (i.e., other sources of variance need not be present). Thus, with G defining religion, sr might be the preferred measure. The statistical significance of a t ,- and pr, is the same as that of B t, as always. In summary, when contrast codes arc written as a function of differences between means or between unweighted means of means, a simple function of fl, yields the value of contrast i in units of Y, a t ? and p rje x press the contrast in terms of proportion of Y variance, and srt and prt in terms of correlation with Y. 5.5 4 Contrast S e t II: A 2 x 2 Factorial Design W e can use our running example of four groups to illustrate the contrast coding that yields the results of a 2 x 2 factorial design in A V . W c continue to use exactly the same (artifical) data as throughout, but now reinterpret our four groups as experimental animals subjected to the Treatments D: Drug versus Placebo, and F: Frontal Lesion versus Control Lesion, the dependent variable Y being a measure of retention error (see Table 5.5.1 preceding). As throughout this chapter, we retain the same unequal ns, in order, 13, 9, 6, and 8. Although such an experiment would ordinarily have been planned for equal ns, we assume for the sake of generality and realism that data have been lost due to animal mortality or other reasons. With the groups designated as in Table 5.5.1, the first “ main effect" contrast is for Drug versus Placebo, a contrast between the mean of means of the (u —) 2 Drug groups (G , and G 2) and the mean of the means of the (v =) 2 Placebo groups. This can be stated in the form of Eq. (5.5.1), and results in the I/m, - l/v contrast coefficients (a) as described in the previous section: 2 , —i, -A. These are the coded values that constitute X , for Set II in Table 5.5.1. (This is, coincidentally, the same as the first contrast for Set I.) Sim ilarly, the second main effect contrast is for Frontal versus Control Lesion. For this contrast, too, u = v = 2, but the combination is C’ | and C\ versus G2 and G 4, and the a coefficients are, in order, —i, J, —i. These are entered as the

5.5 CO NTRAST CODING

213

coded values f o r X 2 (Table 5.5.1. Set II). Note that the orthogonality condition for the coded values of X i and X 2 is satisfied: (i)Q ) + ( J ) ( - i ) + ( - 2 X 2) +

( —*)( —i ) = I — i — 4 + -1 = 0. W e have codcd the two main cffccts in what is now clearly a 2 X 2 factorial design A V (with unequal, in fact, disproportionate ccll n,s). The A V correctly leads us to expect that the remaining single d f ( o f “ between cells” ) carries the Drug x Lesion interaction. The multiplication sign in the conventional symbols for interactions is neither arbitrary nor accidental. Throughout M R C , whether we arc dealing with nominal or quantitative IV s or combinations of these, interac­ tions are carried by products o f variables (or variable sets).10 W e accordingly code X 3, the interaction contrast, by computing for each group the product of its X , and X 2 codes: (4 )(i) = I (4 )(- 4 ) = < -4 Xi) = and ( - * ) ( - * ) - 1 (Table 5.5.1, Set II, X 3). Applied to the four means, these are necessarily a coefficients satisfying Eq. (5.5.2) and define a contrast in the sense of Eq. (5.5.1), as do those o fX | a n d X 2, the two main effects. Moreover, given that the main cffccts coefficients arc orthogonal, they w ill ncccssarily he orthogonal to their interaction products, for example, X , w ith X 3: (4)(.i) + ( j ) ( —J ) + ( - 2 ) ( ~ i ) + ( - * ) ( * ) = 0. W c have, then, as given for Set II in Tabic 5.5.1, a full (k(i = g — 1 = 3) set of coded mutually orthogonal contrasts, representing the two main effects and the interaction of a 2 x 2 factorial design. N ow , assume that we use this coding for the illustrative data o f Table 5.3.2, replacing the dummy variable coding of the X j given there by the Tahlc 5 .5 . 1 Set II coding, for example, G 3 cases are codcd

~Li,

instead o f the 0, 0, 1 dummy variable codes. The results of a full M R C

analysis using the 2 x 2 factorial contrasts are given in Tables 5.5.4 and 5.5.5. Tabic 5.5.4 shows, cxactly as in cach previous coding for the same data, that

R y .,2 3 = .3549 and its F = 5.869 (df 3, 32, P < .01). Again we reiterate that coding variations affcct the results from individual X (, not those o f the set o f X t taken as a whole. A little over one-third o f the variance in retention error is accounted for by experimental group membership, an amount significantly dif­ ferent from zero. A s in the previous section we note the orthogonality of our contrast coefficients (Table 5 .5 .1) does not result in the 36 coded cases’ intercor­ relations of the X j being zero, becausc the n,s are not equal. The zero-order correlations o f the contrast variables with Y in Table 5.5.4 im plicitly define contrasts differently than do the partialled values in Table 5.5.5 below. The former do not distinguish in X , between the 13 animals in the DrugFrontal Lesion group and the 9 in the Drug-Control Lesion group, but simply treat them as a single Drug group o f 22 eases codcd i, contrasted with a single Placebo group o f 14 eases coded —1. Any effcct due to Frontal-Control or any interaction effect contaminates the zero-order correlation ryi, becausc the cclls "T h e implications for data analysis using M R C of this statement arc far-reaching. See Chapter S. which is devoted to interaction.

214

5.

NOMINAL OR QUALITATIVE SCALES

TABLE 5.5.4

Correlations, Means and Standard Deviations of the Illustrative Data for 2 X 2 Factorial Design Contrast Coding (Set II. Table 6.5.1) Contrast

Y

Drug-Placebo Frontal-Control D X F

-.079 .570 .120

X3

m sd

81.69 27.49

X,

X,

X,

1.000

.158

.158 .019

1.000

.019 .216

.216

1.000

.111 .488

.028 .499

.042 .246

= .354944

2 r Yi

l i W = 34

.0062 .3249 .0144

.461 4.045** .705

F = 5,869** (’ to v (where the values of v are all different), an equation of the following form will define a function that fits these points exactly: (6.2.1)

y

=A +Bv + Cv2 + Dv3 + ■+ Qvq~ \

A, B, . . . , Q being constants. Now this is an equation that relates one variable to y, and, because that variable v is raised to successive integer powers beyond the first, the equation is clearly /i»/ilinear. Now, consider the standard regression equation (3.5.1)Y = B xX, + B 2X 2 + f l , X s + ■■

+Bk Xk +A.

This equation relates k variables (X, to X t) to Y, and, because the X, are all to the first power, and there are no product terms such as XX,- this is a multiple linear equation. Now, bceause there are no practical restraints on how we define our Xh we use the trick of letting X , = v, X 2 = v2, X-, = v3, . . . , X k = vk , and now what had been a nonlinear equation for one independent variable has become a linear equation for k independent variables, that is, a multiple linear regression equation of the appropriate form for our system. This bit of legerdemain is not a mere verbal trick.. In order to accomplish linearity, we have undertaken multiplicity of the X,, but the price is cheap. By

6.2 POWER POLYNOMIALS

225

bringing nonlinear relationships into the fold in this way, we make it possible to determine whether and specifically how a relationship is nonlinear and to write an equation that describes this relationship. Now let us back up and consider the number of data points we wish to fit. For such a V as IQ, there may be more than 50 different values of V; hence, the complete polynomial [Eq. (6.2.1)] would have over 50 terms. We would be reluctant to spend that many df (one for each X,) to relate Y to V, and, fortunately, it is neither necessary nor desirable. Indeed, we do not seek to fit each squigglc of the curve due to random sampling or measurement error; wc wish to fit a smooth curve with as few X t as are needed to approximate the true function. For most behavioral science data, certainly for the data of the “ soft” behavioral sciences, the first two or three powers will suffice: in the terminology introduced in the preceding section, wc can represent in polynomial form most quantitative variables where nonlinearity is to be allowed for by a set V made up of (6.2.2)

X i = v (linear),

X 2 = u2 (quadratic),

X? = u3 (cubic),

a total of k = 3 aspccts of V. Very often it will be sufficient to use only the first two terms; rarely, one might wish to include more than three. Using the X ( aspects representing V, one analyzes the data by M R C using the hierarchical procedure (Scction 3.7.4) for significance testing, and the simul­ taneous model (Section 3.6.1) for plotting the curve. The process is made clear by the two concrete examples that follow. 6.2.2 An Example: A Quadratic Fit Figure 6 .2 .1 shows a bivariate plot of variables Y and v forn = 36 eases. Ignore for the moment the line and curve— simply consider the gestalt of the 36 points. They suggest a curve, possibly due to an asymptote or “ ceiling effect” in Y— increasing v beyond approximately 100 does not seem to be associated with further increases in Y. When we compute rYv, we find it to equal .767, with t = 6.970 (P < .01). The linear correlation is high, but the fact that it is significant means only that the linear correlation in the population is very likely not zero, and not that the relationship in the population is necessarily adequately described by a straight line. Thus, the best-fitting straight line or linear aspect of V makes it possible to account for .5883 {= r£v) of the Y variance. The resulting linear regression equation turns out to be y , = ,8398 i»+ 14.49. This line is easily plotted by substituting one high and one low value of v, finding their respective Ys from this equation, plotting the two v, Y points and connccting them with a straight line (for v = 40, Y l = 48.1, and for v = 140, Y t = 132.1); see Fig. 6.2.1. Despite the large r, there are quite a few unbalanced points in the middle of the v scale above the line and several more points with low v below it. Can matters be improved by taking into account the quadratic aspect of V‘>This can be determined by means of hierarchical analysis: with X ? = v2 added to =

226

6.

QUANTITATIVE SC A LES

F IG U R E 6 .2 .1

P o ly n o m ia l reg re s s io n o f

Yon V.

v, a second M R C analysis is performed, and the increase in R 2 y n overtf^., (= /■£,) is noted. This is literally .«•? (the squared semipartiai correlation for X 2 partialling X ;) , but with the hierarchical method it is convenient to use the general symbol I with subscript to denote an increment in R2 due to the addition of one ore more variables. This 12 value is determined and tested for significance to determine whether, in the population, the inclusion o f this term increases/?2. In the upper section of Table 6.2.1, both the ingredients and results of this operation arc given. First, however, note that these are most certainly not orthogonal variables: v and v2 are very highly correlated with each other (.990), and also with v3 (.965 and .992). This is a feature of polynomial regression using powers of v: when, as is usually the case, v is a typical variable made up of positive values, the powers of v used as polynomial terms are always highly correlated. It would be a mistake to conclude from r i2 = .990 that X , and X 2 arc functionally equivalent. To be sure, .9902 = .9801 of the variance in X 2 is linearly accounted for by X , . and only I — .9801 = .0199 of the X 2 variance is not, but what is operating in the hierarchical method when X 2 is brought in after X , is this latter /mcorrelated variance; in other words (and as always), it is the partialled X 2. { , that is, v2 from which v is (linearly) removed, that represents the pure quadratic variable (Cohen, 1978). It may be only 2% of X 2, but it is 100% of X 2 i . Table 6.2.1 shows the proccss. First, X , by itself accounts for .58828 of the Y variance, as we have seen. In conformity with the plan, it appears in the cum R 2 column (cumulating from zero variance accounted for by no prior IV s ) and is

6.2 PO W E R P O L Y N O M IA L S

227

T A B L E 6.2.1 Polynom ial M R C Analysis of Regression of Y on V Hierarchical mode!

cum R 5

IV s W

.767 J.000

X,

< ^)X 2 (S )X ,

.990 .965

.725

.990

5.000 .992

.675

.965

.992 1.000

m

83.09 81.69 7422. 74J389.

sd

30.10 27.49 4948. 718223.

df

I

Fi

X,

.58828 48.580** 1,34 .58828 48.580**

X , , X^

.65076

30.745** 2 ,3 3 .0 6 2 4 8

5.904*

1,33

X ,, X ,,X,

.65079

19.878** 3 ,3 2 .0 0 0 0 3

.003

1,32

n = 36

Simultaneous model Linear equation: r,

rB,

= ,8398/tfj + 14.49 6.970**

d f = 34

Quadratic equation: = 2.793

tB .

X , - .01096

(3 .4 4 1 **)

- 63.77

df=

-2.430*

33

Cubic equation: r s = 2.596

tBj

X , - .008654

(.650)

< .05.

^ - .0 0 0 0 0 8 4 7 6

(-.188)

X 3 - 58.48

d f = 32

-.050

* * P < .01.

significant, as we also have seen, the F — “18.580 simply being the square o f the already noted / = 6.970. W h en X 2 is added to X , , the cum R 2 (now R y . i2 ) is .65076 w hich, when tested by the standard test for the significance o f an R 2 (Kq. 3 .7 .1), gives F = 30.745 (P < .01). This merely means that in the popula­ tion, the use of X , and X 2 w ill result in a nonzero R 2. Our interest rather focuses on the increment in R 2 due to the additon o f X 2, that is, I 2 = R y . , 2 ~ R y i = .65076 — .58828 = .06248. Thus, an additional 6 .2 5 % of Y variance has been accounted for by the quadratic term X 2.,. T o test whether this increment is significant, we specialize the test for the significance of the increase in R 2 o f an added set B to that of a set A , using M odel I error, already given: (4.4.2)

R v -a b - R y2 -a F =— 1 ~ R Y ‘AB

df

x n

kB

1

( d f = kB , n - k A - k B - 1).

In this general formula, kA is the number of IV s in set A and kti is the number o f IV s in set B. W e specialize it for the hierarchical polynom ial by noting that Ry.An — R ^ a — l R — l t for some specific X t added term, R \ Ati is the cumulative

1,3 4

228

6.

QUANTITATIVE SC A L ES

Ry )2 i UP through that term, and kH = I. Thus, specializing Eq. (4.4.2). the significance of the increment due to a single added polynomial term X , is 1)

(6.2.3)1

n -i-1 ).

1 - R h 12..., Because, as has been pointed out, /, is sr(2with all lower-order terms partialled, this F is simply the t2 for the partialled coefficients for X,. Most computer programs provide t (or F) for each B, including that for the highest order, B t, making hand computation o f Eq. (6.2.3) unnecessary. Applying Eq. (6.2.3) to the increment, wc find

r = .06248 (36 - 2 - l ) = 5904 1 - .65076 which is significant (F < .05). Thus, F indicates that the population R 2 using v and v2 is larger than the one using v alone; or that the quadratic aspect o f V accounts for some Y variance (over and above the linear aspect). This in turn means that the relationship in the population is not straight line, but has (at least) one bend in it. Equivalently, we note that i f 2 = — -01096 and its / = 2.430 ( = V T 9 0 4 ) , P < .05, as before.2 N ow , using the B 's and A from the computer output, we can write the quadratic regression equation (Table 6.2.1)

Y2 = 2.793 X} - .01096 X 2 -63.77. This equation gives the quadratic curve that best fits the 36 points in Fig. 6.2.1. If a plot is desired (it need not be necessary— see Scction 6.2.4), it can be accomplished by substituting the v values 40, 50, . . . , 140 into this equation with v fo r X, and v2 for X 2, solving cach time for Y2, and drawing a smooth curve through the resulting v, Y2 points. For example, at -

v

= 40, V2 = 2,793 (40)

.01096 (402) - 63.77 = 30.4; at v = 50, ? 2 = 2.793 (50) - .01096 (502) -

63.77 = 48.5; . . . ; at v = 140, ? 2 = 112.4. This curve is plotted in Fig. 6.2.1 and permits comparison with the fit provided by the straight lino of the linear equation. It follows the track of the points that invites the eye, clearly doing a better job at low and middle values of v than the straight line. Note that in order to write the regression equation we have shifted from the hierarchical to the simultaneous model o f M R C . We do not take the B t from the

1Because this lest is for error Model 1(see Section 4 4), it is negatively biased (conservative) when terms of higher order than X, are making real contributions to the population R2. When the latter is not the case, this test is statistically more powerful than other tests (Scction 4,5), See discussion of alternative error models for polynomials in Scction 6.3.2. Also, note that this equation is equivalent to that of the test for .yr?(Eq. 3 6 6). 2Do not be surprised that although is so small, it is nevertheless significant. Rememher that X 2 = v2, a four or five digit number: even so small a weight results in a large component in Y. For example, when v = 100, V, = v2 = 10,000, and B 2X 1 = -.01096 ,!h order will be sufficient as well as necessary, if the data points being fitted arc relatively free of sampling and measurement error, a polynomial of order higher than

k

may materially and significantly improve the fit This is

bccause the additional bends it provides w ill occur outside (he range of > we are studying (mathe­ matically, r stretches from minus to plus infinity), whereas within that range the proper number

(k -

1) of bends are provided for, and the function fits the points better than (hat of the ith order equation

234

6.

QUANTITATIVE SCALES

gives a minimum and the higher a maximum, and, if positive, the reverse. Because for this example B } was positive and significant, B2 , not significant, and By .)2 negative and significant, the function can be described (without plot­ ting) as generally rising with a minimum followed by a maximum— the shape of a forward-tilted S. Maxima and minima of polynomial regression equations are of interest in some applications, but they need not be routinely determined. They also require some caution in interpretation. In the quadratic regression problem (Section 6.2.2 and Fig. 6.2. i), for example, wc found the Y maximum to come at v = 127. It would however be an error to conclude that in the population, increases in v beyond 127 are accompanied by decreases in Y. Were the single data point v = 140, Y = 116 omitted, the value of vM would be displaced beyond the upper limit of our range of v, where it may be completely fictive (for example, an IQ of 200). In the cubic regression example (Section 6.2.3 and Fig. 6.2.2), the observed displacement of the maximum certainly limits our confidence in the projectibility of such values to the population. W e are unable to offer any general guidance about the size of the error in maxima and minima due to sampling and inadequacy of fit that is to be anticipated; one must rely on one’s substantive knowledge of the field of application, leavened with some common sense. The possible wobble in vM and accompanying YM values should not be allowed to detract from the areas of solidity of the method: 1. A data plot with k — I real bends will require a polynomial of (certainly no less than and probably no more than) the jtth order. 2. By noting the size and statistical significance of each /,, and the sign of B, when it enters the equation, the general shape of the regression can be known. For many applications, this makes physically plotting the curve unnecessary.

How Many Terms? It is difficult to offer general advice as to how high k is to be in the representa­ tion of V by a polynomial of the fcth order. There are several interrelated reasons for this difficulty. Behavioral science is an exceedingly heterogeneous domain, covering areas as diverse as brain biochemistry and cultural anthropology. This heterogeneity, in turn, is accompanied by diversity in the metric quality of the data and in the precision of research formulations and hypotheses, which then will be related to the purpose of an investigator when the polynomial method is used. It hardly needs to be pointed out that numbers issuing from paper chromotography of a substance found in the brain and those from attitude surveys of welfare mothers are of quite different character, yet computer programs do not distinguish between them. Finally, purely statistical considerations (n, number of observed values of V7) which vary widely from research to research also play a roie in setting or finding k.

6.2 PO W ER PO LYN O M IA LS

235

Remember that with n observations of Y, each associated with one o f q differ­ ent values of V, a polynomial in q — I terms w ill exactly fit the Y means of all these V values. Concretely, if we have 1CX) (= n) paired observations of grade point average (Y) and IQ (V ), with 36 (= q) distinct values of IQ , the 36 y means o f these IQ values w ill be perfectly fitted by a polynomial equation in powers o f v up to v35. Do wc want to fit this equation? Most assuredly not, and for several reasons. One obvious reason is that this perfect fit w ill hold for only this sample. Consider that with 100 eases distributed over 36 points, the Y means for each V value arc based on an average o f 100/36 = 2.8 cases. M any w ill be based on only one case. Obviously such means w ill be wildly unstable as estimates o f their population values. O f what use is it to fit them exactly? The 35 B, and A of the 35th-order polynomial equation w ill similarly be grossly unstable estimates of their population values. For the population equation, almost all the 35 £f; values w ill be zero or negligible. I f our n were 10,000, and, assuming for the sake o f simplicity that q is still 36, the average number o f cases on which the 36 Y means are based would be 278, and only a few would be based on such a small number as to render them unstable. Do wc want to fit the 35th-order polynomial now? Still not. Even if we go further and assume that these 36 means coincide perfectly with their popula­ tion values so that the sample equation coincides perfectly with the population equation, would it be desirable to have it? It would not, because the curve through the means would still not represent the relationship between the two constructs understudy, owing to the inevitable departures, however small, o f the

Y and V scale units from perfect equality. Quite apart from questions o f sample size and adequacy o f scaling, consider: Is there more information in the 36 constants we determine from the equation than in the 36 means that they fit? No, if anything, less: The means are easier to interpret. Underlying this argument is the traditional scientific strategy of par­ simony. If it takes 36 numbers to account for 36 other numbers, no explanation (order, theory) has been accomplished. W hat we want, then, is that the order o f the polynomial k be smaller, usually much smaller, than q — 1. Given that curvilinearity is to be provided for or investigated, it is difficult to think of circumstances where k should exceed four, and very frequently two are sufficient. W ithin this range, a large number is likely to be employed; (1) when one’s purpose is the precise fitting of the curve; (2) when the data arc relatively free o f error (that is, small dispersions about the Y means of the V values relative to the overall Y variance); (3) when n is very large relative to the total number o f IV s to be studied (say, a ratio o f more than 20 to 1). In contrast, a i of 2 or maybe 3 w ill suffice: (1) when one’s purpose is to allow for or detect curvilinearity in a relationship rather than to closely specify it; (2) when measurement error is substantial (rating scales, interview responses, most psychological tests and sociological indices); or (3) when n is not large

236

6.

QUANTITATIVE SC A L ES

relative to the total number of IVs (say, a ratio of less than 10 to 1). In general, the "less is more" principle of Section 4.6.2 applies.⁴

In our exposition and examples, we have used k = 3 because it is in the middle of our range, and the strategy was that k was set in advance of the analysis, implicitly as a hypothesis that no higher order was needed. An alternative strategy may be employed in which one proceeds hierarchically and evaluates each I until a statistical decision rule halts the procedure. The following rules or combinations thereof may be employed:

1. Statistical significance. Starting with the linear term one proceeds cumulatively until an Ij is reached that fails to meet the significance criterion that is being employed (usually P < .05). The process is terminated, and k = j − 1.⁵

2. Change in R². Thus far in this chapter, we have been concerned only with the observed R² and increments to it. However, we know that the addition of any IV to a set, even a random, nonpredictive variable, will result in an increase in R²; a decrease is mathematically impossible, and an I of exactly zero exceedingly rare. This fact of MRC should not be lost sight of in its application to the polynomial method. It can be instructive, while cumulating, to track the size and particularly the change in the shrunken R², the estimated proportion of variance in Y accounted for in the population by a polynomial of that order. Unlike I, these changes may be positive or negative. For example, the shrunken R²s of the quadratic example (Section 6.2.2) are, successively, .576, .630, and .618: the shrunken R² based on the first three powers is smaller than that based on the first two. Two stopping criteria use changes in R²:

2a. No increase in R². It can be proven that whenever the F ratio for an Ij is less than one, the shrunken R² for the jth order will be smaller than the shrunken R² for the (j − 1)th order, that is, a decrement will occur. If one is tolerant of relatively large k in the interest of a close fit, one may take successive terms until the shrunken R² drops or fails to increase. However, this may easily lead to too many terms, because under conditions of no increase in population R², F will exceed 1 with a probability greater than .32. This procedure, then, is appropriate when one is seeking a maximum true fit, and is willing to pay the price of large k and the risk of overfitting.

2b. Minimum increase in R². When maximum fit is not the goal, a reasonable criterion for stopping is failure of R² to increase by some arbitrary minimum consonant with the purpose of the investigation, ranging between, say, .02 and .05. This criterion would ordinarily be used in conjunction with a significance test on I; that is, one stops either when I fails to be significant, or when the increase in R² fails to meet the criterion minimum.

⁴Another practical consideration is the degree of accuracy provided by the computer program used. Large k, even k = 5, requires a degree of computing accuracy not attained by many standard MRC programs (Wampler, 1970). See Centering, and Appendix 3.

⁵In the unlikely circumstance that one is dealing with a symmetrical U-shaped (or inverted U-shaped) curve, one should not terminate after the linear term; similarly, and even less likely, if one is dealing with a symmetrical trigonometric wavelike function with one maximum and one minimum, one should not terminate with k = 2. It happens that our cubic example (Table 6.2.2) is such a case: I₂ is not significant, but I₃ is.
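A minimal sketch of this hierarchical strategy in Python (our illustration, not the authors' procedure or code): the function name, the maximum order of 4, and the .02 minimum gain are arbitrary choices. Powers of v are entered one at a time, each increment I_j is F-tested, the shrunken (adjusted) R² is tracked, and entry stops by rules 1 and 2b.

import numpy as np
from scipy import stats

def hierarchical_polynomial(y, v, k_max=4, alpha=0.05, min_gain=0.02):
    n = len(y)
    v = v - v.mean()                      # center v (see Centering, below)
    ss_y = np.sum((y - y.mean()) ** 2)
    X = np.ones((n, 1))                   # start with the intercept only
    r2_prev, results = 0.0, []
    for j in range(1, k_max + 1):
        X = np.column_stack([X, v ** j])  # add the jth power of v
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        r2 = 1.0 - np.sum((y - X @ beta) ** 2) / ss_y
        inc = r2 - r2_prev                # the increment I_j
        df_denom = n - j - 1              # residual df with j powers plus intercept
        F = inc / ((1.0 - r2) / df_denom) # F test of the single added term
        p_val = stats.f.sf(F, 1, df_denom)
        shrunken = 1.0 - (1.0 - r2) * (n - 1) / df_denom
        results.append((j, r2, shrunken, inc, F, p_val))
        if p_val > alpha or inc < min_gain:   # rules 1 and 2b: stop, k = j - 1
            break
        r2_prev = r2
    return results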


In much of the research done in the large social science sector of the behavioral sciences, the actual fitting of curves is not necessary, or even appropriate. In these areas, a multiplicity of causal factors not represented in the equation plus measurement and scaling errors typically operate so as to preclude accounting for more than about one-third of the Y variance, if that much. Departures from linearity here are likely due to "ceiling" or "floor" effects. These may be real, or due to lack of scaling fidelity. Or, the latter may produce other departures from linearity. In any case, the nonlinearity is simple, and V will be well represented using only v and v² (and possibly v³ may make a nonnegligible contribution). In the "harder" sectors of behavioral science (for example, psychophysiology, sensory psychology, learning), if a polynomial fitting function is to be used, v³ will more often be needed additionally (and possibly v⁴ may make a nonnegligible contribution).

There are other considerations that lead to a preference for k to be small. With reasonably large n, two or three additional IVs due to larger k in V may not be serious, but, as will be seen in Chapter 8, when we are interested in interactions of V with other sets representing quantitative or nominal scales, each interaction requires IVs equaling k times the number of IVs in the other set(s). A generous value for k easily results in a substantial increase in the total number of IVs in the analysis, with the attendant increase in instability and loss of power (Section 4.6). Also, problems of computational accuracy and stability may arise with a polynomial where k is as much as 4 or 5 (see Centering).

In many investigations, our major purpose is to adequately represent the constructs in our IVs as they relate to Y, rather than to fit a function. Thus, when for a construct like age, length of hospitalization, number of siblings, or socioeconomic status we say that it accounts for 12% of the Y variance, we want it to mean that the construct has that given degree of effect or association. If it should actually be the case that this 12% is accounted for linearly, while the addition of the quadratic aspect of the variable would raise the Y variance accounted for to 18%, we will have understated the association of our construct and cheated ourselves and our readers of a complete understanding of the import of our data. Such blunders are easily avoided by the use of polynomials of low order. This should not be taken as a blanket recommendation to adopt the slogan "No v without its v²!" but rather to be prepared to cope with nonlinear relationships when they arise or when there is good reason to suspect them. No research prescription can replace, let alone override, the experience and judgment of the investigator.

Centering

We have seen that the correlations among the powers of v run high. With more than two terms, the multiple R's of any power with the others run even higher.


Theoretically, no matter how high they get, as long as such R's do not reach 1.000 (and, theoretically, they cannot), the matrix of IVs is not, in matrix-algebraic parlance, singular, and, in theory, the MRC can proceed. In practice, however, there is a limit on the accuracy provided by the given computing algorithm (recipe) used by the program and realized on the computer. As the R's among the X_i get very large and very close to 1, the matrix of r_ij is said to be "highly multicollinear" (or "ill conditioned" or "near singular"), its determinant approaches zero (see Appendix 1), and the computing goes haywire. The program may "bomb," or produce garbage, or worse, give results that look reasonable but actually are not even correct to one significant digit (Wampler, 1970). Most correct MRC software, however, issues a warning when it encounters this threat to its accuracy and refuses to proceed. When this occurs, consultation with the technical staff of the computer laboratory may solve the problem, usually by shifting to a program of greater computational precision (see Appendix 3).

Another consequence of high multicollinearity among powers of v, particularly when k is large (say, 5 or more), is that the regression coefficients may become very unstable: slight changes in v may produce large changes in the regression coefficients. The problem here is not one of computational accuracy for the sample at hand but rather of sampling stability of the coefficients, the problem noted when the effect of R_i on the size of the standard errors of B and β (Eqs. 3.6.8 and 3.6.9) was discussed.

Both problems of accuracy and stability may be at least partly solved by reducing the size of the r's among the powers of v and hence of the R_i. This can be accomplished without any loss or change in information by the simple expedient of "centering" v on its mean (i.e., by replacing v by its deviation from the mean, v′ = v − v̄, and powering v′ for the polynomial). (Transforming v to standard scores, z_v, and powering z_v works just as well because this also centers v about its mean.) The effect of centering is to reduce those r_ij sharply where i is an even-numbered (2, 4) and j an odd-numbered (1, 3) power,⁶ hence reduce their R's, and thus improve the accuracy and stability of the output. Centering does not entirely solve the problem of multicollinearity when the distribution of v is sharply skewed (Budescu, 1980) and k is large, but we have noted the rarity of the need for large k in most behavioral science applications. Please note that although large R_i increases the absolute size of the standard errors of regression coefficients, it does not invalidate their significance tests (Cohen, 1978), which is the usual concern of the analyst, rather than their absolute values. Still, centering does no harm and is fairly easily accomplished. Finally, the problem of high multicollinearity may generally be solved by going from power polynomials to coding by means of orthogonal polynomials, as described in Section 6.3.3.

⁶When v is symmetrically distributed about its mean, these r_ij become zero, but even with readily observable departures from symmetry they are quite low.
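A small numerical illustration of the effect of centering (ours, not the authors'): for positive scores the correlation between v and v² is close to 1, while after centering it drops toward zero when v is roughly symmetric (footnote 6). The 1–7 range and n = 200 are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)
v = rng.uniform(1, 7, size=200)              # positive scores, e.g., a 1-7 scale
for label, x in [("raw v", v), ("centered v", v - v.mean())]:
    r = np.corrcoef(x, x ** 2)[0, 1]
    print(f"r between {label} and its square: {r:.3f}")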


Scaling

Quite deliberately, no attention was given to the nature of the quantitative scales used in the illustrative examples. The issue of the level of scaling and measurement precision required of quantitative variables in MRC is complex and controversial. We take the position that, in practice, almost anything goes.

Formally, fixed model regression analysis demands that the quantitative independent variables be scaled at truly equal intervals and measured without error. Meeting this demand would rule out the use of all psychological tests, sociological indices, rating scales, and interview responses; excepting some experimentally manipulated "treatments," this eliminates virtually all the kinds of quantitative variables on which the behavioral sciences depend. Such variables have no better than approximately equal intervals and at the very least 5% or 10% of their variance in measurement error.

Regression models that permit measurement error in the IVs exist, but they are complex and have even less satisfiable requirements, for example, multivariate normal distributions. The general conviction among data analysts is that the assumption failure entailed by the use of fallible IVs in the fixed-regression model, which we use throughout, does not materially affect any positive conclusions that are drawn.⁷ Naturally, if the construct V is measured by a test with a reliability of .20 (hence, with 80% measurement error), a conclusion that Y variance is not being significantly accounted for by V, whatever the order of polynomial we use from first to kth, may mislead us. But this is not a problem unique to regression analysis: reaching negative conclusions about constructs from unreliable or otherwise invalid measures is an elementary error in research inference.

The inequality of intervals in V demands further scrutiny. We must first distinguish between randomly unequal intervals and systematically unequal intervals. Ordinary crudeness of measurement in interval or ratio scales results in randomly unequal intervals. For example, for a given psychological test that yields an IQ score, the unit 106–107 may be slightly larger or slightly smaller than either the 105–106 or 108–109 units. There is no systematic relationship in the size of adjacent unit intervals and, by extension, intervals made up of adjacent groups of units, for example, 95–99, 100–104, 105–109. Such units or ranges of units may not reflect identically equal intervals on a true measure of the construct (which, of course, we cannot know), but there is no reason to believe that the intervals are more or less equal in one part of the scale than another. Another way to say this is that the serial correlation (Harris, 1963) of each interval size with the next over the entire scale will be about zero.

⁷This statement is false if taken out of the present context of the regression on Y of one research factor V. Measurement error in multifactor MRC may lead to seriously misleading partial coefficients. See Section 9.5.


Such random lack of fidelity in scaling characterizes the indirect approach to measurement of psychometrics and sociometrics: test and factor scores and indices; generally, measurement accomplished by combining elements. It is a tolerable condition when, as seems to be generally the case, the inequality of interval size is small when expressed proportionately, say when the mean interval size is three or four times the standard deviation of interval sizes. The effect that it has is to produce jaggedness in the observed function that is additional to that produced by sampling error. Because, as argued above, we do not seek to fit such jags with polynomials of high order, they merely contribute to the size of the proportion of residual variance, 1 − R². Compared to other factors, this contribution is typically a relatively small one. We conclude that, in general, the random inequality of intervals found in quantitative variables produces no dramatic invalidation of MRC but merely makes a minor contribution to the residual (error) variance.

Systematic inequality of intervals may arise at any intended level of quantitative scaling: ordinal, interval, or ratio. If the measurement process wherein numbers are assigned to phenomena is such that, relative to hypothetical "true" measures, the intervals at one end of the scale are larger than those at the other, or larger at both ends than in the middle, or more generally, different in different regions of the scale, adjacent interval sizes will be correlated (the serial correlation will be positive). This may arise in many ways in a domain as diverse as the behavioral sciences. For example, in ordinal scaling of responses to an attitude item from an interview, an investigator may inadvertently provide more scale points in the vicinity of his own attitude position than in other regions. As another example, difficulty in writing hard items for an intended interval scale of reasoning ability may make higher intervals smaller than lower ones. A familiar example is that the use of percentile values from unimodal, approximately symmetrical distributions makes middle intervals large relative to extreme ones. The effect of such systematic interval inequality in a variable is to alter the shape of its observed relationship with other variables from what it would be if measured in equal units. True linear relationships become curvilinear (or the reverse), bends in curves may increase (or decrease) in number or degree of curvature. But rather than invalidate MRC analysis, it is explicitly to fit functions irrespective of their shape that the polynomial method is used. So whether the shape of the function Y of V is truly reflected in the data, or is due to systematic interval inequality, or both, a polynomial fit to the actual data can be obtained.

The scaling requirements for the dependent variable are at least as modest. Measurement error does not even theoretically violate the model, and inequality of interval size, random and systematic, operates as for the IVs, producing noise and curvature change, respectively. It should also be noted that dichotomous dependent variables (employed–unemployed, married–single, pass–fail) may be coded 1–0 and used as dependent variables. With this coding, the Ŷ values (and A and


Ȳ) are simply interpreted as proportions, which is very convenient. This practice is in formal violation of the model, which demands that for any given combination of X_i values, the Ys be normally distributed (and of constant variance), a patent impossibility for a variable that takes on only two values. Yet in practice, and with support from the central limit theorem and empirical studies, dichotomous dependent variables are usefully employed in MRC (Overall, 1980).

In summary, then, neither measurement error nor inequality of intervals precludes the use of polynomial MRC, despite some formal assumption violation of the fixed regression model. In practice, ordinal scales, as well as those that seek (not necessarily successfully) to yield interval or ratio level measurement, may be profitably employed.

The Polynomial as a Fitting Function

We have seen in the examples that the use of polynomial MRC correctly indicates the fact of nonlinearity of regression and its general nature. But the fit it provides with a few terms need not necessarily be a good one. In the quadratic example (Fig. 6.2.1), the fit is reasonably good (provided that the maximum is not taken too seriously). In the cubic example (Fig. 6.2.2), although the polynomial function yields a maximum and minimum within the range studied, the former is displaced from where the Y mean of the data to be fitted has its maximum by 1 point in w (hardly a negligible amount in a 6-point scale). Thus, although the cubic term significantly improves the fit over that provided by the quadratic polynomial and properly mirrors the fact that the shape of the regression is two-bended, one bend does not come in quite the proper place. The cubic polynomial does the best it can, but simply cannot manage the steep climb between w = 2 and w = 3 necessary for a really good fit. Of course, a fifth-order polynomial can because, given that there are only six values of w, it will go through all the means exactly. But we have argued against the empirical fitting of high-order polynomials as being unparsimonious, hence unlikely to contribute to scientific understanding. This argument has its maximum force when, for q points, the polynomial is of the order q − 1.

Our claim for the polynomial of low order is modest: it is a good general fitting function for most behavioral science data when curvilinearity exists, particularly for data of the "soft" kind. The close fitting of curves with few parameters requires either strong theory that takes the form of an explicit mathematical model (of which examples can be found in such fields as mathematical learning theory, sensory psychology and psychophysics, and econometric theory; see Section 6.5.2) or, for purely empirical fitting, a collection of fitting functions of which the polynomial is only one. Curve fitting is a branch of applied mathematics in its own right, and any detailed attention to it is beyond our scope (and competence). The interested reader might wish to pursue the possibilities of other general fitting functions: trigonometric, Bessel, and Chebyshev functions in


particular. These, like the polynomial, may be used as coding procedures in MRC; they are other functions of v than positive integer powers. To take a simple example, one such fitting function uses as aspects of v:

(6.2.7)    X₁ = sin v,   X₂ = sin 2v,   X₃ = sin 3v,   etc.,

the angle v being expressed in radians. When this function is applied to the example for the cubic polynomial (Section 6.2.3), it results in a poorer fit for i = 1, 2, and 3, the successive cumulative R²s being .201, .210, and .212. It so happens that it is sin 5w that largely provides the fit for these data; its simple r² with Y is .315, almost as large as R² for the first three terms of the polynomial (.356).

In closing, we reiterate: the polynomial of low order has the virtues of simplicity, flexibility, and general descriptive accuracy. It will work well in most behavioral science applications, and particularly for those in which we wish to represent a quantitative research factor using two or three terms, rather than to achieve a maximal fit.

6.3 ORTHOGONAL POLYNOMIALS

6.3.1 Method

The terms in the polynomials of the last section were hardly uncorrelated. We saw that for typical (positive) scores, without centering, the correlations among the first few powers of v are in the nineties. What made these nonorthogonal IVs work was the use of the hierarchical model. When v² (= X₂) is entered into an equation already containing v (= X₁), the partialling process makes of it effectively v²·v, or X₂.₁, that is, v² from which v has been (linearly) partialled. Now, by the very nature of the system (as was seen in Chapter 2), X₂.₁ is necessarily orthogonal to (correlated zero with) X₁. More generally, at any stage of cumulation, X_j.12 . . . (j−1) is orthogonal to X₁, X₂, . . . , and X_(j−1), and whatever portion of the Y variance it accounts for (I_j) is different from (not overlapped with) that accounted for by the latter individually and collectively (I₁, I₂, . . . , I_(j−1)).

The use of k positive integer powers of v is one way of coding V. Orthogonal polynomials constitute a means of coding V into k IVs that not only represent linear, quadratic, cubic, etc. aspects of V, but do so in such a way that these coded values are orthogonal to each other. Recall that this is exactly the same demand that was set for contrast coding of nominal scales (Section 5.5). In fact, it is purely a matter of terminology whether the orthogonal polynomials are called curve (or trend) components or contrasts. The reader is probably familiar with orthogonal polynomials from "trend analysis" in AV. We will see here, as before, that MRC can produce the same results as AV and, beyond that, can use orthogonal polynomial coding more flexibly and for other purposes (Cohen, 1980).

Table 6.3.1 presents orthogonal polynomial coding (coefficients as in Section 5.5) for quantitative scales having from (u =) 3 to 12 scale points at equal


TABLE 6.3.1
Orthogonal Polynomial Coding for u-Point Scales: First-, Second-, and Third-Order Polynomials for u = 3 to 12ᵃ

u = 3    X₁ (linear):    −1  0  1
         X₂ (quadratic):  1 −2  1

u = 4    X₁: −3 −1  1  3
         X₂:  1 −1 −1  1
         X₃: −1  3 −3  1

u = 5    X₁: −2 −1  0  1  2
         X₂:  2 −1 −2 −1  2
         X₃: −1  2  0 −2  1

u = 6    X₁: −5 −3 −1  1  3  5
         X₂:  5 −1 −4 −4 −1  5
         X₃: −5  7  4 −4 −7  5

u = 7    X₁: −3 −2 −1  0  1  2  3
         X₂:  5  0 −3 −4 −3  0  5
         X₃: −1  1  1  0 −1 −1  1

u = 8    X₁: −7 −5 −3 −1  1  3  5  7
         X₂:  7  1 −3 −5 −5 −3  1  7
         X₃: −7  5  7  3 −3 −7 −5  7

u = 9    X₁: −4 −3 −2 −1  0  1  2  3  4
         X₂: 28  7 −8 −17 −20 −17 −8  7  28
         X₃: −14  7  13  9  0 −9 −13 −7  14

u = 10   X₁: −9 −7 −5 −3 −1  1  3  5  7  9
         X₂:  6  2 −1 −3 −4 −4 −3 −1  2  6
         X₃: −42  14  35  31  12 −12 −31 −35 −14  42

u = 11   X₁: −5 −4 −3 −2 −1  0  1  2  3  4  5
         X₂: 15  6 −1 −6 −9 −10 −9 −6 −1  6  15
         X₃: −30  6  22  23  14  0 −14 −23 −22 −6  30

u = 12   X₁: −11 −9 −7 −5 −3 −1  1  3  5  7  9  11
         X₂: 55  25  1 −17 −29 −35 −35 −29 −17  1  25  55
         X₃: −33  3  21  25  19  7 −7 −19 −25 −21 −3  33

ᵃThis table is abridged from Table 20.1 in Owen (1962). Reproduced with the permission of the publishers. (Courtesy of the U.S. Atomic Energy Commission.)

intervals. X₁, X₂, and X₃ give, respectively, the linear, quadratic, and cubic coefficients. Although there exist for u points orthogonal polynomials up to the order u − 1, only the first three are given here, because higher orders are not often useful (see the discussion in Section 6.2.4).⁸ The coefficients of each polynomial for a given u sum to zero, and their products with the coefficients of any other order for that u also sum to zero, the latter constituting the orthogonality property. For example, consider the quadratic (X₂) values for a five-point scale: its five values sum to zero; its orthogonality with X₃ is demonstrated by (2)(−1) + (−1)(2) + (−2)(0) + (−1)(−2) + (2)(1) = 0, and with X₁ by (2)(−2) + (−1)(−1) + (−2)(0) + (−1)(1) + (2)(2) = 0.

⁸Higher-order polynomials and larger numbers of points are available elsewhere. The most extensive are those of Anderson and Houseman (1942), which go up to the fifth order and to u = 104. Pearson and Hartley (1954) give up to the sixth order and u up to 52.
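For readers computing rather than consulting tables, the following sketch (our addition; the function name is arbitrary) generates orthogonal polynomial codes for u equally spaced points by QR-decomposing the matrix of centered powers, the same device used by R's poly(). The resulting columns differ from the tabled integer coefficients only by scaling constants, which does not affect their use as coded IVs.

import numpy as np

def orthogonal_poly(u, order=3):
    x = np.arange(1, u + 1, dtype=float)
    V = np.vander(x - x.mean(), order + 1, increasing=True)  # 1, v, v^2, v^3
    Q = np.linalg.qr(V)[0]
    return Q[:, 1:]                       # drop the constant column

codes = orthogonal_poly(5)                # linear, quadratic, cubic for u = 5
print(np.round(codes.T @ codes, 10))      # off-diagonal zeros: mutually orthogonal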


.50, let p′ = 1 − p, find A_p from the table, and then compute

(6.5.23)    A_p′ = 3.14 − A_p.

For example, for the arcsine transformation of .64, find A for .36 (= 1 − .64), which equals 1.29, then find 3.14 − 1.29 = 1.85. Table 6.5.1 will be sufficient for almost all purposes, but for a more exact statement of the transformation of p = 0 and 1, and for a denser argument of p with A to four decimal places, see Owen (1962, pp. 293–303). The amount of tail stretching effected by a transformation may be indexed by the ratio of the length of scale on the transformation of the p interval from .01 to .11 to that of the interval from .40 to .50. For A, this index is 2.4 (compared with 4.0 for the probit and 6.2 for the logit).
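A quick check of Eq. (6.5.23) for this worked example (our illustration): taking the usual definition A = 2 arcsin √p [Eq. (6.5.22)], the two arcsines for p and 1 − p are complementary angles, so the two A values sum to π ≈ 3.14.

import math
p = 0.64
A_complement = 2 * math.asin(math.sqrt(1 - p))   # A for .36, about 1.29
print(round(math.pi - A_complement, 2))          # 1.85, per Eq. (6.5.23)
print(round(2 * math.asin(math.sqrt(p)), 2))     # direct A for .64, also 1.85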

The Probit Transformation

This transformation is variously called probit, normit, or, most descriptively, the normalizing transformation of proportions, a specific instance of the general normalizing transformation (see Section 6.5.6). We use the term probit in recognition of its wide use in bioassay, where it is so designated. Its rationale is straightforward. Consider p to be the cumulative proportion of a unit normal curve (that is, a normal curve "percentile"), determine its baseline value, z_p, which is expressed in sd departures from a mean of zero, and add 5 to assure that the value is positive, that is,

(6.5.24)    P = z_p + 5.

Table 6.5.1 gives P as a function of p for the lower half of the scale. When p = 0 and 1, P is at minus and plus infinity, respectively, something of an embarrassment for numerical calculation. We recommend that for p = 0 and 1, they be revised to

(6.5.25)    p = 1/(2v)          (for p = 0)

(6.5.26)    p = (2v − 1)/(2v)   (for p = 1),

where v is (as throughout) the denominator of the counted fraction. This is arbitrary, but usually reasonable. If in such circumstances these transformations make a critical difference, prudence suggests that this transformation be avoided. For P values for p > .50, as before, let p′ = 1 − p, find P from Table 6.5.1, and then find


(6.5.27)    P_p′ = 10 − P_p.

For example, the P for p = .83 is found by looking up P for .17 (= 1 − .83), which equals 4.05, and then finding 10 − 4.05 = 5.95. For a denser argument for probits, which may be desirable in the tails, see Fisher and Yates (1963, pp. 68–71), but any good table of the inverse of the normal probability distribution will provide the necessary z_p values (Owen, 1962, p. 12).
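The same complement rule can be verified directly (our illustration; scipy's normal quantile function stands in for the printed table):

from scipy.stats import norm
print(round(norm.ppf(0.83) + 5, 2))              # P for .83 computed directly: 5.95
print(round(10 - (norm.ppf(0.17) + 5), 2))       # via Eq. (6.5.27): also 5.95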

TABLE 6.5.1
Arcsine (A), Probit (P), and Logit (L) Transformations for Proportions (p)ᵃ

[Table body: A, P, and L tabled for p from .000 to .50. Selected values:
 p = .01    A = .20    P = 2.67    L = −2.30
 p = .05    A = .45    P = 3.36    L = −1.47
 p = .10    A = .64    P = 3.72    L = −1.10
 p = .25    A = 1.05   P = 4.33    L = −.55
 p = .50    A = 1.57   P = 5.00    L = .00]

ᵃSee text for values when p > .50.
ᵇSee text for transformation when p = 0 or 1.


The probit transformation is intermediate in its degree of tail stretching; as noted in the previous subsection, its index is 4.0 compared with 2.4 for the arcsine and 6.2 for the logit. The probit transformation is intuitively appealing whenever we conceive that some construct, if measurable directly, would be normally distributed, but our available measure is instead a cumulative proportion (or frequency). Thus, we go from the ordinate of such an assumedly normal cumulative distribution to its baseline measure. Such a transformation will often seem plausible in some areas of experimental and physiological psychology, psychometrics, and epidemiology.

The Logit Transformation

This transformation is related to the logistic curve, which is similar in shape to the normal curve but generally more mathematically tractable. The logit transformation is

(6.5.28)    L = ½ ln [p / (1 − p)],

where "ln" is the natural logarithm (base e). As with probits, the logits for p = 0 and 1 are at minus and plus infinity, and the same device for coping with this problem [Eqs. (6.5.25) and (6.5.26)] is recommended: replace p = 0 by p = 1/(2v) and p = 1 by (2v − 1)/(2v) and find the logits of the revised values. As before, Table 6.5.1 gives the L for p up to .50; for p > .50, let p′ = 1 − p, find L_p′, and change its sign to positive for L_p, that is,

(6.5.29)    L_p = −L_p′.

For p = .98, for example, find L for .02 (= 1 − .98), which equals −1.96, and change its sign; thus L for .98 is +1.96.

The logit stretches the tails of the p distribution the most of the three transformations. The tail-stretching index (described previously) for the logit is 6.2, compared with 4.0 for the probit and 2.4 for the arcsine. The quantity p/(1 − p) is the odds related to p (e.g., when p = .75, the odds are .75/.25, or 3:1, or simply 3). The logit, then, is simply half the natural logarithm of the odds. Therefore logits have the property that for equal intervals on the logit scale, the odds are changed by a constant multiple; for example, an increase of .35 on the logit scale represents a doubling of the odds, because .35 is ½ ln 2.

We also note the close relationship between the logit transformation of p and Fisher's z′ transformation of the product moment r (see Section 2.8.3 and Appendix Table B). If we let r = 2p − 1, then the z′ transformation of r is the logit of p. We can take advantage of this fact if we wish a denser argument in the tails than is provided in Table 6.5.1, because r to z′ transformation tables are more readily available than p to L tables. A useful logit transformation table is given by Fisher and Yates (1963, p. 78).
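All three transformations are easy to compute directly. The sketch below (our addition, not part of the text) implements the arcsine [Eq. (6.5.22)], probit [Eq. (6.5.24)], and logit [Eq. (6.5.28)], with the adjustments of Eqs. (6.5.25)–(6.5.26) for p = 0 or 1; v is, as in the text, the denominator of the counted fraction, and the v = 50 in the usage line is arbitrary.

import numpy as np
from scipy.stats import norm

def transform_proportion(p, v):
    if p == 0.0:
        p = 1.0 / (2 * v)                  # Eq. (6.5.25)
    elif p == 1.0:
        p = (2 * v - 1.0) / (2 * v)        # Eq. (6.5.26)
    A = 2.0 * np.arcsin(np.sqrt(p))        # arcsine
    P = norm.ppf(p) + 5.0                  # probit
    L = 0.5 * np.log(p / (1.0 - p))        # logit
    return A, P, L

print(transform_proportion(0.64, v=50))    # approximately (1.85, 5.36, 0.29)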


Note that all three transformations are given in the form most frequently used or most conveniently tabled. They may be further transformed linearly if it is found convenient by the user to do so. For example, if the use of negative values is awkward, one can add a constant of 5 to L, as is done for the same purpose in probits. Neither the 2 in the arcsine transformation [Eq. (6.5.22)] nor the ½ in the logit transformation [Eq. (6.5.28)] is necessary for purposes of correlation, but they do no harm and are tabled with these constants as part of them in accordance with their conventional definitions.

6.5.6 Normalization of Scores and Ranks

Normalization is more widely applicable as a nonlinear monotonic transformation than was described previously in connection with proportions. Whenever it seems reasonable to suppose that a construct being measured by a variable v is represented with greater fidelity by rescaling it so that it yields a frequency distribution of normal form, this transformation may be applied. More specifically, as with other transformations, the usual goal is linearization of relationships with other variables.

A frequent candidate for normalization is ranked data. When a third-grade teacher characterizes the aggressiveness of her 30 pupils by ranking them from 1 (most) to 30 (least) or vice versa, the resulting 30 values may occasion difficulties when they are treated numerically as measures. Ranks are necessarily rectangularly distributed, that is, there is one score of 1, one score of 2, . . . , one score of 30. If, as is likely, the difference in aggressiveness between the most and next-most (or the least and next-least) aggressive child is greater than between two adjacent children in the middle (e.g., those ranked 14 and 15), then the scale provided by the ranks is not likely to produce linear relationships with other variables. This need to stretch the tails is the same phenomenon encountered with proportions; it presupposes that the distribution of the construct to be represented has tails, that is, is bell shaped or normal. Because individual differences for many well-measured biological and behavioral phenomena seem to approximate this distribution, in the face of ranked data it is a reasonable transformation to apply in the absence of specific notions to the contrary. Even if the normalized scale is not optimal, it is likely to be superior to the original ranks.

The method for accomplishing this is simple. Following the procedure described in elementary statistics textbooks for finding centiles (percentiles), express the ranks as cumulative proportions, and refer these either to a unit normal curve table (Appendix Table C) to read off z, or use the P column of Table 6.5.1, where 5 has been added to z_p to yield probits.¹⁷

¹⁷An alternative, slightly superior model for the normal transformation is tabled by Owen (1962) as "expected values of order statistics from the normal distribution" (pp. 151–154), and goes up to n = 50. The difference between the models would be very slight unless n is quite small, say less than 10.
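A minimal sketch of this normalization of ranks (ours; the (rank − .5)/n convention for forming the cumulative proportions is one common textbook choice, not necessarily the one intended here):

import numpy as np
from scipy.stats import norm, rankdata

def normalize_ranks(scores):
    ranks = rankdata(scores)               # 1 = lowest; ties get average ranks
    p = (ranks - 0.5) / len(scores)        # cumulative proportions
    return norm.ppf(p)                     # normal-curve deviates z (add 5 for probits)

print(np.round(normalize_ranks([3, 1, 4, 1, 5, 9, 2, 6]), 2))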


Having come this far, it is apparent that the original scale need not be made up of ranks at all but may be any scale whose unit of measurement is believed not to be equal or even changing in size regularly from one end to the other. The basis for this suspicion may be a grossly irregular-shaped frequency distribution, multimodal or strangely skewed. Such distributions may be rescaled into normality by dint of force: write the cumulative frequency distribution on the original scale, grouping into a dozen or so intervals if and as needed, and then go from the cumulative p values to their normal curve deviates, z (or P), as previously discussed. The result is a monotonic transformation to a normal distribution, however irregular the original scale. Normalization should be used judiciously, but whenever an original scaling yields a wild distribution, and particularly where there is some basis for belief that the construct being assessed is usefully conceived as being more or less normally distributed in the population being sampled, it is worth the effort to normalize.

6.5.7 The Fisher z′ Transformation of r

It is sometimes the case that a variable is measured and expressed in terms of r. The most common instance of this generally rare circumstance is in "Q sorting" (Stephenson, 1953), where items are sorted into rating categories of prescribed size (usually defining a quasi-normal distribution) so as to describe a complex phenomenon such as personality. The similarity of two such Q-sort descriptions, for example, actual self and ideal self, is then indexed by the r between ratings over the set of items. Such a scaling is more likely to relate linearly to other variables if the r's are transformed by the Fisher z′ transformation (as described in Sections 2.8 and 2.10) and are more likely to satisfy the normality and equal variance assumptions when used as a dependent variable. The z′ transformation of r is given in Appendix Table B.
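As a short illustration (ours, not from the text), the z′ transformation is z′ = ½ ln[(1 + r)/(1 − r)] = arctanh r, and setting r = 2p − 1 reproduces the logit of p, the relationship noted in the preceding section.

import numpy as np

r = 0.80
print(round(float(np.arctanh(r)), 3))          # z' = 1.099

p = 0.90                                       # r = 2p - 1 = .80
print(round(0.5 * np.log(p / (1 - p)), 3))     # logit of p, also 1.099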

6.6 LEVEL OF QUANTITATIVE SCALE AND ALTERNATIVE METHODS OF REPRESENTATION

The alternative methods of representing quantitative scales described in this chapter do not relate closely to the ordinal–interval–ratio distinction of Stevens (1951) but rather to the analytic goal and content of the research area. A given level of scaling may be represented by all, or almost all, the methods of representation described previously, although some combinations of level and method will occur more frequently than others. If one's purpose is primarily descriptive, or one wishes simply to assure that a construct is being represented in the IVs whatever the shape of the relationship, then simple fitting functions with multiple IVs such as polynomials (powers or orthogonal) or nominalization are appropriate, and the level of scaling is irrelevant. If one's approach is primarily analytic in that features of the curve have theoretical roots and implications, although simple fitting functions may be put to such use, the chances are that some mathematically defined nonlinear transformation into a single new variable (log v, or √v) will be mandated, or at least suggested, by the theory. In this circumstance, it is likely that v is measured on a ratio scale. Normalization, which may


be employed either for linearizing, for theoretical reasons, or both, can also be applied at any level of scaling.

6.6.1 Ratio Scales

When, as in ratio scales, there is a true zero point as well as equality of intervals (and almost always only positive values are defined), any method of representation is admissible, but some are likely to be more attractive than others. There is something about this highest form of measurement that invites elegance, for example, regular nonlinear transformations (logs, reciprocals) rather than the gross pulling and hauling of normalization or the crude representation of nominalization. Still, what finally governs is the purpose of the analysis. If a sociologist wants to represent number of children among other indicators of socioeconomic status, his goals may be better served by polynomials or nominalization than by a log or square root transformation, despite the fact that number of children is unquestionably a ratio scale. An experimental psychologist working with ratio scales in learning, perception, or cognition might well make the other choice.

Proportions as measures are clearly ratio in nature, and the special tail-stretching transformations deserve first consideration, unless a specific mathematical model, which dictates some other nonlinear transformation, is involved. For example, in some mathematical models in learning theory the logarithm of p is used. This transformation stretches the lower tail but contracts the upper one.

6.6.2 Interval Scales

Scales whose units are more-or-less equal (see Section 6.2.4), but which have an arbitrary zero point, are the mainstay of the soft behavioral sciences, where MRC is likely to be a particularly useful data-analytic method. The equality of the units, of course, does not assure that interval scales will be linearly related to other quantitative scales. Again, depending on the purpose of the analysis, for independent variables the descriptive representational polynomial and nominalization procedures are available, as well as normalization. When the variable in question is a dependent variable, the multiple coding of polynomials and nominalization is not available, but normalization is and may accomplish the linearization that is sought. The one-to-one nonlinear transformations (Section 6.5) are not generally attractive because of the arbitrariness of the zero in interval scales. The core idea of this level of scaling is that if an interval scale u is arbitrarily linearly transformed to u′ = cu + d (where c and d are any constants), u′ contains exactly the same information for correlation as u does (i.e., r_uu′ = 1), and it is a matter of indifference whether one uses u or one of the infinite number of alternatives provided by u′. However, if one