The Art of Empirical Investigation [1 ed.] 0765805308, 9780765805300

Julian Simon was known for his methodical, and often controversial, writings challenging conventional beliefs about over


English Pages 584 [585] Year 2003



The Art of EMPIRICAL INVESTIGATION

Julian L. Simon

With a new introduction by James E. Katz

Routledge
Taylor & Francis Group
London and New York

Originally published in 1969 by Random House
Published 2003 by Transaction Publishers
Published 2017 by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
711 Third Avenue, New York, NY 10017, USA

Routledge is an imprint of the Taylor & Francis Group, an informa business

New material this edition copyright © 2003 by Taylor & Francis. All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Catalog Number: 2003040991

Library of Congress Cataloging-in-Publication Data
Simon, Julian Lincoln, 1932
[Basic research methods in social science]
The art of empirical investigation / Julian L. Simon ; with a new introduction by James E. Katz.
p. cm.
Originally published: Basic research methods in social science. 2nd ed. New York : Random House, 1978. With a new introd.
Includes bibliographical references and index.
ISBN 0-7658-0530-8 (paper : alk. paper)
1. Social sciences—Research. 2. Social sciences—Methodology. I. Title.
H62.S475 2003
300'.7'2—dc21
ISBN 13: 978-0-7658-0530-0 (pbk)


I am grateful for the privilege of quoting from "voices to voices" in Complete Poems 1913-1962 by E. E. Cummings. Reprinted by permission of Harcourt Brace Jovanovich, Inc.; from Argonauts of the Western Pacific by Bronislaw Malinowski. Published in the United States by E. P. Dutton, and reprinted with their permission and with permission of Routledge & Kegan Paul Ltd.; from "Fraud in Research Is a Rising Problem in Science" by Boyce Rensberger. © 1977 by The New York Times Company. Reprinted by permission; from a by-lined article by Ward Cannel, copyrighted January 26, 1964 by Newspaper Enterprise Association. Reprinted by permission; from The Nature of Statistics by Allen Wallis and Harry Roberts. Reprinted by permission of The Macmillan Company. © The Free Press of Glencoe, a Corporation 1956; from The Essential Works of Pavlov by Ivan Pavlov (edited by Michael Kaplan). Copyright © 1927 by Oxford University Press. Copyright © 1966 by Bantam Books. Reprinted by permission of the Oxford University Press; from Memory: A Contribution to Experimental Psychology by Hermann Ebbinghaus. Copyright © 1913 by Dover Publications, Inc.; from "Obesity's Problem Child Demands Rewards" by Arthur J. Snider. Copyright © 1964 by The Chicago Daily News. Reprinted by permission; from "Phobias in Britons Fall into 130 types." Copyright © 1969 by The New York Times Company. Reprinted by permission; from Readings in the Philosophy of the Social Sciences by May Brodbeck. Copyright © 1968 by The Macmillan Publishing Company, Inc. Reprinted by permission; excerpt from the Champaign-Urbana Courier February 21, 1965. Reprinted with permission; from The Ambassador by Morris West. Copyright © 1965 by William Morrow & Company, Inc. Reprinted by permission; from The Human Meaning of the Social Sciences by Daniel Lerner. Copyright © 1959 by Peter Smith. Reprinted by permission; from The Conduct of Inquiry by Abraham Kaplan. Copyright © 1974 by Thomas Y. Crowell, Inc. Reprinted by permission; from Paul F. Lazarsfeld, The People's Choice: How the Voter Makes up His Mind in a Presidential Campaign, third edition, New York: Columbia University Press, 1968. Reprinted by permission; from Printer's Ink, August 27, 1965; from Anthropology by A. L. Kroeber. Copyright 1923, 1948 by Harcourt Brace Jovanovich, Inc.; renewed 1951 by A. L. Kroeber. Reprinted by permission of the publishers; from The Economics of Seller's Competition by Fritz Machlup. Copyright © 1952 by The Johns Hopkins University Press. Reprinted by permission; from "Evaluation of Alternate Rating Devices for Consumer Research," from Journal of Marketing Research, published by the American Marketing Association; from An Introduction to Scientific Research by E. Bright Wilson. Copyright © 1952 by McGraw-Hill, Inc. Used with permission of McGraw-Hill Book Company; from Lawrence R. Klein (ed): Contributions of Survey Methods to Economics, New York: Columbia University Press, 1954. Reprinted by permission; from "Land Price Set by Jury." Copyright © 1965 by the Champaign-Urbana Courier. Excerpt from Illinois Business Review, April 1966, p. 9. Copyright © 1966. Reprinted by permission; from "Hired Hand Research," by Julius A. Roth. Copyright © 1966 by the American Sociological Association. Reprinted by permission of publisher and author; from Experimenter Effects in Behavioral Research by Robert Rosenthal. Copyright © 1966. Reprinted by permission; Selections from Stanley L. Payne, The Art of Asking Questions (copyright 1951 by Princeton University Press), pp. 7, 11 and 16. Reprinted by permission of Princeton University Press; from "A Middle Way Out of Vietnam" by Arthur Schlesinger, Jr. in The New York Times Magazine. © 1966 by The New York Times Company. Reprinted by permission; from "A Pseudo Experiment in Parapsychology" by L. W. Alvarez. Copyright 1965 by the American Association for the Advancement of Science. Reprinted by permission of the author; from "Privacy and Behavioral Research" by K. E. Clark et al. Copyright 1967 by the American Association for the Advancement of Science. Reprinted by permission of the authors.

Contents

Introduction to the Transaction Edition
Preface to Second Edition
Acknowledgments to Second Edition
Acknowledgments to First Edition
Prologue

1. Introduction
   1. Purpose of the book
   2. What kinds of research are called "empirical"?
   3. The place of statistics in the study of research methods
   4. Some general remarks

PART ONE: The Process of Social-Science Research

2. The Language of Research: Definitions and Validity
   1. Operational definitions
   2. Numbers and operational definitions
   3. Validity and reliability
   4. Writing scientific reports
   5. Communicating with yourself
   6. Honesty in scientific reporting
   7. Summary

3. Basic Concepts of Research
   1. Variables in general
   2. The dependent variable
   3. The independent variable
   4. Parameters
   5. The functional form
   6. Assumption, theory, deduction, hypothesis, fact, and law
   7. Universe and sample
   8. The ideal causal-study design
   9. Ceteris paribus
   10. Summary

4. Types of Empirical Research
   1. Introduction
   2. Case-study descriptive research
   3. Classification research
   4. Measurement and estimation
   5. Comparison problems
   6. Research that tries to find relationships
   7. Finding causes and effects
   8. Mapping structures
   9. Evaluation research
   10. Summary

5. Theory, Model, Hypothesis, and Empirical Research
   1. What is theory?
   2. Models, theory, and hypotheses
   3. Two views of theory and of science in general
   4. The relationship between theory and empirical research
   5. Summary

6. Choosing Appropriate Proxies for Theoretical Variables
   1. Dependent variables whose referents are clearly defined
   2. Dependent variables whose referents are not clearly defined
   3. Choosing independent variables
   4. Choosing a level of aggregation
   5. Choosing a level of explanation
   6. Summary

PART TWO: Research Decisions and Procedures

7. The Steps in an Empirical Research Study
   1. Finding a good research problem
   2. Ask "What do I want to find out?"
   3. Establish the purpose of the project
   4. Determine the value of the research
   5. Saturate yourself in the problem
   6. Choose empirical variables
   7. Calculate the benefits of accuracy and the costs of error
   8. Determine the most important research obstacles
   9. Choose methods
   10. Check the ethics of your proposed research and method
   11. Prepare a detailed design of the method
   12. Collect the data
   13. Analyze the data
   14. Write up the research work
   15. Summary

8. How to Assess the Potential Value of Research Projects
   1. Evaluating the inputs required for a research project
   2. Estimating the value of research projects
   3. Summary

9. Sampling
   1. Universe and sample
   2. Random sampling basics
   3. Screening in sampling
   4. Efficiency in random sampling
   5. Nonrandom samples
   6. Summary

10. Experiments: Pro, Con, and How to Do Them
   1. The distinction between experiment and survey
   2. The steps in experimenting
   3. Advantages of experiments
   4. Disadvantages of experiments
   5. Summary

11. Designing Experiments
   1. Simple uncontrolled exploratory experiments
   2. Before-and-after experiments
   3. Matched-groups design
   4. Simple randomized-groups design
   5. Some additional one-variable designs
   6. Multivariate designs
   7. Experimental designs to study delayed effects
   8. Summary

12. Non-Experimental Designs for Studying Relationships
   1. Time series—the long view
   2. The cross section—the wide view
   3. Causes of differences in results from time-series and cross-sectional studies
   4. Designs for studying changes over time
   5. The panel
   6. Summary

13. Surveys: Pro, Con, and How to Do Them
   1. The nature of surveys
   2. Advantages of the survey method for relationship research
   3. Disadvantages of the survey method for relationship research
   4. Descriptive surveys
   5. The steps in executing a survey
   6. Summary

14. Some Other Qualitative and Quantitative Techniques
   1. Deductive reasoning
   2. The case study
   3. Participant observation
   4. Expert opinion
   5. Content analysis
   6. Simulation
   7. Summary

15. Classifying, Measuring, and Scaling
   1. Classifying
   2. Measuring and scaling
   3. Strengths of scales
   4. Summary

16. Scaling Human Responses
   1. Types of mental activity to be measured
   2. Stimulus and response scales
   3. Simple composite scales
   4. Choice of scales
   5. Summary

17. Data Handling, Adjusting, and Summarizing
   1. Data collection and how to avoid disaster
   2. Errors in automatic data processing
   3. Adjusting the data
   4. Estimating missing data
   5. Imputation
   6. Imputing value in the absence of standards
   7. Standardizing the data
   8. Index numbers
   9. Avoiding the hazards of hired help
   10. Summary

PART THREE: The Obstacles to Social-Science Knowledge and Ways to Overcome Them

18. The Concept of Obstacles in the Search for Empirical Knowledge
   1. Summary

19. Obstacles Created by the Humanness of the Observer: Appendix on Interviewing
   1. Observer variability
   2. Observer bias
   3. Cheating by interviewers
   4. Variability among observers
   5. Observer-caused effects
   6. Summary
   7. Appendix: Personal interviewing and interviews

20. Complexities and Intractability of the Human Mind: Appendix on Questionnaire Construction
   1. Lack of knowledge by the subject
   2. The fallibility of memory
   3. Cover-up
   4. Trying to please the observer
   5. Rationalization and repression
   6. Deception
   7. A brief note on behaviorism
   8. Summary
   9. Appendix: Questionnaire construction

21. Obstacles to Obtaining Adequate Subject Matter
   1. Bias in the sample
   2. Nonresponse: Unavailability of some part of the universe
   3. Inability to experiment with the subject matter
   4. Flaws in subject matter and unreliability of data
   5. Shortage of subject matter
   6. High individual variability in the data
   7. Too much subject matter
   8. Invisibility or inaccessibility of subject matter
   9. Interference with subject matter by researcher
   10. Summary

22. Obstacles to the Study of Changes Over Time
   1. Summary

23. Obstacles to the Search for Causal Relationships
   1. Causation by a "hidden third factor"
   2. Multivariable causation
   3. Confounding the dependent and independent variables
   4. Feedback; interaction between dependent and independent variables
   5. Summary

24. The Master Obstacle: Cost
   1. Procedure to produce data for study
   2. Sampling and cost
   3. Sample size
   4. How many variables to investigate
   5. Amount of tolerable bias
   6. Managerial cost efficiency in research
   7. Summary

PART FOUR: Extracting the Meaning of Data

25. Analysis of Simple Data: Association and Regression
   1. Testing for the existence of simple relationships
   2. Characterizing the form of an association
   3. Regression analysis: characterizing a causal relationship
   4. Evaluating the strength and importance of observed differences and relationships
   5. Summary

26. Searching for Relationships: Analysis of Complex Data
   1. Searching for new relationships within the data
   2. Reading the meanings of data
   3. Refining observed associations into causal relationships
   4. Generalizing and predicting
   5. The tension between facts and theory or hypothesis
   6. Summary

27. Inferential Statistics: Introduction
   1. Types of statistics
   2. The nature and meaning of probability
   3. The concept of independent events
   4. The distinction between "probability theory" and "inferential statistics"
   5. Translating scientific questions into probabilistic and statistical questions
   6. Summary

28. Probability and Hypothesis Testing by the Monte Carlo Method
   1. Introductory problems
   2. The general procedure
   3. Hypothesis testing
   4. Summary

29. Hypothesis Testing with Measured Data

30. Correlation and Other Statistical Issues
   1. Correlation and Association
   2. Advantages and disadvantages of the Monte Carlo method
   3. The multiplication rule
   4. Evaluating the significance of "dredged-up" relationships
   5. Summary

31. How Big a Sample?
   1. Samples in descriptive research
   2. Step-wise sample-size determination
   3. Summary

32. The Concept of Causality in Social Science, with Notes on Prediction, Law, Explanation, and Function
   1. Causality
   2. The problem of definition
   3. Generalization about the meaning of "cause and effect"
   4. Postscript: time, antecedence, and causality
   5. Summary

PART FIVE: Epilogue

33. The Nature, Powers, and Limits of Social Science
   1. What makes an investigation scientific?
   2. What can empirical research do?
   3. The necessity of knowing the purpose of the research
   4. The special powers and limitations of social-science research
   5. The ethics of research
   6. The present and future of social-science research

Bibliography
Index

Introduction to the Transaction Edition

I first encountered Julian Simon's book on research methods in 1969, shortly after it was published. The country and campus were in the midst of an intense war in Southeast Asia and on the streets of America. It was a war not only for the hearts and minds of distant peoples in a twilight struggle, but for the heart and soul of America. Now, more than thirty years later, the United States is involved in another dramatic struggle, both in the twilight regions of the world and in the full daylight here at home. Social science research was part of the domestic and international war effort then. Today it is even more involved in both fronts, albeit in some dramatically different ways.

Then, I was a callow undergraduate; sociology was my game and societal improvement was my aim. I was attending a large Midwestern university that could plausibly have served as the model for the one in Sinclair Lewis' Arrowsmith. In those years, it seemed that the discipline of sociology offered a good way to take science and apply it to improving the state of the world. Or, to draw upon phrasing that might reflect my study of Aristotle, I sought to take science, the passion of my head, and harness it to service, the passion of my hand.

In that era, during the last fading echoes of an ethos concerned about standards, all sociology majors at my university were required to take statistics, two years of foreign language, a year of English composition, advanced math, public speaking, civics, and physical education. And research methods.

Unlike some of my pals, I was enthusiastic about the prospect of taking this course bearing what they insisted was a soporific title. My reasoning was that I had been unhappy with the squishy nature of what in the sophomore level courses had paraded as knowledge. I imagined that with the modern tools offered by quasi-experimental designs, operations management, and computer technology, the situation could be rectified. Although naive in retrospect, my thinking at the time was that as, with improved instrumentation and careful observation, astronomy had superceded (but sadly never displaced) astrology, so too some new discipline (socionomy?) would, by exploiting better research methods, supercede sociology.

So I wanted to get started on using the tools of science—I was sure they existed on some socially isolated Platonic level—to discover the equally real and unvarying laws by which social behavior and organizational structures are governed. But then came the rub: how could one uncover possible candidates, and then select the genuine article? How could they be applied so that they could perform at their best? Certainly those were difficult questions, but equally certain was that a research methods course would be the place to start. And to take the course one would need to get the assigned textbook.

Thus it was that a few weeks before the new semester began, I found myself heading to the university bookstore to buy the required text. To get to the bookstore one had to first wade through the Pow-Wow Room, a garish, noisy, and insensitively named cafeteria that was the hub of campus social and intellectual life from 11:30 to 2:00. By passing into the campus bookstore through its paired Safe-T-Glass doors, a hermetic seal was made, stifling the blare of the Strawberry Alarm Clock or Jimi Hendrix. Simply by dropping one's bag at the security desk and pushing through a clanking turnstile, one would pass from a world of free speech sit-ins and agi-prop theater happenings into an intimately planned, quiescent commercial environment. Low-fidelity ceiling speakers filled the air with the Hollywood Strings' rendition of the Beatles' "You say you want a revolution." Students turned stock boys for a semester hummed along as they stuffed racks with cards showing Walter Keane's giant orbed waifs shyly kissing each other.

Strolling to the textbook section, I passed beneath sagging Peter Max Love posters and mawkish Erich Segal Love Story posters. I then threaded between Babel towers of compact cardboard sample boxes—blush-pink paisley boxes for girls, light-blue paisley boxes for boys—filled with sundry freebie personal care products that declared themselves to be "revolutionary." These, I confess, had their intended effect, exercising a powerful lure. Though feeling a bit like Ulysses tied to the old mast, I nonetheless pressed on to my goal: at last the textbook section hove into view.

In contrast to the intensely colorful commercial extravaganza, the textbook zone was a thicket of Army surplus metal utility shelves. One had to step cautiously for columns of textbooks had tumbled out into the narrow aisles. Despite a few blind alleys, and many pauses to consider what students in other courses would be reading, it was but a short time until I located where the required textbook for the research methods course had been left. Unlike Intro Psych, Intro Soc and Intro Poli Sci texts, this research methods book was neither colorful in design nor revolutionary in its graphics. It was also unlike its slender compatriots that addressed research methods of other social science disciplines. Ecce libris! This volume had a girth that made it seem like a millstone. I pulled a copy from the shelf and began leafing. Surpassing 500 pages, the book's impression of intellectual and physical weightiness was reinforced by its small font. Worse, the font had an exaggerated serif that suggested it had been set in italics. No photos were provided for visual interest nor were there any colophons to attract the eye. Still worse: it claimed it was only a basic introduction to the subject! Perhaps, I thought to myself, the saturnine judgments of my friends had been correct.

But then I started to read the text, jumping here and there, and my spirits rose. It seemed to be written by someone who cared about me, not in a condescending way, but in a way that would allow me to help myself reach my goals. The examples drew upon every aspect of, well, life and one's concerns in the public and private spheres. Beginning at that moment under those fluorescent lights, next to those gray surplus shelves, I began to develop a bond of affection for the author and, more importantly, for what seemed to be his mission.
In this book, as would be the case with many of his later articles and interviews, Julian Simon achieves the voice of a droll, world-wise uncle. Using straightforward terms and charming homilies, he presents his and other top-flight researchers' insights in a benign, conversational manner. One is reminded of the image of a country doctor, though in this case it is a country doctor of philosophy. Professor Simon covers the vast topic of how to conduct research not only clearly but also reasonably. He is out with the pedantic and in with the effective. He displays charm, warmth, and self-deprecating wit as he pursues his reasoned dialogue, ever informed by the pragmatic in pursuit of the ideal but ever mindful of the inherent shortcomings necessitated by human limitations.

Professor Simon has distilled an immense amount of knowledge about the realities of the "how-to's" of research without reverting to cookbook-like lists (both the bane and blessing of contemporary texts). These insights have been gained, it becomes clear, through hands-on experiences (and, also equally plain, some of them rather grim). Gears are shifted rapidly at times, in the interest of making points clearly. Hence, one will find snippets of uxorial chitchat placed near questions of macro-economic modeling. Ultimately Professor Simon, it is clear, cared passionately about the subject, and justifies his firm belief that reason and evidence is the best hope for mankind's continued survival and welfare.

Within a few years of this first encounter I was in hot pursuit of an advanced degree and was fortunate enough to have a turn to helm my own research methods course. I knew with the perfect certainty of one possessed which textbook I wanted for my class, and you, dear reader, can surely guess what it was. But I was astounded to learn that the book had gone out of print. "How could they?" I asked myself. A feeling of betrayal seized me: so quickly this treasure had vanished from the intellectual landscape. I still had my own copy of the book (and which I have kept until this very day). But what about all the future students of research methods? Who would guide and take care of them, I wondered.

In the three long decades since the book went out of print, I have soldiered on with various textbooks. Many were not bad, some were even quite good. But none, despite their individual merits, could measure up to the high standard of Julian Simon's textbook. Oh sure, they covered much of the same ground. The steps in doing a survey could be, and were, described by these other textbooks. And yes, they did discuss internal and external reliability of controlled experiments. Yet, these were textbooks, if not written by a committee, massaged and homogenized by one. The reasons why the steps were to be undertaken and the problems that might ensue if they were not followed were presented, of course. But there was no real sense of a person struggling to find reasonable answers in any of these lists and condescending, often politically inspired, examples. Absent entirely were appreciations of the ever-surprising ironies and capriciousness of the human animal.
Perhaps, worst of all, there was no sense of a shared culture, no feeling for the norms and folkways of the research argument, and no involvement with the invaluable rules of thumb that must guide social scientists as they work in real-world environments. In a word, the textbooks I found seemed to be written by teachers of research methods rather than by researchers. And they seemed to be written for people who are taking the course to fulfill a requirement rather than a personal mission, certainly not for someone on a mission to become a researcher.

So part of my pain has been that new generations of researchers were effectively denied the deeper understanding of "how we know what is so" that Professor Simon's able assistance would have provided. That is, he was able to convey the realities, lessons, and limitations of doing research, and cover a vast array of approaches, without the necessity of students having to experience the failures and mistakes that would lead to knowing why things should be done in one way as opposed to another. He focuses invariably on why things are done, so that the student can understand and extend the principles to be effective consumers and creators of social science research. By contrast, most other methods textbooks want to give the rules of procedure as derived from other textbooks, so that students are left with myriad lists that have the same intellectual meaning as a tax form.

Since that first research methods course that I taught in 1973, over two thousand students have been under my tutelage in variously named, but essentially the same, principles of research methods classes. (Since the scientific method itself has remained essentially the same for several centuries, I have no qualms about saying my teaching of it has been essentially the same for decades. Of course, the specific details of scientific research programs and tools change dramatically if one speaks of years; the same could be said about my course if one speaks of semesters.) Though I tried many other textbooks, none have proven themselves equal to that of the good professor's.

Above I have given reasons why this research methods book is outstanding. But let me complete the discussion here. I believe that Professor Simon wrote this book based on vast experience doing research, often tutoring directly research assistants who helped on his projects. When it came time to teach his own course, he immediately saw the problem, which is the same one that I have faced for thirty years. The crop of textbooks does not give the student what the student really needs, which is the reasoning underlying research and the reasonable ways to get it done. Learning to do research is akin to learning any other skill, such as swimming or riding a bicycle. (I thank Howard S. Becker for this analogy.) You could present all the chalkboard lectures you might want, but there is essentially only one way that anybody is going to be able to swim or bike ride, and that is by trying to do it. This book is the next best thing to doing research, and doing it with a convivial senior mentor. And it is an excellent first step to beginning to do research, just as knowing about momentum and resistance are to the sports of swimming and biking.
Through the wisdom, perceptiveness, and sensibility of Professor Irving Louis Horowitz, editorial director of Transaction Publishers, it is the happy fate of this work to be at last in print again. It is also a happy and felicitous event for generations of students who are already aspiring to be researchers to have this canny guide. It will also motivate some students who have no interest in becoming researchers as a vocation to become competent amateur researchers who can help themselves and others by understanding the craft. Certainly many students in my future research methods courses will be able to enjoy the down-to-earth reasonableness of an astute practitioner.

A word is due Hanan Selvin, who was this book's midwife. Hanan was a brilliant methodologist in his own right. He was a kind, dedicated teacher who lived for his craft and relished every minute of it. Sadly, Hanan was struck with a progressive degenerative disease that led early to his blindness. But instead of feeling sorry for himself, he began to study the sociology of blindness. I was a young graduate student in the sociology department of Rutgers when he visited the campus for a guest seminar. His lecture was on the tricks of the trade practiced by blind people to get around in a world built for those with sight.
I was fortunate enough to make his acquaintance on this occasion, and he provided me with useful methodological advice about the research that I was doing for my dissertation. Hanan clearly cared enough to spend his time helping others learn about the scientific method and its applications to human behavior. Without his strong intervention and encouragement, this book by Professor Simon might never have been published in the first place. So in an important sense, the re-issuing of this textbook is a testimonial to not only the enduring sense of craft, wisdom, and concern of Professor Simon, but of Professor Selvin as well.


Let me return to what I noticed when I examined this book for the first time, back in the bowels of the campus bookstore, back so many decades ago. Unlike the covers of others strewn nearby, this textbook's cover offered no promise that its purchase would allow a student to change the world for the better. Yet it should have.

This book offers one of the best stories yet written about how to actually conduct research using real tools in a real world. With these tools, one can gain an understanding of the world, not as one would like it to be, but how it is. This is a necessary first step. Meaningful progress is rarely achieved save through the rationality of science, which itself must be informed by sound research. Sound research is the one best key that is dangling before us. If we grasp and apply effectively this key, it will open for us doors to a world of nurturing social environments, personal autonomy, and great opportunities for pursuit of personal fulfillment.

Now that Professor Simon's book is back in print, more students will be able to understand and do research. They will be given sorely needed guidance on how to use the key of research to create a future of progress, abundance, and civility. This is something he wanted for us as much as we would want it for ourselves.

James E. Katz
December 13, 2002

preface to second edition

The aim of this second edition is the same as that of the first: to help you get started doing empirical research. The book has been reorganized considerably to better serve that purpose. The chapters on research decisions and procedures now come early in the book (Part II), followed by chapters on the obstacles to research (Part III). And now there are complete chapters on sampling (Chapter 9), experiments (Chapters 10 and 11), surveys (Chapter 13), and scaling (Chapters 15 and 16), plus independent appendices on questionnaire construction (Chapter 20) and interviewing (Chapter 19); in the first edition this material was scattered throughout the book. There is also a new chapter on the relationship between theory and empirical research (Chapter 5).

The first edition of this book is not intellectually obsolete. The practice of good science has changed little since the first edition—or in the last half century, for that matter; this explains why most of the examples have not been changed from the first edition. But the organization of the second edition should be more convenient for you to read. And over the years I've learned how to express some ideas better. In addition, the new material fills some needs that readers of the first edition said needed filling. Last but not least, chapter summaries and other pedagogical devices should ease your studying.

Though this is a practical how-to-do-it book, it aims also to teach the basic concepts in the philosophy of science. In my view this is not a contradiction, because the philosophy of science at its best is also a very practical subject, composed of ideas that clarify the nature and meaning of research and help the researcher better understand how to proceed when faced with difficult choices and decisions.

This book may be deceptively easy to read. I have always hated obscurantism and have been emotionally committed to simplicity, perhaps because of the coincidence of my name with the nursery rhyme "Simple Simon met a pieman. . . ." But simplicity has its drawbacks, such as: (1) Some students concluded that the first edition was so simple to read that it was insufficiently "challenging." Simplicity has a basic psychological difficulty, as expressed by T. S. Eliot about poetry: "A successful poem must be sufficiently simple so it can be understood, but sufficiently difficult so that it cannot be understood immediately." But I am unwilling to make my writing hard to understand so as to be more appealing. (2) Simplicity of expression may fool you into thinking that the ideas discussed here are simple ones. They are not. (3) As someone has said: Seek simplicity—but distrust it.

On the other hand, the book may seem overly long, containing too many examples. But as someone else once said, short writing makes long reading. So I hope you will not be put off by the book's length.

The writing style is more casual than it is conventional, and it does not always meet editors' standards as to what is "correct." Therefore I am pleased to take full responsibility for the language.

Although this is a text, it contains some new scientific ideas. Those ideas I most hope that you will notice as being original contributions are, first, causality as a concept requiring appropriate operational definition (Chapter 32) and second, the Monte Carlo approach to basic statistics (Chapters 27 to 31). Since the first publication of the Monte Carlo methods in this book, there has developed a body of experimental evidence that this radically different way of learning and doing statistics is easier to understand and more effective than the conventional analytic method (Simon, Shevokas, and Atkinson).

About the sex of pronouns in the book: The first edition used "he" for the unidentified person.
That will not pass any longer, and never should have passed. However, I do not wish to clutter up the writing with clumsy "he or she" phrases or to weaken it with annoying circumlocutions. The solution I've chosen is simply to sometimes write "he" and sometimes "she," more or less at random, when I mean an unidentified human being. This practice may seem a bit strange at first, but I trust that it will cause no confusion and will be pleasant and efficient. If this practice seems good to the reader, maybe other writers will adopt it, too.

A personal note: The considerable success of the first edition heartened me in many ways. First—and this may in turn hearten other prospective authors—the skeletal first draft of the manuscript was rejected by fifteen (15) publishers before Hanan Selvin pointed out its potential to Random House. Second, I worried that because the book is not narrowly specialized to a single discipline, each teacher would say that it might be good for others but not for him or her. It turns out, however, that there are a good many instructors who believe in a broad education in research methods; this heartens me because I, too, believe that. Third, I worried that the lack of mathematical notation and "sophisticated" and "rigorous" complexity would deter instructors from adopting the book; but apparently there are plenty of instructors who are more interested in teaching than in impressing students, and that cheers me, too.

In closing, I am grateful for your attention to my thoughts and writing. I benefited from the suggestions and corrections I received from readers of the first edition, and I shall be glad to receive more on this second edition. Thank you.

acknowledgments to second edition

Hanan Selvin listened to tapes of many chapters for the second edition, especially the new ones, and gave me a flood of useful and delightful comments, only some of which did I have the strength to exploit. Gideon Keren made helpful comments on Chapter 11. And the following long list of people were kind enough to send me evaluations or remarks about the first edition: Marcus Felson, William Ahlhauser, Wayne W. Snyder, Jason Millman, Robert A. Baker, Oleh Wolowyna, Lawrence G. Smith, Lucy W. Sells, Leigh Marlowe, Siamak Movahedi, Michael A. Baer, Joann S. DeLora, Marc D. Magre, Leroy Gruner, John H. Kramer, Jules W. Delambre, Alan C. Acock, Robert W. Shotoln, David M. Krieger, Michael D. Grimes, L. V. Hayes, Robert C. Smigelski, David G. Pfeiffer, Jamie M. Calderon, D. E. W. Holden, David M. Monsees, Jr., L. Fannin, Luther H. Keller, Linda Brookover Bourque, David Nasatir, Kenneth R. Rothrock, Louis A. Brown, LeRoy Martinson, Costs Wolff. I am grateful to one and all.

More and more my family and our life together sustain me. Wife Rita and children David, Judith, and Daniel give my days the joy and meaning that enable me to write.

Urbana, Illinois, 1978

J.L.S.

acknowledgments to first edition

For a hundred and more years ingenious social scientists have faced obstacles to getting the empirical knowledge they sought, devised ways to circumvent the obstacles, and then told others what they learned. I am grateful to them.

Hanan Selvin subjected the penultimate draft of this manuscript to the searching scrutiny that most manuscripts need but few are lucky enough to get. The quantity and quality of his critical comments were an author's dream, and I have in many places appropriated his thought and word without special note. There would be fewer errors and obscurities in the book if I had followed his advice even more diligently.

James W. Carey gave me many exciting and enjoyable hours discussing some of the fundamental concepts. I also benefited from talks with Howard Maclay. Other friends may also recognize their casual observations in these pages.

Allen Holmes read the chapters on statistics very carefully and corrected some errors. Dennis J. Aigner was also good enough to look at those chapters. For useful references and suggestions I am grateful to Stanley Friedman, Lewis Goldberg, James H. Lorie, and Louis Schneider.

My greatest debt is to my wife, Rita. Without her encouragement this book would never have been done.

I have tried to cover the wide intellectual area of all the social sciences, and I hope the reader will bear with me when I depart from the substantive fields that I know best. I will appreciate hearing from anyone who can set me straight on any matter or who can give an instructive or interesting example to illuminate a point.

Jerusalem, 1968

J.L.S.

prologue

Cast your mind back 450 years. The Governor of Rodera, a small Asian principality, decided to find out why the country's tax revenues were not greater. Here is a classic problem for research, and much of mathematics and social-science techniques was originally invented to improve tax collection.

The Governor first consulted an adviser who had studied in Europe and had learned something of Aristotelian logic. The adviser reasoned that (a) good citizens willingly pay as much tax as they are able to, (b) the folk of Rodera were very good citizens, and therefore (c) the tax collections could not be higher. The Governor immediately dismissed this syllogistic thinking as pure nonsense because he disbelieved the premises. Even without special training he was smart enough to understand the weakness of bad logic.

Next, the Governor called in the regional tax collectors and put the problem to them, because they were experts in the tax-collection field. After consultation the consensus among the tax collectors was that the people of the country simply could not afford to pay any more taxes. To support this statement the tax men stated that they had already used every possible technique to extract higher taxes (raising valuations of property, checking on hidden assets, and the like), and even their best efforts could not raise more money. The Governor understood the tax men's analysis of the situation as a self-seeking attempt to make themselves seem competent and hard-working, and he therefore disregarded this "expert opinion."

The Governor then concluded reluctantly that he would have to find out the answers for himself. He instructed an aide to bring him a handful of typical citizens. The aide brought in some people who were close at hand. They included a few beggars who had been loitering nearby, plus the aide's brother and two young guards.
It was obvious to the Governor that these people were not representative of the population, and therefore it was useless to question them. To put it in modern terms, the Governor recognized that he had a badly biased sample.

The aide was therefore instructed to bring in some "typical" peasants and townspeople, which he did. The Governor then asked them, "Why don't you pay more taxes?" The first few answers showed the folly of the question, for what he heard were excuses, complaints, entreaties—everything except what seemed to be sensible answers.

The Governor then confined his questions to factual matters. He asked about each person's property, his crops, family size, and the amount he paid in taxes. Assuming that the answers were true and that he could have them checked, the Governor thought he was getting somewhere. The information obtained from such a small handful of people clearly was not enough, however, because the group did not include any rich men, any people from the far provinces, any foreign residents, or any representatives of other important classes of people. Nor could the Governor tell from this sample how many people in each class there were.

Therefore, the Governor ordered a nation-wide house-to-house census. Nowadays the data for a complete nation-wide United States census can be collected in months. But transportation and communications were poorer then, and few literate people could be found to collect information. The census therefore required twelve years.

By the fifth year the Governor grew impatient and decided to experiment with the effect of new pressures to increase tax collections. He ordered that anyone who did not pay half again as much tax as in the previous year would have his cattle confiscated. It turned out that the total taxes paid did not increase.
The crops were bad that year, and the Governor could not determine whether tax collections would have risen under the new rule if the crop had been normal. The trouble was that his experiment was uncontrolled, because he had not kept the old tax system in effect in some areas for comparison—that is, his experimental design was incomplete. And worst of all, many people slaughtered their cattle to avoid confiscation, which ruined the country's meat supply for several years.

By the time the census data were all in hand at the end of the twelfth year, the country's situation had changed, and the data collected in early years no longer meant much. (Nowadays we know how to take samples to reduce cost and avoid long time lags during which the picture may change.) Furthermore, after twelve years the tax-census data filled a whole warehouse, and another ten years would have been required to interpret them. The Governor gave up the task in disgust and retired to his harem.

The point of this story is that knowledge is often not easy to come by because there are many obstacles in one's path. And common sense alone is not enough. But by now social science has accumulated a body of tested experience on how best to overcome the obstacles and acquire empirical knowledge efficiently and safely. This book presents some of this accumulated experience.

basic research methods in social science

1 introduction

1. Purpose of the Book
2. What Kinds of Research Are Called "Empirical"?
3. The Place of Statistics in the Study of Research Methods
4. Some General Remarks

1. Purpose of the Book

This book is primarily for students who have never before studied or done empirical social-scientific research. I hope that the book contains good advice that will help you get your first research project off the ground successfully and increase your efficiency in later work. As for those of you who will not do empirical research, the book may teach you to distinguish good research from poor research and help you to understand why empirical researchers do things as they do them.

People who have had some training or experience in empirical research may also gain from the elementary level of the discussion. Basic concepts often are bypassed as one rushes to learn the methods of particular fields. Coming back to fundamentals can widen the perspective of an advanced student and fill holes in his knowledge. If an advanced student is to gain something from this book, however, he must have the wisdom to realize that the apparent simplicity of the basic concepts is often deceiving. For example, everyone knows that ceteris paribus—holding "all other things equal"—is important. But the more research you do, the more you realize how complex is the ceteris paribus idea, how difficult it is to choose the right ceteris paribus conditions, and how often research is useless because other things really were not made sufficiently equal.

The book is intended for future producers of research, of course. But many people who study research methods will never produce research; rather, they will be research consumers, in their jobs and as citizens. For this important latter group of people, the aim of the book is to teach how to evaluate research done by others—to know which research is good and which is not; where the weak spots are in a piece of research and how important they are; and whether a research finding, whether presented in the professional literature, in an informal report, or in the popular press, is likely to be valid or not. For example, Volvo recently advertised that "90 percent of the Volvos sold in the last eleven years in the U.S. are still on the road." A good course in research methods is likely to alter your impression of what that claim might mean.

I hope that you yourself carry out some empirical research—no matter how small in scope—as you read this book. It is not enough to study empirical research the way one studies astronomy, economics, psychology, and other academic subjects. Reading about research principles is certainly useful. But research is not entirely an academic subject. Rather, it is largely an art, a how-to-do-it subject like musical composition, writing advertisements, or swimming.¹ You never really know how to do research until you do it, any more than you can know how to swim after only pool-side instruction. You must jump into the water, thrash around, and gradually improve with practice. For the same reason, skill in empirical research requires experience—that is, doing research of your own and criticizing the research of others.² For example, when you do a piece of research you cannot fail to learn just how complicated even the simplest research really is.

Eventually you would learn much that is in this book by trial and error. Instruction can only hasten the process and make it less painful by showing what has and has not worked for others. But that is as far as the teaching can go. (Reading what others say about research can be enormously profitable, however, if you will benefit by the experience of others.)
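To see why the Volvo claim quoted earlier in this section deserves a second look, consider a small calculation. The sketch below is an illustration only: the survival rates and the sales figures are invented for the purpose and are not Volvo's data. It suggests that one and the same assumed survival pattern yields a much higher "still on the road" percentage when sales have been growing, because most of the cars sold in the period are then only a few years old.

```python
# Illustration only: every number here is invented, not taken from Volvo.

def share_still_on_road(sales_by_age, survival_by_age):
    """Fraction of all cars sold in the period that are still in use.

    sales_by_age[k]    -- cars sold k years ago (hypothetical figures)
    survival_by_age[k] -- assumed fraction of k-year-old cars still running
    """
    sold = sum(sales_by_age)
    running = sum(n * p for n, p in zip(sales_by_age, survival_by_age))
    return running / sold

# Assumed survival pattern for cars aged 0 to 10 years (eleven model years).
SURVIVAL = [1.00, 1.00, 0.99, 0.98, 0.96, 0.93, 0.88, 0.80, 0.70, 0.58, 0.45]

# Case A: the same number of cars sold every year.
flat_sales = [10_000] * 11

# Case B: sales have grown, so each earlier year sold 20 percent fewer cars.
growing_sales = [round(10_000 * 0.8 ** k) for k in range(11)]

flat_share = share_still_on_road(flat_sales, SURVIVAL)
growing_share = share_still_on_road(growing_sales, SURVIVAL)

print(f"flat sales:    {flat_share:.1%} still on the road")
print(f"growing sales: {growing_share:.1%} still on the road")
```

Under these invented figures the flat-sales share falls below 90 percent while the growing-sales share exceeds it, even though the cars are assumed equally durable in both cases. The percentage in the advertisement therefore depends on the sales history as well as on durability.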
Now a theme that will recur throughout the book: There is never a single, standard, correct method of carrying out a piece of research. Do not wait to start your research until you find out the proper approach, because there are always many ways to tackle a problem—some good, some bad, but probably several good ways. There is no single perfect design. A research method for a given problem is not like the solution to a problem in algebra. It is more like a recipe for beef stroganoff; there is no one best recipe.

For technical matters, too, there may be several satisfactory techniques, and there is no cut-and-dried answer. For example, if you want to do a questionnaire survey, should you interview by mail, by telephone, or in person? Chapter 13 discusses the pros and cons of each method, but eventually sound judgment is required for this technical decision; no handy rule book can make such decisions for you. Or, should you pay your subjects in an experiment or survey? Again there is no pat answer; instead, this book tries to give you some principles that will help you to make sound technical decisions about your research methods.

In this book the word "method" refers to empirical techniques and devices of various sorts. To the philosophers the term refers to "scientific method"—the whole process of getting knowledge, including the theoretical and the empirical steps. But even then it is surely true that there is no single "scientific method." ". . . There is no one scientific method . . . there will be as many different scientific methods as there are fundamentally different kinds of problems" (Northrop, pp. ix, 19). "The scientific method, as far as it is a method, is nothing more than doing one's damnedest with one's mind, no holds barred" (Bridgman, p. 450).

1. A great mathematician applies the same analogy to his sort of work, too: "Solving [mathematical] problems is a practical skill like, let us say, swimming. We acquire any practical skill by imitation and practice" (Polya, p. 4).
2. Also valuable are the too-rare accounts of how and why a piece of research was done. Three short descriptions of this sort can be found in W. Wallis and H. Roberts (Chap. 3). Anthropologists often weave such material into their writings; W. Whyte (2nd ed., Appendix A) has a delightful and useful description of this sort. K. Colby's account of Semmelweis and childbed fever is excellent (pp. 44-50). R. Braidwood's "Biography of a Research Project" is a fascinating description of archaeological research. P. Hammond offers some excellent examples. C. Mills (pp. 199 ff.) interweaves a description of one of his labors with marvelous general advice on craftsmanship and how to get ideas and do social analysis. J. Madge has written a book full of such accounts of sociological research.

2. What Kinds of Research Are Called "Empirical"?

This book deals with empirical³ research and not with scientific speculation. Much of scientific work consists of thinking up ideas about the nature of the world, generalizing from observed facts to scientific "laws," and developing logical systems that are called "theories" or "models." But simply as a division of labor the speculative part of science is not covered here. The subject of this book is "getting the facts." Of course empirical work goes beyond "mere" observation and description, and it is inextricably intertwined with explaining nature and making predictions about it. But the process of thinking up explanations or hypotheses about nature and its laws is beyond our scope here.⁴

Nor is this book concerned with the logical process of building scientific theories. That is, it does not deal with the process of finding the logical relationships between various scientific statements or with the process of developing generalizations or scientific laws. Rather, its subject is the less glamorous craft of producing and examining factual and material evidence and sense data to develop the descriptions, measurements, comparisons, and tests of hypothesized relationships that are themselves part of the speculative side of scientific work.

Lest there be misunderstanding, I emphasize that a good idea is the keystone of an empirical study. Mere data collection and measurement are worthless unless the subject is important. Theory is often the fount of important ideas for empirical research, and sound theory is of inestimable value in any field. The relationship between theory and empirical research is explored in Chapter 5.

3. ". . . [T]he adjective empirical, in its combinations with various nouns, appears to denote observations and propositions primarily based on sense experience and/or derived from such experience by methods of inductive logic, including mathematics and statistics" (The Dictionary of the Social Sciences, p. 237). The crucial distinction here is between empirical research and theorizing, though the two activities are very much interdependent.
4. The elusive phenomenon of scientific discovery is discussed at length by many writers. See, for example, J. Young, A. Bachrach (Chap. 1), W. Beveridge (Chaps. 5, 6), C. Mills (Appendix, especially pp. 200-201, 211-217).

Finally, "empirical research" excludes knowledge obtained by consulting authorities, in books or in person.⁵ It includes only knowledge obtained from data resulting from first-hand observations, either by you or by someone else. Reexamination of data collected by others, such as U.S. Census data, is empirical research, of course.

Most of the examples in this book are drawn from "pure" research. But I also draw examples concerning policy decisions from the "applied" social sciences, and they often have a dollars-and-cents orientation. Applied research methods are sometimes more sophisticated than are methods used in pure research (Stouffer, 1950a, pp. 198-9), because it is possible to work up some calculation, even though crude, to compare the benefits expected from a research method against the cost of doing the research with that method. Such calculation leads to efficiency in research. (Pure research can be defined as research whose social or economic payoff is far in the future, whereas applied research is expected to have a quick payoff. But pure research is often done without any thought at all about payoff, just to satisfy the desire to understand.)

Here are two examples of how dollars-and-cents calculation in an applied problem helps to make sensible decisions about research: First, an advertiser can calculate whether comparing two advertisements in a split-run test⁶ is worth the cost of the test. One can also compare the costs and benefits of a split run against the costs and benefits of other types of advertising research.
Second, a candy firm can sensibly calculate its sample size when comparing two new flavors of candy. It can reckon the relevant costs and dollar benefits. It is much harder to determine the sample size sensibly when I. Pavlov, for example, studies how the flow of saliva in dogs can be conditioned to the sound of a bell. The possible benefits of such pure research—some intangible gain in the quality of human life perhaps far in the future or merely the satisfaction of an urge to understand our world better—are less predictable and much more difficult to evaluate in money terms to balance against the money cost of the research. Yet the decisions must be made anyway.⁷

Our examples come from the social sciences, not only because the book is intended for students of the social sciences, but also because empirical scientific methods have been used with greater variety and greater subtlety in the social sciences than in the natural sciences (Chapter 33 defends this claim).

5. But see the section on "expert opinion" in Chapter 14.
6. In a split-run test a magazine or newspaper arranges to print two advertisements so that each appears automatically in every other copy in each stack of copies that is sent out. This is the closest thing to a perfect experiment.
7. It is always conceptually possible to develop some rational calculation of the value of a piece of research and its various outcomes so as to have some rational guidelines for research decisions (see Chapter 8; Wilson 86-87; Schlaifer). But in most pure research such calculation is not done because it is so difficult to do meaningfully.

We shall turn frequently to a few such famous studies as Alfred Kinsey's research on sexual behavior, employment and unemployment surveys, Herman Ebbinghaus' learning experiments, Presidential election polls, television-audience ratings, Sigmund Freud's case history of Anna O., the U.S. Surgeon General's Report on Smoking and Health, and Ivan Pavlov's work on conditioning reflexes. These studies have been chosen for several reasons: They are inherently interesting; most students have a general knowledge of them from survey courses or general reading; most of the studies have been repeated or scrutinized closely by outside experts (for example, the American Statistical Association appointed three top statisticians to report on Kinsey's methods); and they show us a wide variety of methods and a broad representation from the social sciences.

I also refer frequently to my own work, despite its limitations, because I know exactly what went into the work—the difficulties, decisions, errors, corrections of errors, and the order in which things took place. One cannot have such intimate knowledge of anyone else's work. Yet it is these details and decisions, seldom written about, that are hardest for the novice to understand and master and that one usually learns only by serving as an apprentice (that is, "graduate assistant") or by trial and error in one's own work.

The book emphasizes the design and plan of research, rather than the analysis of research data. Except for studies that reanalyze data collected by others, the most important and interesting decisions arise at the design stage, or at least they should arise then.
If you postpone these decisions until after the data have been collected, you may suffer heartbreak and wasted expense.

Here are three brief examples of the importance of good design: First, two library scientists sought to determine the proportions of various-sized books in research libraries. So they measured the heights of each of the hundreds of thousands of books in a major library. With a little planning and sound design they would have needed only to measure a fraction of that number of books. And with sound design the results could also have been applied to libraries other than the one they studied.

Second, a family-planning group tested one birth-control propaganda campaign in Village A against another propaganda campaign in Village B, forgetting that subsequent differences in birth rates and contraception-acceptance rates might reflect basic differences between the two villages unconnected to the campaigns, rather than only the differences between the campaigns. Sound planning would complete the design by alternating both campaigns in the two villages or by other methods. The same sort of error has often been made by experimental psychologists and sociologists.


Third, our understanding of voting behavior has been greatly enhanced by the use of the panel method, in which the same people are quizzed about their voting behavior several times during the same election campaign. These repeated observations make it possible to understand the mechanism of voting and vote shifting in ways that are impossible without the panel design.

Mistakes at the design stage can be mended only at great extra cost, or not at all. By comparison, mistakes at the analysis stage can be remedied at slight or no cost as long as the mistakes have not gotten into print or been acted upon.

Think through the research design carefully in advance. Failing to consider all the necessary details at the design stage because of procrastination or mental laziness is one reason that many researchers get very little done. Of course not everything can be foreseen, especially in exploratory studies, but one should use as much foresight as possible. It is useful to talk to your friends about the design, and to prepare an outline of it; both processes reveal fuzziness in your design thinking.

The design of a piece of research must depend upon the particular purpose that the research is intended to serve; this is a message I shall repeat again and again. For example, the Internal Revenue Service publishes statistics on the amount of advertising done by groups of firms that sell various products, and these statistics include all firms' advertising, because the government requires data on the entire economy. But the statistics gathered and published by industry trade associations cover only the leading firms, because the industry-collected statistics are designed to meet only the information needs of the industry.
For another example, a psychologist studying the relative memorability of beginning, middle, and ending portions of messages will use a different test of memory (perhaps a list of nonsense syllables) than will a psychologist who is studying how many pieces of information a radio operator can remember accurately. You must ask yourself repeatedly: Exactly what do I want to find out? and why? If you can answer these questions clearly and precisely, you have gone a long way toward creating a satisfactory research design.

Part One begins with Chapter 2 on the language of science, which is inseparable from science itself. Chapter 3 discusses basic concepts such as variable, function, sampling, and the ideal paradigm for the study of causal relations. Chapter 4 classifies the types of questions that empirical research is asked to answer; this classification aids in deciding what types of research methods are appropriate for any given study. Chapter 5 explores the relationship between theory and empirical research. And Chapter 6 discusses the choice of appropriate empirical proxies (indicators) for theoretical variables.

Part Two gets down to the brass tacks of just how to conduct a piece of research. Chapter 7 provides a checklist of the basic steps one often takes when executing an empirical research project. If you are actually doing a piece of research in conjunction with reading this book—as I very much hope you are—then you should certainly start by reading Chapter 7. Other chapters in Part Two discuss crucial decisions in the research process—assessing the value of a prospective piece of research (Chapter 8), and whether to choose experimentation or the survey method (Chapters 10 and 13) or other methods (Chapter 14). Other chapters cover efficiency in sampling (Chapter 9), experimental design (Chapter 11), and the procedures of classification and measurement (Chapters 15 and 16). Part Two ends with the discussion of data handling and data adjustment in Chapter 17.

It was a toss-up whether to reverse Parts Two and Three. The reader may choose to read them in either order.

The chapters in Part Three consider the various types of obstacles that nature puts in the way of the fact seeker, and that prevent one from getting valid answers quickly and easily with common sense alone. Ways to surmount each obstacle are also discussed. These ways of surmounting obstacles to knowledge are the warp and woof of research method.

Part Four discusses what to do with your data once they are collected—how to analyze them and how to interpret them statistically (Chapters 25-30). Chapter 31 discusses how to decide on a sample size; it properly belongs in Part Two, but it had to follow after Chapters 25-30. Chapter 32 discusses the crucial concept of causality in social science.

3. The Place of Statistics in the Study of Research Methods

A working knowledge of the basic ideas of statistics and probability helps clarify one's thinking and improves one's capacity to deal with practical problems and to understand the world. And to be efficient a social scientist is almost sure to need knowledge of statistics and probability.
On the other hand, great research has been done by people with no formal knowledge of statistics. And a little study of statistics sometimes befuddles students into thinking that statistical principles are guides to research design and analysis. This mistaken belief only inhibits the exercise of sound research thinking. Kinsey put it this way:

However satisfactory the standard deviations may be, no statistical treatment can put validity into generalizations which are based on data that were not reasonably accurate and complete to begin with. It is unfortunate that academic departments so often offer courses on the statistical manipulation of human material to students who have little understanding of the problems involved in securing the original data. . . . When training in these things replaces or at least precedes some of the college courses on the mathematical treatment of data, we shall come nearer to having a science of human behavior. (Kinsey, et al., p. 35)

Throughout the book statistical ideas are submerged except as a handmaiden to research methods and research decisions. In addition, Chapters 27-31 offer a new approach to statistics that may have special interest for students who are scared by statistics. This method substitutes "Monte Carlo" experiments for mathematical analysis. It also emphasizes the reasoning of statistics—the most important ideas in statistics, which are usually learned only informally. This method has now been shown experimentally to be unusually effective (Simon, Shevokas, and Atkinson).
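To give a flavor of what substituting a "Monte Carlo" experiment for mathematical analysis can look like, here is a minimal sketch in Python. It is not taken from Chapters 27-31; the scenario (20 tasters choosing between two candy flavors), the function name, and all parameter values are assumptions made only for illustration. The question it answers by repeated simulated trials is: if the two flavors were really liked equally, how often would chance alone produce a split of 15 to 5 or more lopsided?

```python
import random


def share_of_lopsided_splits(tasters=20, threshold=15, trials=10_000, seed=0):
    """Estimate by simulation how often a fair 50/50 preference yields a
    split at least as lopsided as `threshold` versus the rest.

    All default values are illustrative assumptions, not figures from the book.
    """
    rng = random.Random(seed)  # fixed seed so the experiment can be repeated
    extreme = 0
    for _ in range(trials):
        # Each taster "flips a coin" between flavor A and flavor B.
        prefers_a = sum(rng.random() < 0.5 for _ in range(tasters))
        # Count the trial if either flavor wins by threshold or more.
        if prefers_a >= threshold or prefers_a <= tasters - threshold:
            extreme += 1
    return extreme / trials


if __name__ == "__main__":
    print(share_of_lopsided_splits())
```

The exact binomial calculation for this case gives roughly 0.041, and the simulated share should land close to that figure. The simulation reaches the answer by counting outcomes rather than by formula, which is the reasoning the paragraph above describes.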

4. Some General Remarks

A book about how to do research inevitably makes research sound difficult and treacherous. And this book gives special attention to the obstacles to knowledge that one faces in doing research, so as to help you recognize and overcome them. But please don't let these obstacles cause you to lose heart. Sound and valuable research can be done even by ordinary undergraduates—general research that is worth publishing, and applied research that is of value to organizations and decision makers. All research has flaws, but the flaws need not be so grave as to invalidate the research, even if it is course-work research conducted with a small sample, no money, and a limited budget of time.

Furthermore, research can be enormously exciting and great fun. To find out something about the world that no one has ever known before is a rare thrill. And it is a thrill that almost anyone can experience who will start with a sensible idea, work enthusiastically and hard, and proceed with caution.

Though research can be great fun, I do not like to think of it as a game. Rather, I prefer to remember that sound research can make a valuable social contribution, improving the lives of individuals and communities and enriching our culture. Thinking of research as a game can lead the researcher to focus only on the professional acceptance of one's work and its influence on one's career, rather than on the social and intellectual benefits of the research. Of course we all want to get ahead in the world. But if we want only to get ahead, if our eye is on only the main chance professionally, then we will all be losers in the long run.

This book is a textbook. Though some of the ideas in it are new, most are not.
Like other textbooks it constitutes a sort of folk wisdom; the folk are the teachers, colleagues, and students who have discussed research with me. Some of this wisdom seems never to have been collected or transcribed from the oral tradition. (For example, the phrase ceteris paribus must have a very high spoken frequency among social scientists, yet it does not appear in the index of a single one of the most popular books about research methods in the social sciences.) To collect and discuss this wisdom is the aim of this book.

part one

the process of social-science research

2 the language of research: definitions and validity

1. Operational Definitions
2. Numbers and Operational Definitions
3. Validity and Reliability
4. Writing Scientific Reports
5. Communicating with Yourself
6. Honesty in Scientific Reporting
7. Summary

1. Operational Definitions

A stranger driving a car with California license plates stops you on the street in Urbana, Illinois, and asks "Where is New York?" You, being a lyrical poet with a feeling for the spiritual side of life, tell him, "New York is in the land of the sea, the singers of mercantile songs, and rotten paintings. Why go there?"

The stranger looks annoyed and says, "Come on, buddy, quit the nonsense and tell me where New York is."

"On the East Coast, north of Washington, south of Boston," you answer. He looks bewildered, so you add: "Latitude 40° 40' North, longitude 73° 48' West."

He looks outraged. "Will you or won't you tell me what road to take?" he says. "How do I get there?"

What the man really wants is for you to say: "Continue down this street to the first stop sign, about a quarter of a mile. Turn right onto Highway 74. . . ." and so on.

Now you have satisfied him. You have provided instructions that he can follow. The instructions are unambiguous (you hope), and, if he follows your instructions exactly, he must arrive exactly where you and he agree he is trying to go—New York, in this case. You have told him what he needs to know—nothing more and nothing less. You have not told him to follow the road around curves, because he needs no instruction to do so. But you have told him which way to turn when he reaches the highway, because you have no reason to assume that he knows whether to go right or left at that point.

Just as with travel instructions, the language of empirical scientific research is made up of instructions that are descriptions of sets of actions or operations (for instance, "turn right at the first street sign") that someone can follow accurately. Such instructions are called an "operational definition." An operational definition should contain a specification of all operations necessary to achieve the same result when repeated. It need not specify the obvious operations, however. An example of obvious and unnecessary instructions is found in a child's address on a local letter: "U.S.A., Earth, World, Universe."

The language of science also contains theoretical terms (better called "hypothetical terms"), for example, "utility" in economics and "reinforcement" in psychology. The place of theoretical terms is a hotly debated issue in the philosophy of science, which will be discussed in Chapters 3 and 6. Each term that actually enters into the empirical work, however, must be defined operationally.

Another example: Assume that you are a psychologist and you wish to write to another psychologist to tell her¹ what color stimulus card you used in an animal training experiment, so that she can duplicate ("replicate") the experiment. The word "green" alone might be dangerously imprecise. There are many shades and intensities of green, and your correspondent might choose a color that would lead to different experimental results. You could ensure that she uses exactly the same color if you send her the stimulus card you actually used. Your operational definition would then be "Use the enclosed color card."
But if it is impractical to send out samples—as it is when you are writing up the experiment for publication—this simple solution will not work. You could improve the chances that another researcher would use the same color if you compared your stimulus card to an interior decorator's color chart, then wrote down the name and number of the matching color and the firm that puts out the color chart. Comparing the stimulus card against the color chart is the operation that defines the color in question.

Now some actual examples. Army Research Branch scientists in World War II wanted to study "how personal adjustment varied in the army." The key theoretical (hypothetical) term, "personal adjustment," was operationally defined this way:

With respect to verbal behavior, it is assumed that, on the average, men who said that they were in good spirits, that they were more useful in the Army than as civilians, that they were satisfied with their Army jobs and status, and that in general they liked the Army, were better adjusted to the Army than men who were negative in several of these expressions. (Stouffer, et al., I, 83)

1. The pronouns "she" and "he," as noted earlier, will be used interchangeably and more-or-less equally in frequency to refer to the generalized person when it would be too clumsy to use other expressions or circumlocutions.


Notice the words "who said that they were in good spirits. . . ." A questionnaire is an essential part of this particular operational definition. Men who answered the questions in one fashion are operationally defined as well adjusted; others are ill adjusted, by definition.

Happiness is a concept related to personal adjustment but even more difficult to define satisfactorily. Happiness is perhaps as elusive a concept to define perfectly as it is to achieve. N. Bradburn tackles head-on the problem of operational definition: "One way to find out whether people are happy is to ask them. Respondents were asked to answer this question: 'Taking all things together, how would you say things are these days—would you say you are very happy, pretty happy, or not too happy?'" (p. 2).

Now that operational definitions have been exemplified, a formal operational definition of "operational definition" may be appropriate. "A definition is an operational definition to the extent that the definer (a) specifies the procedure (including materials used) for identifying or generating the definiendum and (b) finds high reliability for [consistency in application of] his definition" (Dodd, in Dictionary of Social Science, p. 476). A. Bachrach adds that "the operational definition of a dish . . . is its recipe" (p. 74). P. Bridgman, the inventor of operational definitions, put it that "the proper definition of a concept is not in terms of its properties but in terms of actual operations." He pointed out that definitions in terms of properties held physics back until Einstein and constituted the barrier that it took Einstein to crack (pp. 6-7).

Psychologists have thought more deeply about operational definitions than have other social scientists, because psychology tries to connect what can be observed objectively to what is unobservable inside people's minds.
The foregoing definition of "personal adjustment" is an example of what has been done. By contrast, creating satisfactory operational definitions is seldom a problem in economics, for the theoretical concept in economics often points clearly to the empirical variable that should be used. Of course even a concept like "money" is not automatically defined operationally; in some cases the economist includes savings deposits along with cash and checking accounts in the operational definition of money, in other cases not. And the concept "economic backwardness" bristles with such definitional problems as whether a country's present income level or its rate of growth is the more appropriate proxy. Furthermore, putting values on a country's output is a very tricky procedure that affects the relative backwardness of the country and makes one scholar ask "Is [backwardness] an operational term?" (Gerschenkron, p. 42). But seldom is the relationship between theoretical and empirical variables as arguable in economics as is the relationship between, say, the theoretical (hypothetical) concept of love and its operational definition.

Operational definitions are to be distinguished from property or attribute definitions, in which something is defined by saying what it consists of. For example, a crude attribute definition of a college might be "An organization containing faculty and students, teaching a variety of subjects beyond the high-school level." An operational definition of university might be "An organization found in The World Almanac's listing of 'Colleges and Universities.'"

Operational definitions also differ sharply from dictionary definitions. A dictionary definition of "green" is "1a: of the color green (green jade) b: having the color of growing fresh grass or of the emerald (green lawns)" (Webster's Third New International Dictionary [unabridged], 1971, p. 996). The dictionary definition often gives synonyms, which help you translate the word you do not know into words you already know, or gives examples of the word that help you to learn how it is used. But for many words the dictionary definition could not guide the actions of one person to correspond exactly to what someone else wanted. A dictionary definition of "New York" would not be at all helpful to your friend who wants to drive there from Urbana. A dictionary definition of "apple strudel" will not guide your grandmother to make exactly the dish that you enjoy. A dictionary definition of "green" will not guide another experimenter to use exactly the same color you did. The dictionary definition of "consumer price index" will not tell you how to reproduce the consumer index used by the United States. The subtleties of these and other nonoperational definitions are discussed later when we define "causality."

There have been heated philosophic arguments about operational definitions and their place in science.² The argument is brought out by e. e. cummings (p. 190):

(While you and i have lips and voices which
are for kissing and to sing with
who cares if some oneeyed son of a bitch
invents an instrument to measure Spring with?

In these lines, the poet catches the essence of empirical science, even though he denounces its effects.
With ingenious empirical research designs and instruments, meteorologists actually do measure spring. But notice that the researcher must specify which readings on the instruments he will consider to denote spring. That specification is his operational definition.

Another side of the matter is given by W. Kruskal, a statistician turned rhymemaker: "You'll care if some four-eyed bastard invents a better forceps for the resulting infant, or even if, by measuring Spring, the slob eventually figures out how to prolong it."

2. What has been said so far does not reflect an allegiance to the philosophic position called "operationism." The substitute label "working definition" suggested by Selltiz, et al. (p. 43), avoids many unnecessary associations of the term "operational definition." But I shall stick to the common usage.
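The remark that the researcher must specify which instrument readings denote spring can be made concrete. What follows is a minimal sketch in Python, not any meteorologist's actual rule: the function name, the 10-degree threshold, and the five-day run length are all assumptions chosen only for illustration. Anyone who applies the same rule to the same readings must arrive at the same day, which is what makes the rule an operational definition.

```python
def first_day_of_spring(daily_mean_temps_c, threshold_c=10.0, run_length=5):
    """Return the 0-based index of the first day of the first run of
    `run_length` consecutive days whose mean temperature is at or above
    `threshold_c`, or None if no such run occurs.

    The threshold and run length are hypothetical choices; a different
    researcher could adopt different ones and would then be using a
    different operational definition of "spring".
    """
    run = 0
    for day, temp in enumerate(daily_mean_temps_c):
        # Extend the current warm run, or reset it on a cold day.
        run = run + 1 if temp >= threshold_c else 0
        if run == run_length:
            return day - run_length + 1
    return None


if __name__ == "__main__":
    readings = [4, 11, 12, 6, 10, 10.5, 11, 12, 13, 9]
    print(first_day_of_spring(readings))
```

Changing `threshold_c` or `run_length` changes the day that counts as the start of spring without changing the thermometer readings at all. The definition lies in the specified rule, not in the instrument.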


Numbers and mathematics can be an aid to clarity. But unwise use of mathematical symbolism can make matters more confusing:

Contrary to common belief it is sometimes easier to talk in mathematics than to talk in English; this is the reason why many scientific papers contain more mathematics than is either necessary or desirable. Contrary to common belief it is also often less precise to do so. For mathematical symbols have a tendency to conceal the physical meaning that they are intended to represent; they sometimes serve as a substitute for the arduous task of deciding what is and what is not relevant; . . . It is true that mathematics cannot lie. But it can mislead. However, the dangers of over-indulgence in formula spinning are avoided if mathematics is treated, wherever possible, as a language into which thoughts may only be translated after they have first been [clearly] expressed in the language of words. The use of mathematics in this way is indeed disciplinary, helpful, and sometimes indispensable. (Kapp quoted by Georgescu-Roegen, p. i)

Cummings would have been disturbed if he had known that scientists also measure love—sometimes with the very kisses that he mentioned. "Love" can mean many things, of course, and the operations for measuring any facet of it are far from obvious. Consider, for example, measuring the amount of love between mother and child by the number of times the mother kisses the child. In this case, we certainly have established an operational definition of something when we specify counting the number of kisses. But few of us would believe that the number of kisses is a perfect and true index of the amount of "love"; in fact, there may be cases in which number of kisses and amount of love may be inversely related. But for some purposes it might be a useful index.

Dispute has arisen because some scientists have taken a position implying that love is the kiss count and that the word "love" can have no meaning apart from the description of an operation. Whether or not this general position is wise, a less extreme position is possible; that is, that hypothetical (or theoretical) concepts can certainly be useful in science (and in common language) even though they are not operationally defined. The very fact that one can have in mind a concept that is not yet defined operationally but toward which one reaches with an operational definition is proof that concepts can have meaning even though they are not operationally defined. For example, you and I probably can have some meaningful conversation about happiness even though it is a many-sided and imprecise concept to us both. And, when introducing an operational definition of happiness, I may talk about the concept in general terms. That general discussion is a "hypothetical" concept of happiness or, as B. Underwood calls it, a "literary definition" (pp. 54-55).³

3. A concept may also form part of a deductive theory, even though it is not operationally defined. In that case we call it a "theoretical concept" or an "unobservable."

When you come to execute a piece of empirical research, all the concepts in the research and all the important words used in the write-up must be defined in terms of operations. About this point there should be no argument: If, for example, you are to study love empirically, there must be something that you can count or measure that will stand for love; otherwise you simply cannot do any empirical research.

What is the relationship between the operationally-defined kiss count and love? Is there a logical relationship between the two? Does "love" mean something apart from a kiss count? Is "happiness" only the answer to a question on a questionnaire? Troublesome disagreement can be avoided by saying that the kiss count stands for love in at least a partial way and that the questionnaire answer is a partial proxy for happiness. We need not assert that the kiss count is love or that it is all there is to the concept of love. We can agree that there may be other measures that stand for other aspects of love, while still making good use of the operational definition.

"Basically, an operational definition asserts only that a phenomenon has been reliably measured" (Underwood, p. 62). Whenever you complete the sentence "I measured it by . . ." referring to a variable, you have provided an operational definition of the variable.

The relationship between the operationally-defined kiss count concept and the hypothetical concept "love" can never be pinned down logically. Rather, the relationship is one of good judgment and scientific artistry. A wise scientist develops operationally-defined concepts that are good "proxies" (that is, that stand for the hypothetical concept). But a proxy can never be perfect and complete; it cannot represent all aspects of the hypothetical term.

Repeatability (replicability) is a major characteristic of scientific research.
And operational definitions help make it possible for other researchers to repeat exactly what one researcher has done. This is the key property of operational definitions.
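One way to see what "I measured it by . . ." amounts to is to write the measurement down as a procedure that another researcher could rerun. The sketch below, in Python, uses the three response categories of the happiness question quoted earlier in this section. Everything else is an assumption made for illustration: the function names, the rule of ignoring capitalization and extra spaces, and the decision to set aside any other answer as "uncodable" rather than guess at it.

```python
# The three response categories come from the questionnaire item quoted above.
HAPPINESS_CATEGORIES = ("very happy", "pretty happy", "not too happy")


def code_happiness(answer):
    """Map a respondent's answer to one of the three categories.

    Hypothetical coding rule: ignore capitalization and extra spaces;
    anything else returns None rather than being forced into a category.
    """
    normalized = " ".join(answer.lower().split())
    return normalized if normalized in HAPPINESS_CATEGORIES else None


def tabulate(answers):
    """Count coded answers, keeping uncodable ones in their own bin."""
    counts = {category: 0 for category in HAPPINESS_CATEGORIES}
    counts["uncodable"] = 0
    for answer in answers:
        code = code_happiness(answer)
        counts[code if code is not None else "uncodable"] += 1
    return counts


if __name__ == "__main__":
    print(tabulate(["Very happy", " pretty  happy", "ecstatic", "not too happy"]))
```

Two researchers given the same answer sheets and this rule should produce the same table, which is the repeatability the paragraph above describes. The coded answer remains only a partial proxy for happiness itself.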

2. Numbers and Operational Definitions

Now let us talk about numbers. Such a suggestion has the unfortunate effect of frightening some of you nearly out of your wits. Relax; this discussion will not trouble you.

"How far is it to New York?" the traveling man asks you next. He is again annoyed when you answer, "a long way," because what is a long way to you might be a short way to him or vice versa.

But just putting the distance into numbers is not enough. He will still be confused if you tell him "three short ways." You might do a little better if you told him it was 3,382,000 cubits to New York—provided he has studied his Bible carefully. The Bible, however, is not clear about the length of a cubit, so describing the distance in cubits might still be confusing. And if the traveling man has never read the Bible he has even less information than if you said, "a long way."

"About 950 miles, or 860 miles on the old road if you want to go that way," is a very satisfactory answer. The traveling man will know what you mean, and he will be able to figure how much time and how much gas it will take him to get there.

Behind every numerical measurement lies an operational definition. A very precise interpretation of "three feet long" is "put three foot rulers end to end, and the distance in question is from one end of the foot rulers to the other." Many years ago, the operational definition of "three feet long" was cruder. It was "Put the heel of your left foot on a mark on the ground, put your right foot directly in front of your left foot, and then your left foot in front of your right foot. Then compare the thing you are measuring against the distance between the mark on the ground and the toe of your left foot."

Now an example. Assume you must determine what proportion of the children in Lincoln County live on farms. How should you go about it? Your first impulse is to answer "Go and look at all the farms." Or if you are slightly wiser you might say, "Look at a sample of the farms." But those are unsatisfactory answers, as we shall see.

For convenience in discussion let us narrow the problem to determining how many children live on just a single farm, say the old Carey place. You send a girl to look at it, and she comes back with the answer, "two children." But then you wonder—did she count all the children and only the children? You had never told her whom she should consider a child, that is, at what age a child stops being a child, and you had never told her whether to count infants as children. Now you must make those decisions.
You send her out again, and she returns and says, "Well, there are two children playing on the road near the house but not on the farm land itself." Should you count them or not? You must give her (and other such data collectors) a rule for deciding whether children live or do not live on a farm, in order to cover children visiting on the farm or visiting away from the farm. Then you must instruct her about whether a particular place is to be considered a farm (does the land have to be farmed in order for the place to be a farm?) and so on. You must give her instructions that will cover almost every possible situation in which she may have doubt, plus the important general instruction to ask you for further instructions about any situation that the rules do not cover.

Notice that each of your instructions tells her how to perform an operation. "Ask 'Do you live here?'" is an instruction to perform the operation of asking. The operation to determine whether a farm is being farmed is not obvious; for example, does a small patch of vegetable garden make it a farm?

Other language need not always be as precise as the language of science. The glory of poetry and religious language often is that many meanings are possible, and the reader may find the meaning that satisfies the reader's own heart. Legal decisions are often purposely ambiguous; sometimes a judge writes a decision that will not bind future generations or reduce their flexibility, using "words that trail rainbows but disguise his meaning," in Oliver Wendell Holmes' lovely description (Rosenfeld).

It is easier to be vague than to be specific or numerical, as you know if you have ever tried to evade a difficult examination question. Vague and empty language, which J. Barzun calls "hokum," may go undetected elsewhere but it is fatal in research.

Hokum is the counterfeit of true intellectual currency. It is words without meaning, verbal filler, artificial apples of knowledge. . . . Words should point to things, seen or unseen. But they can also be used to wrap up emptiness of heart and lack of thought. (Barzun, p. 25)

J. Locke long ago diagnosed the difficulty:

. . . [H]e that shall well consider the errors and obscurity, the mistakes and confusion, that are spread in the world by an ill use of words, will find some reason to doubt whether language, as it has been employed, has contributed to the improvement or hindrance of knowledge amongst mankind. . . . This, I think I may at least say, that we should have a great many fewer disputes in the world, if words were taken for what they are, the signs of our ideas only, and not for things themselves. ("Essay on the Human Understanding," Part III, Chapter 10)

As a test of precision, state your research procedure as a written set of instructions to a layman. If the average person can follow your instructions and then return to you with the correct data you want to gather, then you have stated your research design well.

A class exercise dramatizes how difficult it is to write even the simplest set of instructions in unambiguous language. Students work in pairs.
Each student leaves the classroom for five minutes and counts any set of objects in the rest of the building: perhaps the number of chandeliers on the first floor, the number of doors to the building, the number of stairs in the north staircase, or anything of which there is more than one. Then each student writes a set of instructions for his partner that he hopes will lead the partner to count exactly what he did and to arrive at the same answer. The partners then exchange instructions and carry them out.

The results are often funny. One person counts five statues on the first floor; the partner counts none. Why? The instructions said "black statues," and the partner did not think the statues were exactly black. One person counts fifteen chandeliers on the first floor; the partner counts none. Why? What one partner called the first floor the other called the basement. One person counts eight staircases, and the partner counts two staircases. Why? One person counts the stairs between each floor as a staircase; the other counts the stairs from top floor to bottom floor as a staircase. And so it goes, showing how difficult it is to write any set of instructions, that is, an operational definition, that will really be satisfactory.
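One way to see what an operational definition demands is to try writing the farm-children instructions as explicit rules. The sketch below is only an illustration: the age cutoff, the one-acre threshold, and every name in it (`Person`, `Place`, `is_child`, `is_farm`, `lives_on`, `count_farm_children`) are assumptions made up for the example, not rules given in the text. The point is that each decision the interviewer might hesitate over has to be written down somewhere.

```python
from dataclasses import dataclass

# Every threshold here is an illustrative assumption, not a rule from the text.
MAX_CHILD_AGE = 15        # assumed: a "child" is anyone aged 0 through 15
MIN_FARMED_ACRES = 1.0    # assumed: a place is a "farm" if at least 1 acre is farmed


@dataclass
class Person:
    age: int
    usual_residence: str  # the place named in answer to "Do you live here?"


@dataclass
class Place:
    name: str
    farmed_acres: float


def is_child(person: Person) -> bool:
    """Infants count as children; the upper age limit is stated explicitly."""
    return 0 <= person.age <= MAX_CHILD_AGE


def is_farm(place: Place) -> bool:
    """A small vegetable patch below the acreage threshold does not count."""
    return place.farmed_acres >= MIN_FARMED_ACRES


def lives_on(person: Person, place: Place) -> bool:
    """Visitors are excluded: only the usual residence matters."""
    return person.usual_residence == place.name


def count_farm_children(people_seen: list[Person], place: Place) -> int:
    """Count children who live on the place, provided the place is a farm."""
    if not is_farm(place):
        return 0
    return sum(1 for p in people_seen if is_child(p) and lives_on(p, place))


carey_place = Place(name="Carey place", farmed_acres=40.0)
people_seen = [
    Person(age=0, usual_residence="Carey place"),   # infant who lives there
    Person(age=9, usual_residence="Carey place"),   # child who lives there
    Person(age=8, usual_residence="town"),          # child visiting the farm
    Person(age=16, usual_residence="Carey place"),  # above the assumed cutoff
]
print(count_farm_children(people_seen, carey_place))
```

Any two people who apply these rules to the same observations should arrive at the same count, which is the property the class exercise shows to be so hard to achieve in ordinary language. Whether the rules count the right things is a separate question that depends on the purpose of the study.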


The language of empirical science has much similarity to the language that a witness is permitted to use in court: no hearsay, no speculation, what the witness himself observed and only what he observed. The empirical researcher must transform the vague, the unspecified, the abstract, into the specified and concrete, even though precision is hard work and all of us are lazy.

Trouble with scientific language also arises because of wrong notions about the relationship of words and things. Most people assume that, if there is a word, there is also a thing that corresponds to the word. If that were so, finding the right word would be a matter of searching through dictionaries and rule books until you found the right word for any thing or vice versa. This attitude is fostered by our laws and by the rules that parents must make for children as they are growing up. The state and city ordinances tell you, as they must, exactly what "speeding" is. And our immigration laws are precise about what makes a person "American."

But, if you are doing a research study on speeding, the legal definition will probably not help you. Instead, you will have to make your own definition, a definition that is operational, so that you can work with it and communicate it to others. And the definition must fit your particular research needs. And, if you are studying Mexican musicians, you will have to decide whether a pianist born and brought up in Russia but who now holds Mexican citizenship is a Mexican musician. The legal definition of "Mexican" will not suffice; nor can you use an operational definition that someone used in a study of Mexican exports. The crucial idea is that you must create your own definitions; no one can do it for you. And the words do not have meaning, at least for the purpose of your study, until you give them meaning.

On the other hand, one must try to use words in a manner that other people can understand. It is seldom sensible to coin words or to use familiar words in unfamiliar ways, and whenever possible you should use exactly the same operational definitions as previous writers in the field so that your work may be compared with theirs.

In applied research for clients, one often has trouble getting the client to be specific about what he wants to find out. If the client says, "Find out how to improve our advertising operation," the researcher cannot answer him as a researcher. (He may act in the role of consultant, but that is not research.) The researcher must find out whether the client wants to know the optimum amount of advertising or the best type of advertising, or what. The same is true in applied research for government agencies. It should be a policy decision whether the City of Los Angeles orders research done on how badly off a minority group is in housing, employment, or both, though the researcher may advise on which of the research projects would be most fruitful and feasible.

Chapter 32 contains further discussion of definitions and takes up the problem of defining the important term "causal."


3. Validity and Reliability

Every piece of research aims to produce an answer to a scientific question. And it is reasonable to ask just how good an answer the research provides. This section is about concepts that may be used in judging how good the answer is.

Validity is the overall concept used to refer to how good an answer the study yields. If the answer given by the research is likely to be sound, the research is said to be valid. The concept of validity may be applied to the investigation as a whole, or it may be applied to one or another aspect of the study. The rest of this section and much discussion throughout the book should help you to judge whether a study is valid and its conclusions sound.

Reliability is one of the constituent elements of validity. Reliability refers to the extent of random variation in the results of the study; the smaller that variation, the more reliable the results. Wildly unreliable results cause a study to be invalid. For example, if an intelligence test sometimes yields a high I.Q. score and sometimes a low I.Q. score for the same child, the results of the test are unreliable, and any study that employs the test is invalid.

A subconcept of validity that applies to the research as a whole is construct validity or external validity. No matter how accurate and error-free the research, the study may be quite inapplicable to the original question. For example, American I.Q. tests may be invalid for measuring the intellectual capacity of children in other cultures. Or a sample of prison convicts is probably invalid for study of the sex behavior of all Americans.

A study may also be invalid because its internal procedures are invalid. For example, a study of the effects of early stair-climbing training on a group of infants is invalid unless the results are compared with those from a group of infants who received no training.
The results of such a "control group" will indeed reveal that the training has no long-run effect; the untrained group soon catches up with the trained group.

An important cause of overall unreliability, and therefore a cause of invalidity, is a too-small sample. For example, a sample of ten people, no matter how representative, will give unreliable results for a Presidential election poll, because different samples of size ten will give different answers on such a close issue.

The constituent elements of a study may also be described as "valid" or "invalid," "reliable" or "unreliable." Let us concentrate on operational definitions and measuring devices. A specified set of conditions, that is, a definition or a measuring device, is reliable if the same input always leads to the same output. Reliability is roughly the same as consistency or repeatability. The concept applies to either operational definitions or to measuring devices.

A good operational definition is very reliable; that is, applying the definition produces the same result every time. And a weighing scale is reliable if it shows exactly the same reading every time a given iron bar is placed upon it. An I.Q. test is reliable if people get the same score when they are tested twice. Good operational definitions of "children," "live," and "farm" will lead any person to count the same number of children at a sample of dwelling places.

It is easy enough to check the reliability of a merchant's scale with repeated weighings of the same iron bar. But checking the reliability of an I.Q. test is more difficult. For example, most people do better the second time they take a given test. One of many ways to check the reliability of the I.Q. test is to split the test into two halves and examine whether the scores on the halves are similar.

An operational definition or measurement can be very reliable but still be worthless. For example, if the weighing scale always reads 10 percent higher than a fair scale, the readings are no good. And if an I.Q. test has excellent split-half reliability but does not give higher scores to the students who later do better in school, the test is not useful. Another example: A thoroughly reliable operational definition for "love" is "Add the weights of a boy and a girl. If their combined weights total over 322 pounds, they will be said to be in love." But this measurement does not correspond at all to the hypothetical concept of love you are trying to measure, and therefore it has no meaning.

All these worthless operational definitions are said to "lack validity." A definition (or a classification or measurement) is valid if it really classifies or measures what you want it to classify or measure. An I.Q. test is valid if it really does measure the future school success of students, because that is what it is intended to measure. A scale weighs validly if its weights agree with the commonly accepted weights of the Bureau of Standards in Washington. A pre-election Presidential poll is valid if it accurately picks the election winner.
A count of the gold bars in Fort Knox is valid if it reveals how many bars "really" are there. (The best answer we can get might be off by $20 million in either direction, according to O. Morgenstern.) An operational definition of love is valid if all those people who fit it seem to the experts in the field to be in love.

There is no simple rule for deciding whether an operational definition is valid. Rather, the decision calls upon your judgment and scientific wisdom. Often a good test is whether you can persuade other scientists that the definition is valid. And of course the validity must depend upon the purpose of the study.

For example, it is easy to make up a very reliable operational definition of "the faculty of the University of Illinois." You could use any of these definitions: the full professors listed in the staff directory; all assistant, associate, and full professors in the directory; all members of the faculty club; all buildings-and-grounds employees listed in the directory; or any other easily defined group. But if your purpose is to survey "faculty" opinion about an important campus issue, it is not so clear which definition to use. Should you include teaching assistants? research associates? retired professors? and so on. A bad choice will give you an invalid definition. But what is the criterion of validity? The most valid definition in this case probably includes those people whom your listener or reader will have in mind when you tell him what the faculty opinion is.

Another example of how validity depends upon purpose: In Chapter 32 we shall see how difficult it is to construct a valid operational definition of "cause and effect." Furthermore, a definition of cause and effect that is valid for a decision maker is not necessarily valid for a scientist.

Often there are several possible ways to validate an operational definition, and the validating method is best that gets "closest" to what the definition is intended to define or measure. One can think of a hierarchy among various validating methods. For example, no one would disagree that the ultimate validation of an I.Q. test is that it predicts well which students do well in school and which students do poorly. But another approach to validating an I.Q. test is to compare its results to those of well-established I.Q. tests. The latter validation is less powerful than the school-success validation, for it is less direct and therefore more subject to flaw; the well-established tests themselves may not be very valid. One method may be said to provide stronger validation than another method in a given situation; in another situation the relative validation power of the same two methods might be the reverse. Chapter 6 discusses the specifics of the validation process.

4. Writing Scientific Reports

The "mere" writing up of results may strike you as a minor part of the research process. But do not skip this section; the subject is not trivial at all. The written description of research is part of the very warp and woof of the research itself.
For example, in some kinds of research (especially experimental and survey tests of sharply stated hypotheses) you can often write up the report of the research before the data are collected, except for the actual results. This exercise helps you to think through a study and often uncovers difficulties in theory and design that would otherwise appear later and cause trouble. If you write up the report early and if you write it up well, you have done much of the research job.

Research reports, and research proposals too, must follow the specifications that we have set forth for scientific language. Scientific communication must be objective, rather than subjective, in both the words and the concepts used. "Subjective" here means the thoughts that are inside one person's head and that are unavailable for checking by other people. "Objective" here means those statements that are public and checkable. ("Public knowledge" and "private knowledge" may be better terms respectively than "objective" and "subjective.")

I will not give you a set of step-by-step instructions on how to write up a research project. Such a set of procedures might force all your write-ups into that format whether or not it fits the needs of a particular project. And no format will fit all research. When preparing to write up a piece of research, read well-done pieces of research in your field, and choose as models one or more that resemble your project. Notice that no one format is used by all writers, even within a single restricted journal in a branch of psychology, say. There may be some very general similarities of format, but the organization of each report is tailored to the particular needs of each research project.

An important issue in many write-ups is how much to generalize from the data or how much to qualify the conclusions. Some researchers are supercautious about drawing inferences about their work or about hazarding the inferences in print. For example, to report a finding that men have more dreams of violence than women do but to claim that, because the sample all came from Illinois, the results cannot therefore be generalized beyond Illinois, is ridiculous. Or, a biologist who finds that cigarette tars cause cancer in rats may refuse to suggest that cigarette tars may well have the same effect in humans. By so doing you may protect yourself from criticism, but you may also lessen the importance of your work by failing to make it relevant to researchers who are interested in cancer in humans. (Furthermore, there is an unattractive hypocrisy in such overcautiousness. We all know damn well that the only reason one studies the effect of cigarette smoking in rats is because of the possible implications for human beings.) Later, and particularly in the section on experiments, there is further discussion of when it is and when it is not reasonable to draw general inferences from samples.

On the other hand, the researcher also has an obligation to keep the reader from jumping the rails and drawing unfounded inferences. A. Kinsey, et al., devoted much of their long explanation of method to limitations of data and method.
But the officially appointed statisticians who reviewed the work (Cochran, et al.) gently chided them for not repeating the cautions regularly in the rest of the book so that the casual reader would not be misled.

Sometimes when you have a great many data you must decide how much to include in your report. Err on the side of reporting too much rather than too little, especially in appendixes. You can always cut out the excess.

A word about style in scientific writing. One of the purposes of the founding of the great Royal Society of London in 1660 was to encourage writers on scientific matters not to write special-pleading polemics full of loaded adjectives, but rather to use unemotional and unbiased language. This purpose is indeed admirable, but it has had some unintended and unfortunate consequences. For example, many scientists now refuse to write in the first person, which means that they must often forgo the active voice in their sentences. The passive voice often leads to tortured and ludicrous sentences that resemble a person's contortions when he tries to hide his face from a spotlight; the movement is not graceful. Furthermore, if a writer is artificially restricted from using such linguistic constructions as "I," there are fewer options and fewer tools available, and precision must diminish.

Scientific caution and the desire to avoid criticism also make scientific writing more complicated and sap its vigor and pleasantness. But, when we must make a choice, it is unhappily true that vigor and stylistic grace are the less important of the virtues.

5. Communicating with Yourself

Being precise, specific, and concrete in communicating with yourself is also important. To fail to make detailed plans is unwittingly to make important decisions by postponing or ignoring them. Not deciding may mean not doing, and what is not done now may not be possible at a later stage.

Specify as much as you can in advance. Draw up the tables for which you wish to collect data; you will be surprised how hard it is to do this, a sure tip-off that your thinking is too vague. When possible, it is wise to write up the research report prior to collecting data as an aid to good design planning. I cannot overemphasize the importance of doing these and similar exercises early in your research work.

One's self-communication is improved by subjecting oneself to the same discipline and requiring the same precision as for communication with someone else. Make liberal use of pencil and paper. Write down what you think you are thinking. You will constantly be surprised at how this method helps to clarify your thinking.

. . . [V]ery often, a problem seemed settled, everything fixed and clear, till I began to write down a short preliminary sketch of my results. And only then, did I see the enormous deficiencies, which would show me where lay new problems, and lead me on to new work. In fact, I spent a few months between my first and second expeditions, and over a year between that and the subsequent one, in going over all my material, and making parts of it almost ready for publication each time, though each time I knew I would have to rewrite it. . . .
I have written up an outline of the Kula institution at least half a dozen times while in the field and in the intervals between my expeditions. Each time, new problems and difficulties presented themselves. (Malinowski, p. 13)

It also helps to ask and answer such questions as "What do I really want to know?" and "What am I really trying to find out?" Returning to these fundamentals when you get stuck often clears up confusions like magic.

Your jottings in the course of the work serve as a better record of your research than your memory does. For example, when you make a decision to exclude or include a subject or datum in the sample, write down why. Later you will need this note when you write up the research. Write down all the other observations of what you see, and note what you did and why you did it. These notes are like the laboratory notebook that is of such crucial importance to natural scientists.

6. Honesty in Scientific Reporting

Scientific research is a human enterprise. And the individual researcher often has a stake in obtaining some particular outcome from the research, because the particular outcome will confirm his theory, or will be consistent with her ideological biases, or will be sensational and make him famous, or whatever. In this as in all other situations where there are interests at stake, there is pressure to produce the desired outcome, whether or not it can be found in the data.

Some scientific dishonesty [4] is flagrant, as in the example of Cyril Burt, the leading British psychologist of his time. His findings apparently showed that the intelligence of blacks, women, and lower-class Englishmen is lower than that of whites, men, and the British middle class. These findings have had a great influence on psychologists, on the lay public, and on social policy. A few years after Burt's death in 1971, unlikely coincidences were found in the data that led to the discovery of many other discrepancies, and to the judgment that his findings are scientifically worthless. Burt's motive apparently was that he wished to support his personal beliefs about the intelligence of various groups.

Another famous example is that of Piltdown man, the supposed remains of a prehistoric human that was concocted of modified modern remains in 1908 and not exposed until 1953. Some other well-known recent examples from outside the social sciences include the following:

Item: A scientist at the Sloan-Kettering Institute painted dark patches on white mice to make his colleagues believe he had perfected a way to make skin grafts between non-twins.

Item: The Food and Drug Administration charged a major pharmaceutical manufacturer, G. D. Searle & Company, with falsifying the scientific data upon which claims of the safety of two drugs and an artificial sweetener were based. Searle's research methods, according to an F.D.A. report, were so careless that reliable scientific conclusions could not be derived from them.

Item: A promising student at Harvard University reported experiments showing that something in the blood of one animal can be injected into another, transferring immunity to certain foreign substances. No other research group was able to reproduce the striking results. The line of research was abandoned amid publicly voiced suspicions that the test animals had been tampered with.

Item: A Pennsylvania State University chemist said that he had evidence that the sex scents of insects vary according to what the bugs eat. If true, his "finding" would destroy a major new avenue of research on safe pest controls. His university touted the results loudly. However, the chemist's co-workers examined the same data and repeated the experiments and found no evidence for the claim. The chemist said that he still believed in his theory and would repeat his experiments. (Rensberger, p. 1)

[4] The following paragraphs are based on Rensberger.

You are not likely to be tempted to commit such out-and-out major frauds. Your temptation may come when your data show a reasonably consistent picture, in accord with the outcome you desire, and then some fragment of conflicting evidence crops up: perhaps an experiment with a few rats of a different strain, or a sample of data from a census prior to the one you have been working on, or an analysis using a different mathematical form that you have assumed to be less appropriate than the line of analysis you have chosen. Or the conflicting evidence may be results contained in a forgotten article in an obscure journal by an unknown author that you just happen to come across.

Then the temptation is simply to convene a short conference with yourself, in which all sides of your mind arrive at a consensus that it would be scientifically appropriate to ignore the additional and conflicting evidence on the grounds that it really is not relevant. I doubt that any practicing empirical researcher is so saintly that such a thought has never crossed his or her mind. My informal interviewing reveals that nine out of ten researchers admit it, and the tenth is a damned liar. Even Gregor Mendel, the founder of modern genetics, is now thought not just to have contemplated touching up the data, but actually to have falsified them to make them fit his theory, though the theory was in fact correct.

The prevalence of the problem is suggested by this observation:

Last spring a graduate student at Iowa State University required data of a particular kind in order to carry out a study for his master's thesis. In order to obtain these data he wrote to 37 authors whose journal articles appeared in APA journals between 1959 and 1961. Of these authors, 32 replied. Twenty-one of these reported the data misplaced, lost, or inadvertently destroyed.
Two of the remaining 11 offered their data on the conditions that they be notified of our intended use of their data, and stated that they have control of anything that we would publish involving these data. . . . We met the former condition but refused the latter for those two authors since we felt the raw data from published research should be made public upon request when possible and economically feasible. Thus raw data from 9 authors were obtained. From these 9 authors, 11 analyses were obtained. Four of these were not analyzed by us since they were made available several months after our request. Of the remaining 7 studies, 3 involved gross errors. One involved an analysis of variance on transformed data where the transformation was clearly inappropriate. Another analysis contained a gross computational error so that several F ratios near one were reported to be highly significant. The third analysis incorrectly reported insignificant results due to the use of an inappropriate error term. . . . (Wolins, p. 657)

Doctoring the data can ruin your reputation, and it can cause you great suffering from pangs of conscience. On the positive side, some of the world's great discoveries have come from researchers who took apparently conflicting data seriously and pursued the discrepancy, rather than sweeping it under the rug, thereby leading to great new findings.

How you conduct yourself about such personal matters as acknowledging and sharing credit with people who help you or work with you is a related topic that I will not presume to lecture you about.

7. Summary

Theoretical scientific discussions must be converted into operationally defined terms when empirical research is performed. One key test of an operational definition is whether readers and subsequent researchers will know exactly what empirical operations you performed, so that these operations can be repeated precisely. This is the test of reliability. A good operational definition must also refer closely to the theoretical concept you are interested in; a reliable but irrelevant operational definition is not valid and hence worthless. A good operational definition also has little bias in measurement.

More generally, an empirical study is valid if it yields a reliable answer to the question to which it is addressed, that is, if the research purpose is well met.

Good scientific communication claims neither too much nor too little for the findings of a study. Good scientific writing is clear and vigorous, but relatively objective and unemotional.

EXERCISES

1. An operational definition should suit the purposes of the particular study in which it is to be used. Briefly describe a research project in which each of the following terms might be used, and then create a satisfactory operational definition of the term: "learning"; "part-time job"; "religious rite"; "aggressiveness"; "personal income"; "vacation"; "memory"; "war"; "bartender"; "most popular man on your college campus"; "money" (be careful not to stop with just currency); "tribal loyalty"; "vocational education"; "Negro."
2. Work out an operational definition for the concept "United States resident" for use in a census in this country. Do the same for an underdeveloped African country. Pay special attention to whether a person lives there on the date of the census, and whether that person ordinarily lives there. Explain the reasons for your operational definitions in terms of the purposes of the censuses.
3. Find five examples of good operational definitions in your major field of interest. Then find five examples of poor operational definitions. Tell why they are good or bad.
4. You want to explore the relationship between the amount of attention mothers give to infants and the extent to which one-year-old babies love their mothers. Define "attention" and "love."
5. You want to test the hypothesis that the introduction of universal literacy into underdeveloped countries leads to rapid social and economic development. Define "introduction" and "development."
6. Read the following quotation. How would you go about constructing operational definitions of the 15 major types of headaches?

The Language of Research


Heady with victory over gravity, tooth decay and the atom, laboratory science appears to be gearing up for an assault on mankind's oldest ailment: the headache. The first report from the battlefront by the Ad Hoc Committee of the Classification of Headache of the National Institute of Neurological Diseases and Blindness lists 15 major types of headache, from migraines to cranial neuralgias. (Champaign-Urbana Courier, January 26, 1964, p. 33)

ADDITIONAL READING FOR CHAPTER 2

On operational definitions, see Underwood (Chapter 3). Also see Selltiz et al., rev. ed., pp. 42-44.

On the writing of research reports, especially in sociology, see Whitney (Chapter 16) and Selltiz et al., rev. ed. (Chapter 15); for business and economics, see Berenson and Colton.

Useful notes on research procedure and information about papers, libraries, and so on, may be found in Bart and Frankel (especially for sociologists).

To improve your prose, see The Elements of Style by Strunk and White, a clear and pleasant summary of the important rules for good writing.

3 Basic Concepts of Research

1. Variables in General
2. The Dependent Variable
3. The Independent Variable
4. Parameters
5. The Functional Form
6. Assumption, Theory, Deduction, Hypothesis, Fact, and Law
7. Universe and Sample
8. The Ideal Causal-Study Design
9. Ceteris Paribus
10. Summary

1. Variables in General

Variables and parameters are the final forms in which you work with scientific terms and concepts. Therefore, the words "variable" and "parameter" (along with "function") are the most common words in most scientific discussions.

A variable is not just "some quantity that varies." A variable is a quantity in which you are interested that varies in the course of the research or that has different values for different samples in your study. Everything changes sooner or later. But a variable is a factor whose change or difference you study.

Perhaps giving specific examples of "variables" will help.¹ Amount of rain is probably a variable to a meteorologist studying precipitation but not to an astronomer who is studying stars, though the rain may be annoying to the astronomer because it makes the stars invisible. Rain is not likely to be a variable to a biologist studying lung cancer. But rain might have something to do with air pollution, which might have something to do with lung cancer, in which case rain would indeed become a variable to the biologist. Temperature and humidity are probably not variables for a psychologist studying intelligence in Chicago school children, but they might well be variables to an anthropologist or economist studying the cultural or economic development of primitive peoples.

1. Some social scientists use "relevant variable" as "variable" is used here. I think it is less confusing, however, to consider that irrelevant variables are not variables at all.

Basic Concepts of Research


The researcher chooses his variables, not the other way around; the world about the researcher does not tell him what aspects of it to study. For example, it is entirely a decision of the researcher to study the effects of broken homes on juvenile delinquency rather than the relationship between neurosis and juvenile delinquency. Both choices may be good ones, and the choice is not forced by the nature of reality. Of course, the good researcher does not choose his variables at random or casually; rather, he chooses them with extreme care, for such choices are among the crucial ones he must make. Poorly-chosen variables yield useless results.

2. The Dependent Variable

The dependent variable (actually there may be several dependent variables, but that is unusual) is that quantity or aspect of nature whose change or different states the researcher wants to understand or explain or predict.² In cause-and-effect investigations, the effect variable is the dependent variable. If you wish to investigate whether there is any relationship between the mother's smoking cigarettes and the weight of the baby, then the weight of the baby is the dependent variable.

3. The Independent Variable

The best definition of an independent variable is a variable whose effect upon the dependent variable you are trying to understand. There may be several independent variables. You may simultaneously investigate the effect of the mother's cigarette smoking, the mother's exercise, parents' weights, and other variables upon the weight of the baby.

In some types of research one cannot label the variables as dependent and independent. For example, a study of the distribution of babies by weight at birth has only the single variable of weight, as long as you do not introduce independent variables to explain why some babies weigh six pounds and others weigh seven pounds. A psychological-anthropological study of the types of personalities found among the Navahos, and the U.S. Census, are other examples of studies without independent variables.

2. The terms "explain" and "predict," as well as the related terms "causality," "function," and "scientific law," are distinguished and defined in Chapter 32. For the time being, these words are used in a crude sense corresponding to everyday usage.

4. Parameters

The concept of parameter is tricky. A parameter is a quantity that has some importance to a study but that remains unchanged in the course of the study. This is why some people say frivolously that a parameter is a variable that does not vary.

To illustrate, Figure 3.1 shows the hypothetical relationship between the price of scallions and the quantity of scallions that people buy, produced in a given year.

[FIGURE 3.1 Hypothetical Relationship Between the Price and Production of Scallions. Axes: Price of Scallions; Quantity of Scallions Sold.]

The crosses represent the quantities sold during periods when various prices prevailed. The straight line is drawn among the crosses as close to as many of them as possible (though there are several ways to do it). The assumption is that each of the crosses may be somewhat in error and that the line represents a better guess than any given observation of how much would be sold at a given price, because the line is based on the information contained in all the observed crosses. One might draw curved lines of various shapes also, depending on one's general knowledge of the scallion market, but a straight line is particularly easy to work with.

In algebraic terms, the formula for a straight line is y equals a plus bx. Or, in the example at hand, q = a + bp, where q is the quantity of scallions sold and p the price of scallions and where a and b are measured from the diagram or estimated statistically. The algebraic constants a and b are parameters for the scallion market and are assumed to stay the same no matter what the price of scallions is. With the help of the parameters, you can predict how many scallions will be sold for any price you pick.

Of course the parameters may change. In another country or another decade, more or fewer scallions might be sold at any given price than the line in Figure 3.1 indicates. But, for the particular set of circumstances you are working with, a and b are assumed to be constant, and that is why parameters are called "constant variables."

A parameter is a property of a whole universe³ and not only of the sample you observe. The term "statistic" is applied to the sample's counterpart of the parameter. For example, if you poll a sample of Kansas voters and find that 56 percent expect to vote Democratic in the next election, that figure is a statistic. On the basis of the sample you might estimate that 56 percent of all voters expect to vote Democratic, but your estimate of the parameter based on the statistic could be in error. The actual percentage of all voters who expect to vote Democratic is the parameter.

3. See pages 126-127 for a discussion of the meaning of the term "population" or its synonym in research terminology, "universe."
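The straight line q = a + bp discussed in this section can be made concrete with a short computational sketch. The example below estimates the parameters a and b by ordinary least squares, which is only one of the several ways of drawing the line that the text alludes to. The prices and quantities are invented for illustration, and the function names are hypothetical.

```python
# Hypothetical (invented) observations: each pair is one "cross" in the diagram.
prices = [1.0, 2.0, 3.0, 4.0, 5.0]       # price of scallions in each period
quantities = [9.0, 8.5, 6.5, 5.5, 4.0]   # quantity sold in the same period


def fit_line(p, q):
    """Estimate a and b in q = a + b*p by ordinary least squares."""
    n = len(p)
    mean_p = sum(p) / n
    mean_q = sum(q) / n
    # Sum of cross-products and sum of squares, both taken about the means.
    sxy = sum((pi - mean_p) * (qi - mean_q) for pi, qi in zip(p, q))
    sxx = sum((pi - mean_p) ** 2 for pi in p)
    b = sxy / sxx
    a = mean_q - b * mean_p
    return a, b


# a and b are the estimated parameters for this particular set of circumstances.
a, b = fit_line(prices, quantities)


def predict_quantity(price):
    """Predict the quantity sold at any price, using the fitted parameters."""
    return a + b * price


print(f"a = {a:.2f}, b = {b:.2f}")
print(f"predicted quantity at a price of 3.5: {predict_quantity(3.5):.2f}")
```

With these invented figures the fitted line slopes downward. Data from another country or another decade would yield different estimates of a and b, which is the sense in which the parameters are constant only for a given set of circumstances.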

5. The Functional Form

The mathematical-logical concept of function is very different from the anthropological-sociological concept of the same name.⁴ In the mathematical-logical sense with which we are concerned here, to say that "y is a function of x" means only that the magnitude of y depends upon the magnitude of x. The statement implies that y is the dependent variable and x (or several x's) the independent variable(s).

Functional relationships are written in algebraic notation, for example, $y = f(x_1, x_2, x_3)$ if there are three independent variables. This algebra is to be read "y is a function of $x_1$, $x_2$, and $x_3$" or "y depends upon $x_1$, $x_2$, and $x_3$" or "y is the dependent variable whose values (magnitudes) depend upon the values (magnitudes) of $x_1$, $x_2$, and $x_3$." Another letter, perhaps g or h, could replace f, just as w or z might be used instead of x; the choice of letters is quite arbitrary. In the scallion example, (quantity of scallions produced) = f(price of scallions) is translated as "the quantity of scallions produced depends upon (is a function of) the price of scallions."

The causal direction of the functional relationship (that is, which variable is put on the left side as the dependent variable, and which is put on the right as the independent variable) is a decision that the researcher makes in light of his or her general knowledge of the subject matter. A given set of variables and data may sometimes be viewed one way, sometimes the other, and sometimes both ways together. For example, in some cases, the price of scallions is seen as the dependent variable, in some cases quantity is seen as the dependent variable, and in some cases, where there is thought to be mutual causation, both are dependent and independent variables. More about this later.

The concept and notation of a function are of enormous value in clarifying ideas.
The concept is common in all sciences that have reached the stage of investigating relationships, especially cause-and-effect relationships. (But in types of research that really have no independent variables, especially descriptive research of the census type, the functional form is not appropriate or useful.) Algebraic notation is not necessary to express the functional notion that something depends on something else. As we have seen, this idea can be expressed in words alone by simply indicating which are the dependent and independent variables, and many teachers argue that algebraic notation only confuses the student. I am convinced, however, that the exercise of expressing variables in algebraic functional notation is very useful, even if one goes no further with mathematical analysis.

4. R. Merton clearly distinguishes these and other senses of functional relationship (Chap. 1).

6. Assumption, Theory, Deduction, Hypothesis, Fact, and Law

Assumption, theory, deduction, hypothesis, and fact are commonly used concepts in science, though they are often confused with one another. Their definitions and the relationships among them can best be brought out in the context of a concrete case in economics, the most effective user among the social sciences of a deductive theoretical apparatus.

a. EXAMPLE 1: INTEREST RATES AND GEOGRAPHY

Problem and Fact. It is an observed fact that savings-and-loan interest rates on deposits were higher on the West Coast than on the East Coast or in the Midwest in 1977. The problem is to predict what would happen to this difference in the future.

Assumptions. We assume first that investors will choose the highest available rate of return among investments that have equal risks, that is, that investors are "rational" in this respect. (Notice that this assumption does not imply that human beings are generally rational rather than irrational or that all investors will act rationally. Every economist knows that humans are often irrational, just as many free-falling objects, such as leaves and balloons, do not obey the law of gravity. Rather, this assumption is an "abstraction" or an "ideal type," as M. Weber called it.) Second, we assume that there are information flows in the real estate market. (In most problems, the economist goes further and assumes perfect information.) Third, we assume that there are no barriers to moving money from one part of the United States to another. (This assumption would not be made for movement among countries in an international trade problem.)

An assumption is any statement in this form: "If w holds true (and I assume that it does), then z will happen." There are many types of assumptions. One important distinction is between abstract assumptions that are part of the entire theoretical apparatus, of which we have given three examples, and assumptions that are specific to the case under discussion, such as the ceteris paribus clause, which means, in this case, that no unusual force is operating to confuse the situation.⁵

5. For discussion of this distinction, see F. Machlup (1955) and J. Melitz.

Deduction. We then deduce that interest rates (savings-and-loan interest rates in this case) will be the same in all parts of the United States. Given the assumptions, this deduction can be made as formally as any deduction in geometry. But of course the deduction holds true only if the assumptions are well chosen, just as a calculation that uses the formula for a square to determine the area of a plot of land will be correct only if the plot of land has the shape of a square. The test of a deduction (and of the theory within which it takes place) is that all scientists in a given field must be able to agree that the deduction is valid if the assumptions are well chosen. A deduction is only an exercise in logic, just as is the arithmetic statement that 3 x 15 = 45.

Theory. If there are well-established assumptions in a field, and if there is an apparatus that permits such a deduction as we have made, then one may talk about a body of theory. If so, a speculative statement (conjecture) must be related to this whole body of theory if it is to be called "theoretical." In other fields, any conjecture or deduction from general experience is called theory. More about this in Chapter 5.

Hypothesis. We then translate the deduction into a hypothesis or conjecture that can be tested empirically. In this case, the hypothesis is that interest rates for the same type of deposit will become more nearly equal in various parts of the country as time passes. The reasoning is that there must have been some unusual and sudden conditions that created a temporary imbalance among parts of the country. By deduction, the rates will be equal if there is not some continuing reason why they should be unequal, and we assume that there is no such continuing force. Therefore, the interest rates can be expected to become more equal. This is the hypothesis to be tested empirically.

Not all hypotheses are deduced from theories, though most of them rest on facts and assumptions. One might look around and hypothesize that tall girls get married earlier than short girls.
This is indeed a scientific hypothesis, but it comes directly from observation and unformalized intuition rather than as a deduction from a body of theory.

And a hypothesis is not the same as a theory, though many writers use the two terms almost interchangeably (for example, M. Friedman, pp. 3-46). A hypothesis (conjecture) is a single statement that attempts to explain or to predict a single phenomenon, whereas a theory is an entire system of thought that refers to many phenomena and whose parts can be related to one another in deductive logical form.

An unfortunate confusion in usage pervades this book and others, however. We often talk about "theoretical concepts" when we really mean "hypothetical concepts." For example, when discussing an investigation into people's happiness, we called the vague and undefined concept happiness a theoretical concept (even though it is not part of any theory) to distinguish it from the empirical concept (in this case, people's answers to a questionnaire) that will stand for the hypothetical concept. But in view of existing practice, to stick to the better usage would only be confusing here. Furthermore, the terms "empirical concept" and "empirical variable" are both used, though they are synonymous, which adds further confusion.⁶

Law. If an empirical test of the hypothesis confirms the hypothesis, the generalization might be called a law, provided that the finding is sufficiently important. Even if our hypothesis were to be confirmed empirically, it is insufficiently important to be called a "law."

Another example of the related concepts of fact, theory, hypothesis, and empirical test may be found on page 417.

7. Universe and Sample

A universe (or population) is some group of people or objects in which you are interested. The group may exist or not exist, and it may be finite or infinite; for example, the universe of students who will be graduated from college in 2010 and thereafter has not yet been born and is infinite.

A sample is some subgroup of the universe. The purpose of studying a sample is to make some generalization about the universe. The major reason for taking only a sample, rather than studying the entire universe, is cost, which is discussed in more detail in Chapter 24; studying the entire universe is almost always prohibitively expensive.

There are many kinds of samples, of which the random sample is a very important type. A sample is random if every member of the universe stands the same chance⁷ of being included in it. (Read this definition again, and make sure you understand it.) Groups chosen on the basis of the researcher's judgment about their similarity or typicality are another type of sample. Sampling validity and efficiency will be discussed at length in Chapter 9.
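The definition of a random sample can be illustrated with a brief sketch. The code below builds an invented universe of 10,000 voters in which exactly 56 percent expect to vote Democratic (echoing the Kansas poll figure used earlier), draws a simple random sample in which every member has the same chance of inclusion, and compares the sample statistic with the universe parameter. The universe, its size, the sample size, and the function name are all assumptions made for illustration.

```python
import random


def simple_random_sample(universe, n, seed=None):
    """Draw n members without replacement; each member has an equal chance."""
    rng = random.Random(seed)
    return rng.sample(universe, n)


# Invented universe: 1 = expects to vote Democratic, 0 = does not.
universe = [1] * 5600 + [0] * 4400

# The parameter is a property of the whole universe.
parameter = sum(universe) / len(universe)

# The statistic is the sample's counterpart of the parameter.
sample = simple_random_sample(universe, 500, seed=1)
statistic = sum(sample) / len(sample)

print(f"parameter (whole universe): {parameter:.3f}")
print(f"statistic (sample of {len(sample)}): {statistic:.3f}")
```

The statistic will usually fall near the parameter but will seldom equal it exactly, which is why an estimate based on a sample can be in error.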

6. Sociologists commonly use a slightly different lingo. This book's "empirical variable" is their "indicator," and this book's "hypothetical concept" is just plain "concept" to them.

7. More precisely, the sampling process is random if every member of the universe has a known chance of being included in the sample.

8. The Ideal Causal-Study Design

Much (but not all) of social science is the investigation of causal relationships. Therefore, let us consider what is involved in establishing a causal relationship between some dependent variable y and a particular independent variable x. This section is a preview of much that follows about research design.

Determining a causal relationship would ideally be done as follows: The value of the dependent variable y that is observed for any given subject in the study depends on three elements: first, what the subject was like before it was acted upon by the independent variable; second, the particular level (value or strength or variety) of the independent variable's action upon the subject during the study period; and, third, the particular values of all other variables that act upon the subject during the study period, both those in which you may be interested and those that are just interferences. To put it in algebraic notation (which you may skip if you like but which I believe will help you and should not scare you at all):

$$y_i = f(x_{i,1}, x_{i,2}, \ldots, x_{i,n},\ v_{i,1}, v_{i,2}, \ldots, v_{i,n},\ z_{i,1}, z_{i,2}, \ldots, z_{i,n}).$$

In this equation $y_i$ is the observed value of the dependent variable for subject i; $x_{i,1}$ is the value of the independent variable $x_1$ that subject i is exposed to during the study; $x_{i,2}, \ldots, x_{i,n}$ are other variables in which you may be interested that act upon subject i during the study period but that we shall ignore now; $v_{i,1}, v_{i,2}, \ldots, v_{i,n}$ are the influences that acted upon the particular subject i and formed him or her before the study began; and $z_{i,1}, z_{i,2}, \ldots, z_{i,n}$ are other influences that may affect the outcome of the research (that is, may affect the observed y for a given subject) but which you are not interested in. The zs will be called "interfering" or "confounding" variables or, in some contexts, "parameters." (The definitions of all these terms vary somewhat among the people who use them and the situations in which the terms are used.)

For example, consider a study of the effect of the amount of protein in children's diets upon their ability to learn, as indicated by their ability to memorize nonsense syllables. The symbol $y_i$ represents child i's score on a memory test. The symbol $v_{i,1}$ represents the intelligence of her parents, $v_{i,2}$ the quality of her schooling, and so on for other vs. The symbol $x_{i,1}$ is the independent variable, the amount of protein in child i's diet.
The symbol $z_{i,1}$ might represent the amount of exercise the child gets during the study period; $z_{i,2}$ might represent the time of the day the memory test is given; $z_{i,3}$ might represent which tester examines and grades the child; and so on for the many other possible zs.

The ideal study is one in which two (or more) groups of subjects, each made up of subjects with exactly the same v qualities, are subjected to two (or more) different levels of independent variable $x_1$, ceteris paribus, that is, while all the other z influences are kept exactly the same for all the subjects in the study. In the example above, it could mean that, say, two equal groups of children (that is, each group containing children with exactly the same personal-background qualities [v] as those in the other group) would have two different amounts of protein in their diets. All would be tested for memory at the same time of day, all would get the same amount of exercise, and so on.

Now we must ask how the researcher can meet these requirements for the ideal study of causal relationship. First, how can the personal-background variables be made equal in the two groups? If you can conduct a controlled experiment, this problem in achieving ceteris paribus may be solved perfectly, at least in principle, by calling upon the device of randomization of the assignment of subjects to treatments. If you were working with rats instead of children (and testing their memories with a device more appropriate to rats than nonsense-syllable learning), you could start with a large collection of rats and could randomly choose which rats would get the high-protein diet and which the low-protein diet. This could be done by giving each rat a number and then assigning the high-protein diet to the first half of the numbers pulled from a hat. If the sample groups were large enough, all the relevant background characteristics (vs) would then be distributed fairly evenly between the high-protein and low-protein groups (the proof of this is the statistical principle known as the "Law of Large Numbers"). This is why the randomization of subjects is a powerful weapon and a crucial principle in carrying out valid studies of causal relationships.

For several reasons a researcher may not be able to achieve this ideal in equating personal-background characteristics. First, the groups of subjects may be too small in number for the Law of Large Numbers to guarantee a decent split of the various characteristics among the two groups. In that case, you may try to match the groups by assigning to the two groups subjects that exhibit the characteristics you deem relevant to as nearly equal a degree as possible. Effective matching of experimental groups is not easy to achieve, as we shall see later.

A second force that may prevent random assignment of subjects to groups is that the researcher may not have the power to arrange a controlled experiment; that is, it may just be impossible for you to assign people to groups and then subject the groups to experimental treatment (as may well be the case with protein diets and children). One possible (but treacherous) device is to match subjects after the fact, that is, to find apparently similar people who already have been subjected to one or another of the treatments and then compare them on the y variable. This device will be discussed at length later.
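The hat-drawing procedure for assigning rats to diets, described above, can be sketched in a few lines. In the example below each rat has a number, the numbers are shuffled (the equivalent of pulling them from a hat), and the first half receive the high-protein diet. The starting weights stand in for one background characteristic (a v variable); they are simulated, and the function and variable names are hypothetical.

```python
import random


def randomly_assign(subject_ids, seed=None):
    """Shuffle the subjects and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)              # the "numbers pulled from a hat"
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Simulated background characteristic v: starting weight of each of 1,000 rats.
weight_rng = random.Random(0)
weights = {rat: weight_rng.gauss(300, 30) for rat in range(1000)}

high_protein, low_protein = randomly_assign(weights.keys(), seed=42)

mean_high = sum(weights[r] for r in high_protein) / len(high_protein)
mean_low = sum(weights[r] for r in low_protein) / len(low_protein)

print(f"mean starting weight, high-protein group: {mean_high:.1f}")
print(f"mean starting weight, low-protein group:  {mean_low:.1f}")
```

With groups this large the two means come out close to one another, even though nothing was done to equalize weight directly. Rerunning the sketch with only a handful of rats shows how poorly the split can turn out when the groups are small, which is the first difficulty noted above.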
Now we proceed to the part of the causal study that is usually hardest to do well, the handling of the interfering variables (z's) that influence the subjects starting at the time when $x_1$, the independent variable of interest, begins to act, that is, after the study period begins. The first point to remember is that each z variable must be taken care of separately and explicitly by the researcher. This situation is unlike the one with the personal-background variables, in which (if experimentation and random assignment of subjects are possible) you can take care of all of them at one fell swoop by assigning subjects randomly to the experimental groups. In the present instance, if you do not properly take care of some important z your research may be ruined.

One way to take care of any z variable is to make it a parameter, that is, to "hold it constant" and make it the same for all subjects in all the experimental groups. In the protein-memory example, amount of sleep might affect the subject's performances. This factor can be held constant by putting all the subjects in both experimental groups to sleep at the same time and waking them all at the same time.

But some variables cannot be held constant in this fashion; in every piece of research there are several such variables. For example, one might want all the children to eat their meals at the same time of day, to have their memories tested at the same time of day, and to have their memories tested by the same examiner. But suppose that it is physically impossible for each child to begin eating at the same moment because of limited feeding facilities, and some children must wait for others to be tested before it is their turn. And unless there is only one examiner, the children must somehow be divided among different examiners. For any one of these variables that cannot be held constant, an effective alternative is to arrange it so that the members of each experimental group are affected randomly by the variable. For example, it should be a matter of a coin toss or dice throw whether each child in each group eats on the early or the late shift. Also the order in which the children are tested, and by which examiner, should be arranged with random drawings. These random arrangements will (if the groups are large enough) accomplish much the same result as holding the variables constant, by ensuring that on the average the two groups will be affected reasonably equally by each of the z variables. This device may be termed "randomization of interfering variables to avoid confounding" or just "randomization." When the researcher cannot control these variables experimentally, however, difficulties arise.

Earlier I said that there is no automatic way to achieve ceteris paribus and to handle all the nonrelevant variables, even if one has complete experimental control. The reason should now be clear. The experimenter must be able at least to identify each nonrelevant variable that might influence the results so that she can try to hold it constant or randomize its effect; no blanket mechanism handles all these nonrelevant variables. And no one's imagination is active enough to think of all the possible interfering (confounding) variables.
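The coin tosses and random drawings for meal shift, testing order, and examiner, described above, might be arranged as in the following sketch. It assumes that each interfering variable has already been identified by the researcher; the child labels, examiner names, and function name are invented for illustration.

```python
import random


def randomize_conditions(children, examiners, seed=None):
    """Randomly set each child's meal shift, examiner, and testing order."""
    rng = random.Random(seed)
    order = list(children)
    rng.shuffle(order)                 # random drawing for the testing order
    schedule = {}
    for position, child in enumerate(order, start=1):
        schedule[child] = {
            "meal_shift": rng.choice(["early", "late"]),  # the coin toss
            "examiner": rng.choice(examiners),            # random drawing
            "test_order": position,
        }
    return schedule


# Hypothetical subjects and examiners.
children = [f"child_{k:02d}" for k in range(1, 21)]
examiners = ["examiner_A", "examiner_B"]

schedule = randomize_conditions(children, examiners, seed=7)
for child in children[:3]:
    print(child, schedule[child])
```

Each z variable appears here as its own explicit line of code, which mirrors the point of the text: randomization handles only the interfering variables the researcher has thought to name.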
Furthermore, no one's resources are great enough to be able to deal with every possible interfering variable that has even a slight likelihood of being important.

9. Ceteris Paribus⁸

8. Ceteris paribus is Latin for "other things being equal."

We reduce to inaction all other forces by the phrase "other things being equal": we do not suppose that they are inert, but for the time we ignore their activity. This scientific device is a great deal older than science: it is the method by which, consciously or unconsciously, sensible men have dealt from time immemorial with every difficult problem of ordinary life. (A. Marshall, p. xiv)

Theoreticians can ignore the ceteris paribus problem, or simply salute it. But to arrange affairs so that other things are reasonably equal is perhaps the hardest and most important struggle that the empirical researcher must face.

One of the outstanding characteristics of the social sciences is that the subject matter is not static, not fixed, not immutable. In classical physics or chemistry you can usually be confident that what happens today will also happen tomorrow, that what happens in the East will happen in the West, and that what happens to the contents of one test tube will happen again to those of the next test tube. The chemist or physicist seldom needs to worry that a sample of matter perversely jumped into the hand to be studied just because it is different from other samples of matter. But exactly these things occur in social science studies all the time. All such occurrences are "departures from" or "breaches of" ceteris paribus. All of them are conditions that the social scientist must overcome in order to "hold everything else constant."

Here are two glaring examples of breaches of ceteris paribus that vitiate comparisons:

According to the census of January 1, 1910, Bulgaria had a total of 527,311 pigs; 10 years later, according to the census of January 1, 1920, their number was already 1,089,699, more than double. But, he who would conclude that there had been a rapid development in the raising of pigs in Bulgaria (a conclusion that has indeed been drawn) would be greatly mistaken. The explanation is quite simply that in Bulgaria, almost half the number of pigs is slaughtered before Christmas. But after the war, the country adopted the "new" Gregorian calendar, abandoning the "old" Julian calendar, but it celebrates the religious holidays still according to the "old" manner, i.e., with a delay of 13 days. Hence January 1, 1910 fell after Christmas when the pigs were already slaughtered, and January 1, 1920, before Christmas when the animals, already condemned to death, were still alive and therefore counted. A difference of 13 days was enough to invalidate completely the exhaustive figures. (O. Anderson in Morgenstern, pp. 46-47)

Doubling of a pig population over ten years is not biologically remarkable, and the Bulgarian data might well be accepted as fact by a person who is unfamiliar with Bulgarian history. This reminds us that in doing sound research there is no substitute for thorough knowledge of one's subject matter plus wide experience of the world.

We hear of a museum in a certain Eastern city that was proud of its amazing attendance record. Recently a little stone building was erected nearby. Next year attendance at the museum mysteriously fell off by 100,000. What was the little stone building? A comfort station. (Wallis & Roberts, p. 160)

The enormity of these ceteris paribus breaches makes it likely that the researcher will find them out and remedy them. More dangerous are the subtle influences that can ruin the research, like those I. Pavlov described:

The environment of the animal, even when shut up by itself in a room, is perpetually changing. Footfalls of a passer-by, chance conversations in neighboring rooms, slamming of a door or vibration from a passing van, street-cries, even shadows cast through the windows into the room, any of these casual uncontrolled stimuli falling upon the receptors of the dog set up a disturbance in the cerebral hemispheres and vitiate the experiments. To get over all these disturbing factors a special laboratory was built at the Institute of Experimental Medicine in Petrograd, the funds being provided by a keen and public-spirited Moscow businessman. The primary task was the protection of the dogs from uncontrolled extraneous stimuli, and this was effected by surrounding the building with an isolating trench and employing other special structural devices.

Inside the building all the research rooms (four to each floor) were isolated from one another by a cross-shaped corridor; the top and ground floors, where these rooms were situated, were separated by an intermediate floor. Each research room was carefully partitioned by the use of soundproof materials into two compartments, one for the animal, the other for the experimenter. For stimulating the animal, and for registering the corresponding reflex response, electrical methods or pneumatic transmission were used. By means of these arrangements it was possible to get something of that stability of environmental conditions so essential to the carrying out of a successful experiment. (p. 109)

And H. Ebbinghaus summarized the problem as it faced him in his pioneering study of rates of learning and forgetting:

He who considers the complicated processes of the higher mental life or who is occupied with the still more complicated phenomena of the state and of society will in general be inclined to deny the possibility of keeping constant the conditions for psychological experimentation. Nothing is more familiar to us than the capriciousness of mental life which brings to nought all foresight and calculation. . . . We must try in experimental fashion to keep as constant as possible those circumstances whose influence on retention and reproduction is known or suspected and then ascertain whether that is sufficient. The material [to be learned] must be so chosen that decided differences of interest are, at least to all appearances, excluded; equality of attention may be promoted by preventing external disturbances; sudden fancies are not subject to control, but, on the whole, their disturbing effect is limited to the moment, and will be of comparatively little account if the time of the experiment is extended, etc. (pp. 11-12)

It was for these reasons that Ebbinghaus used nonsense syllables rather than meaningful words or sentences as the stimuli in his experiments.

The point of the ceteris paribus idea is that it is not sensible to compare a sample of apples to a sample of oranges if you are trying to find out the effect of two kinds of fertilizer. How would you ever know whether the apples had an especially good season because of fertilizer A, or because it was a good season for apples and not for oranges? Analogously, people in Rochester may differ from people in Syracuse, for many reasons. To compare their reactions to different advertisements is to compare the effects on apples and oranges. All other things in Syracuse are not equal to all other things in Rochester, and therefore the simple comparison is flawed.

Of course, it is true that we can never get all the other things equal. Even the people within Rochester are not exactly like one another, and no two apples in a basket are perfectly alike. The only way that we could get everything perfectly equal would be to try the different fertilizers on the same tree or to try the different advertisements on the same person. Even then all else would not be equal because a person is not the same person after being exposed to the first advertisement, and the tree is not the same in two successive growing seasons.

We must resign ourselves to the fact that we shall never get all the other things exactly equal. Instead, our job is to get other things as nearly equal as possible or at least equal enough so that we can proceed with the research without hindrance by unexpected and unknown inequalities in the conditions surrounding the research.

10. Summary

This chapter introduces basic concepts in research.

A variable is a factor in which you are interested that varies in the course of the research. The dependent variable is the quantity whose variation you wish to explain or predict or understand. Independent variables are forces whose effect upon the dependent variables you wish to evaluate.

The term parameter has two quite different meanings. In one meaning it is a quantity that may affect the research but that you wish to keep immobilized and unchanged throughout the study. In another meaning a parameter is a property of a universe, in contrast to a statistic, which is an estimate of the parameter obtained from a sample.

The functional form y = f(x) is the basic logical structure of all cause-and-effect research.

Assumption, theory, deduction, hypothesis, fact, and law are often-confusing elements of the meld of theoretical and empirical research. Their meaning and use differ from one social science to another.

A sample is a group selected from the universe (population) for the purpose of describing the population. Randomly drawn samples have great advantages but sometimes they are not practical; sometimes matched or judgmental samples are more appropriate.

The ideal design for the study of cause and effect compares the effect of different independent variables upon the dependent variable in randomly chosen sample groups.

Irrelevant factors must be controlled to prevent them from ruining the study; that is, there must be no important breaches of ceteris paribus.

EXERCISES

1. Give an example of a quantity that is a variable to one researcher but not a variable to another researcher, though it is part of the "environment" of the second researcher's study.
2. Illustrate the concept of parameter in a context other than economic demand analysis.
3. Illustrate the relationship of the concepts of statistic and parameter for a given universe.
4. Express the essence of any three scientific studies in the functional form y = f(x₁, x₂, ...). Or, if you have dug in your heels and resist accepting the algebraic functional form, write down the dependent and independent variables for the three studies.

4 Types of Empirical Research

1. Introduction
2. Case-Study Descriptive Research
3. Classification Research
4. Measurement and Estimation
5. Comparison Problems
6. Research That Tries to Find Relationships
7. Finding Causes and Effects
8. Mapping Structures
9. Evaluation Research
10. Summary

1. Introduction

Students of psychology often think that experimental investigation of cause-and-effect relationships in human or animal behavior constitutes the whole of social-science research. Many students of economics think that the statistical investigation of the past relationship between price and the amount of commodities sold is the only way to do empirical investigation in economics. Some students of anthropology believe that only participant-observers learn anything worthwhile about the social world we live in. Psychoanalysts sometimes act as if no one can claim to understand anything about human behavior without subjecting it to clinical analysis in depth. Students of market research sometimes think that all research involves finding the relationship between particular personal and social characteristics and purchasing behavior. And the other disciplines have pet methods too.

Each of these beliefs has some basis in fact.¹ But one of the main themes of this book is that there are many types of empirical research and that each may be proper for a particular scientific researcher tackling a particular question. "Mere" description by an anthropologist may seem terribly primitive to a psychophysicist, but a wise psychophysicist may sometimes use a simple descriptive technique to good advantage. Conversely, an anthropologist who has an imaginative approach to his subject may one day engage in a simple laboratory experiment to test a theory. One should not let one's discipline determine the choice of method; rather, one should fit the method to the problem.

1. The best way to define the various social-scientific disciplines may well be in terms of their characteristic methods, whereas it seems to me that the natural sciences tend to cluster around substantive problems.

The task of this chapter is to distinguish among and describe the various types of research problems, to help you understand the research possibilities of the research question you select. In order to choose the appropriate research methods, one must understand the nature of the research question (see Chapter 18) and also the obstacles to getting knowledge to answer the question.

This chapter may seem unsatisfying to you because it introduces a great many topics without discussing them thoroughly. More detailed discussion comes later. The purpose of this chapter is primarily orientation, to provide a wide overview of the types of work done in social science.

2. Case-Study Descriptive Research

In the beginning, there is description. When one does not know anything at all about a problem, one must understand it in a general way before beginning to make specific inquiries about specific aspects of the subject. For example, the early explorer in a new land writes a general description of the appearance of the country, its geography, climate, people, flora, fauna, and much else. Sea captains and missionaries wrote such descriptions of many exotic lands, though too often their reports were anecdotal and shallow. The early explorer chooses to describe what he thinks to be important and interesting, without any rigid rules of scientific evidence. This first description is important because it serves to focus subsequent studies. Geologists later come to study the peculiar stone formations mentioned in the explorer's report. And anthropologists rush to study with great objectivity the extraordinary patterns of mating with foreigners only hinted at therein.

Descriptive research in the form of case studies² is usually the jumping-off point for the study of new areas in the social sciences.³ S. Freud's case history "Observation 1: Miss Anna O." and similar histories of other patients laid the foundation for modern clinical and personality psychologies. Since Freud's original descriptive explorations, there have appeared many other types of studies of the original theories, including observational and questionnaire surveys and experiments. As Freud put it, "the true beginning of scientific activity consists . . . in describing phenomena and [only] then in proceeding to group, classify and correlate them . . ." (Kaplan, p. 78).

2. Chapter 14 gives some hints on how to go about doing a case study and discusses further the nature of the case study.

3. Census-type studies may also be considered descriptive research, but we shall discuss them under the heading "Measurement and Estimation."

Much anthropological research is descriptive, deliberately setting out to create a rounded picture of an entire culture or some broad aspect of it. In economics the industry case study continues to be done long after economics has left its infancy, though in contemporary industry studies the economist uses sophisticated theories and statistical techniques of description that encompass the other types of research we shall discuss. (This fact shows the difficulty of classifying types of research. We shall also find descriptive research done within frameworks of classification, cause-and-effect, and other methods.)

A business consultant generally begins work with a general description of the situation. But the "operations research" person often skips this stage, immediately narrows down the problem, and tackles it as a more "advanced" type of research. Sometimes this early narrowing-down is successful, but sometimes it causes the operations researcher to miss the essence of the problem.

Some scientists regard descriptive research as only an early stage of research. There is something to this point of view. Descriptive research does not create laws and conclusions that apply beyond the subject matter described. Rather, it provides clues for subsequent research to pin down and generalize.

Nevertheless, I think it is unsound to see descriptive research as only a stage. First, a piece of descriptive research can be of important scientific value for itself, even though it cannot be generalized; a study of the aluminum industry, for example, can provide information valuable for such purposes as antitrust evaluations, even though the findings do not apply to all other industries. And, second, the stage view implies that we know what the later stages of research are. I do not think there is solid evidence for such an evolutionary view of science.

The importance of deciding upon and defining the variables in a research study was emphasized in the previous chapter.
But a descriptive study does not have a set of clearly delineated dependent and independent variables. The absence of a limited number of well-defined variables distinguishes case-study descriptive research from other types of research.

Students should not automatically shy away from descriptive research. Professors often tout students off descriptive projects, however, because they are harder to do well and easier to do atrociously than are other types of research. Descriptive research does not reveal sloppy and brainless work as glaringly as do more "rigorous" types of research. For this reason, other types of research usually make better training exercises than does descriptive research.

Path-breaking descriptive research, such as that of Freud, is especially difficult because one starts with empty hands: no guideposts, no standards, no yardsticks, no intellectual framework, no categories within which to classify what one sees. The researcher's sole resources are whatever concepts he can borrow from other fields and the ordinary words of the common language. (Every word is indeed a concept but not necessarily a concept especially fitted to the phenomena that the researcher will work with.) He must create his own classification and his own guideposts. He must decide what to look at and what to ignore, what to record and what not to record, which clues to follow up and which to drop, what is important and what is valueless. The early descriptive researcher has great freedom, but such great freedom can be terrifying. Once a tradition of descriptive research is established in a field, however, as is now the case in anthropology, there are standards and concepts that the researcher can use.

Chapter 14 gives you some step-by-step advice to help you do case-study descriptive research.

3. Classification Research

Classification is the process of sorting out a collection of people or objects and of developing a set of categories among which you divide the collection. No sooner does the scientist see several different examples of a given phenomenon than she begins to say, "This one is like that one and both are different from that other bunch." Then she coins common names for those examples that are like one another. The sorting out may come first and the construction of categories (called "taxonomy") afterward, or the order can be reversed (a priori classification).

Classification as an end in itself is the subject of this section. More frequently classification is a step in some other type of research.

Bacon was an early "imperialist" in the history of science. He viewed all science as within the domain of his favorite scientific method, classification. Nowadays many scientists are not even willing to dignify classification studies with the name "scientific." But in my opinion classification research is still important and always will be. Again, there is room in the house of science for all kinds of problems and methods.

The work of Linnaeus, a scheme for classifying the entire plant world so as to reveal the family relationships of the various species, is an eighteenth-century classic. And for a long time medical research meant little but classification of diseases by their symptoms. Even now classification is important in the advancement of medicine.

PHOBIAS IN BRITONS FALL INTO 130 TYPES

A half million Britons are afraid of things ranging from blood to barbershops, with some even afraid of being afraid. . . . Britain's phobia victims suffer from at least 130 different types of irrational fear. . . . The . . . British Medical Association and the National Association for Mental Health, said that one of the largest groups was the agoraphobics, people afraid of open spaces. They number 100,000, many of them women fearful of leaving their own homes. . . . Fear of spiders, matches, green leaves, pictures of ships in distress, birds and feathers, cats, dogs, mice, frogs, toads, wasps, snakes, blood and thunder were among disabilities haunting people. (New York Times, Oct. 14, 1969, p. 13)

Another medical-psychological example classifies types of obesity. "Familial obesity" is one category:

Familial obesity: Snacks at any hour on a social basis. Often is a good cook and enjoys own cooking. Other than housekeeping activities, usually leads sedentary existence. Motivation to reduce is poor. Rapport with doctor is good. Tension infrequent. Caloric pattern often follows pattern of entire family. Food is center of family social life. Prognosis good. (Chicago Daily News, January 27, 1964, p. 3)

Important classification research is found in all the social sciences. Freud classified the various psychological defense mechanisms. Sociologists classify various kinds of crowds and riots. Political scientists since Aristotle have classified various forms of government. Economists classify markets, pricing schemes, and devices used as barriers to entering markets.

Some classifications are mere catalogues of more-or-less mutually exclusive categories. Other classifications have more "rational" bases. An example of the latter is Ranganathan's "colon classification" scheme for classifying library books according to five master attributes: personality, matter, energy, space, and time.

The two basic tasks in classification work are: (1) constructing the categories and (2) assigning each observation to the appropriate category (or several categories if the classification is multidimensional). "Numerical taxonomy" (see Sneath and Sokal) is the name of a recent statistical approach to classification that can help systematize the process of making classifications.

What is classification research good for? Here are five uses of a classification scheme:

A classification enables one to deal routinely with individual cases. After a doctor has decided that a patient has smallpox, the treatment is almost automatic. And the authors of the obesity classification report that it has aided them in diagnosing patients for treatment.
Without a classification scheme a doctor would have to do an impossible amount of study on each patient before selecting a treatment; the classification scheme enables her to take advantage of the accumulated store of medical knowledge about which treatments aid which diseases.

A classification aids summarization. Until a political scientist decides to divide countries into "one-party," "two-party," and "multiparty" systems, he cannot count up how many countries there are of each type. After psychiatrists classify patients as "manic-depressive" or "schizophrenic," a researcher can summarize the number of each type observed in various countries. Summaries of this sort provide knowledge of the group as a whole.

A classification makes other scientists aware of differences among the categories. Whether the categories be species of plants, types of headaches, or varieties of monopolies, classification often leads the other scientists to understand and explain the differences. For example, after Freud classified the various defense mechanisms, it was natural to inquire into why some people repress, others rationalize, still others project, and so on.

The classification may contain within itself the explanation of phenomena. If the category description says that a person suffering from familial obesity "usually leads sedentary existence," it suggests a reason why the person is fat. The explanation may have been unintentional on the part of the classifier, but such explanations are a frequent valuable by-product of classification.

A classification clarifies one's understanding. Remember how many times you have gotten into an argument that seemed futile and then you (or even the other fellow!) said: "Let's make a distinction between the zilches and the squilches. What you say may be true of the zilches, but it certainly isn't true of the squilches." You may find that the argument has suddenly evaporated and that the two of you agree. (This fifth point is really a summary of the previous four points.)

But the process of putting people or things into categories also has a drawback: one inevitably loses some information. For example, assume you have collected "open-ended" free interviews about racial integration. In order to handle the data quantitatively, you classify ("code") the interviews into those for or against integration. You thereby lose all the shadings of opinions voiced by the interviewees and the richness and variety of their comments. But, unless you classify in this manner, you cannot handle the people in groups. (Later, however, we shall see how some of the other information can be saved and used simultaneously by cross-classification, that is, classification on several dimensions at the same time.)

The loss of individuality in a classification scheme is the basis of a persistent attack on social science. The critic says, "How can you talk as if any two different people in your survey were exactly the same?" or "How can you lump together wholesalers in Vermont and wholesalers in Louisiana when they serve very different markets?" The real question is whether the items are similar enough for your purposes; if so, the classification is fruitful. Mineral oil and coffee is the appropriate antidote whether a child who swallows furniture polish lives in Vermont or in Louisiana. In that instance the difference in geography does not matter at all. (But for other purposes the difference in geography is indeed crucial.)

People who refuse to ignore the differences among individuals may avoid erroneous generalizations. But they may also avoid any generalizations at all, which makes science impossible. (Some people criticize classification from a sincere desire for deeper truth; some, to thwart the efforts of others, use anecdotal evidence and exceptional occurrences to throw sand in the gears of scientific conceptualization.)
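The coding of interviews and the cross-classification mentioned above can be illustrated with a small Python sketch. The records, the second dimension ("region"), and the helper `tabulate` are hypothetical; they show only how counting on one dimension discards detail that counting on two dimensions partly retains.

```python
from collections import Counter

# Hypothetical coded interviews: a coder has already reduced each free-form
# answer to a position ("for" / "against") and noted the respondent's region.
coded_interviews = [
    {"position": "for", "region": "North"},
    {"position": "for", "region": "South"},
    {"position": "against", "region": "North"},
    {"position": "against", "region": "South"},
    {"position": "for", "region": "North"},
    {"position": "against", "region": "South"},
]

def tabulate(records, *dimensions):
    """Count the records falling in each cell defined by the given dimensions."""
    return Counter(tuple(record[d] for d in dimensions) for record in records)

one_way = tabulate(coded_interviews, "position")            # single classification
two_way = tabulate(coded_interviews, "position", "region")  # cross-classification

print(dict(one_way))
print(dict(two_way))
```

The one-way count treats every "for" respondent as identical; the two-way count keeps one more shading of the original material, at the cost of thinner cells.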

Classification research is different from other types of research in that one does not usually go out and collect new data for a classification study. Rather one is likely to work with existing data, sorting it into a classification that makes sense of it. Therefore, classification research tends to follow after descriptive research in the sequence of scientific stages.

As for the concept of variable in classification research, the classification scheme itself can be viewed as one massive variable or as a set of variables, for it is a set of distinctions among a set of related phenomena. But the variable or variables are not "dependent" or "independent," at least until employed in other research. For example, if one wanted to determine whether women are more prone to familial obesity than are men, the obesity classification would be a dependent variable; obesity type = f(sex). Or if an economist were to investigate how the presence of monopoly affects economic development, the market classification into monopoly or other forms would be an independent variable; rate of economic development = f(type of market form).

Every collection can be classified in many different ways. For example, I trust that you will not take too seriously the scheme by which research problems are classified in this chapter; one could slice up the research salami in many different ways, some of them more careful and systematic than the classification I use here. My object is to illustrate for you the variety of problems within scientific research, in contrast to the view that all science can be boiled down to a single sort of problem, and the classification scheme is intended to do this and only this; if it does, it is a good one and, if not, not.

4. Measurement and Estimation

Measurement research seeks to establish the size of a phenomenon on one or more of its dimensions: its weight, height, speed, intelligence, number of members, or what have you. Economic data for firms and governments are typical measurements.

Measurement differs from case-study description in these ways: Measurement research focuses on one or a few dimensions, and measures them systematically and in relatively great detail; case study gathers information on many dimensions of the phenomenon, with or without numerical description, and in a more ad hoc fashion.

Among the most frequent subjects of measurement in social science are the following, all of which are described in detail in every elementary statistics text: the total, the central value, the proportion, the distribution by various categories, and the amount of variability.

Deciding what to measure and how to draw the definitional boundary lines around the quantity to be measured (translating the theoretical, or hypothetical, concept into empirical terms) is a crucial decision in measurement research. An example: When one wants to estimate the cost to General Motors of producing another hundred thousand automobiles, should one include any of the salaries of the top management in the measurement? The cost accountant will always say "yes," but for some decisions the better answer is "no," as, for example, when General Motors must calculate its costs in connection with a potential sale to a fleet owner of a batch of already-produced trucks.
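The five common subjects of measurement listed in this section (total, central value, proportion, distribution by categories, and variability) can each be computed with Python's standard library. The data below are hypothetical household sizes invented for illustration, and the cutoff of four or more members for a "large" household is likewise an assumption.

```python
import statistics
from collections import Counter

# Hypothetical measurements: number of members in each of eight households.
household_sizes = [1, 2, 2, 3, 4, 4, 4, 6]

total = sum(household_sizes)                    # the total
mean = statistics.mean(household_sizes)         # one central value
median = statistics.median(household_sizes)     # another central value
# the proportion of households with four or more members
proportion_large = sum(1 for s in household_sizes if s >= 4) / len(household_sizes)
distribution = Counter(household_sizes)         # distribution by category
spread = statistics.stdev(household_sizes)      # variability (sample standard deviation)

print(total, mean, median, proportion_large)
print(sorted(distribution.items()))
print(round(spread, 2))
```

Which of these summaries matters depends on the decision at hand, in the same way that the boundary of "cost" in the General Motors example depends on the question being asked.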

5. Comparison Problems

Let us consider some examples first. We compare the nation's preferences between two people by means of a presidential election. Market researchers use television ratings to compare the number of listeners that two television shows attract. Psychologists compare the efficiency of teaching machines with the efficiency of conventional classroom instruction. In their interpretation of their data Kinsey et al. frequently compare the behavior of men with that of women, that of young people with that of old people, and so on. Cultural anthropologists sometimes compare two cultures, to discover differences and similarities. And social anthropologists sometimes compare the kinship systems, say, in a great many different cultures.

Most empirical research in psychology, sociology, marketing research, education, anthropology, political science, and all other branches of social science involves comparison, and often the comparison is part of research intended to establish cause and effect. The comparison may be of something against nothing; for example, an hour's tutoring may be compared against no tutoring to see whether the tutoring raises grades at all.

Comparison problems and measurement problems have much in common. In fact, you can have an enjoyable time arguing that comparison problems are really a subtype of measurement problems or, conversely, that measurement problems are really a subtype of comparison problems. The key difference between comparison and measurement is that measurement is against a known standard. The standard may be a man's foot or a carob bean or a platinum-iridium meterstick. But the standard is assumed to be commonly known, and its common acceptance gives the standard its value as a standard. Comparison problems, on the other hand, compare two or more entities with one another.

In a comparison problem, we are interested in the relative measurement of two or more phenomena, whereas in measurement problems we are interested only in one event relative to a standard quantity. When the psychologist compares two methods of teaching children to read, she is interested in finding the faster method; she is not interested in how long either one takes in absolute time. When a network compares the ratings of two television shows, its usual purpose is to identify the more popular show. An election seeks to establish which candidate is preferred to the other candidates. The difference between comparison and measurement is illuminated by the different types of adjectives used, "fast" versus "faster," the absolute versus the comparative.


Comparison problems are often framed in the logic of statistical hypothesis testing. And discussion of hypothesis testing has dominated the discussion of research methods in the social sciences, and is at the center of classical statistics. But it is very important to recognize that hypothesis testing is only one of several types of research problems, even though it is the dominant type of research in some disciplines. A master survey researcher even generalizes in the other direction:

    At their present stage of development, however, the social sciences cannot insist on this paradigm [hypothesis testing]. Our thinking is rarely far enough progressed to enable us to start out with a sharply formulated hypothesis; most studies are exploratory, directed toward the general examination of a field in order to develop theoretical formulations. (Kendall & Lazarsfeld, p. 133)

Researchers whose experience has been mostly with comparison problems sometimes try to treat all problems as problems in hypothesis testing. Violence is done to thought and procedure in the effort to jam all research into the hypothesis-testing mold. For example, it requires intellectual contortions to cast the U.S. Census as a problem in hypothesis testing; the same is true of many other problems in description or measurement. H. Roberts makes this point forcefully about research in business and economics:

    In fields with a highly developed theoretical structure (especially the natural sciences) it is reasonable to expect that most empirical studies will have at least some sharp hypotheses to be tested. This is not true for many areas of business interest, and attempts to force research into this mould are both deceitful and stultifying. "Hypotheses" are likely to be no more than hunches as to where to look for sharper hypotheses, in which case the study might be described as an intelligent fishing trip. (Roberts, p. 2)

Comparisons are usually made on one dimension at a time. We might say that one reading-instruction method teaches children faster but that the other method requires less teacher attention. These dimensions of speed and amount of teacher attention are chosen for comparison because the researcher believes them to be interesting, or important for practical or theoretical reasons, or relevant in some other way.

Comparisons may be made on more than one dimension by combining the ratings on several dimensions into a single index. Sometimes this combining is part of the research job, as when one uses several items on a single test. But sometimes it goes beyond the research job. If a school system gives a reading-instruction method two points for excellence in speed of teaching and one point for amount of teacher attention, the over-all comparison must depend on the value judgments of the school board about the importance of teaching speed and teacher attention; these value judgments are the source of the point values.

In the natural sciences and increasingly in the social sciences as they become more mature, comparison studies often give way to measurement because of the existence of better absolute scales against which to measure phenomena. Long ago a foot racer would race only against another runner, and the better and poorer runners would be established by comparison. Now a runner also races against a clock and achieves a time record.

Even with our excellent contemporary timepieces, however, a time record does not always contain as much or more information than does a comparative result. A woman may prove she is a very fast runner by beating other fast runners, even though the recorded time is slow, because the race is held on a slow track and against the wind. Comparative times of race horses on different tracks and against different competition are widely recognized by racing buffs as inconclusive evidence in handicapping a race.

A comparison may be quantitative like "20 percent faster than . . . ," but often the results cannot be expressed any more precisely than "more" or "less" or "equal to." With two paintings you are not likely to go further than asking whether a person likes one of them better (although you might get more precision by asking whether the interviewee likes painting A "much more than" or "a little more than" painting B).

The proportion or percentage is the basic descriptive statistic for comparison problems. For measurement problems there is a common standard, and therefore the data can be expressed as absolute numbers like "ten inches," "two hours," or "2,150 spectators." But the percentage makes possible quantitative comparisons between two or more quantities, even when the absolute value of neither is commonly known. The percentage expresses, for example, relative length or popularity, "57 percent as long" or "two-thirds as many spectators."
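The school-board example earlier in this section, in which ratings on several dimensions are combined into a single index, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ratings are invented, the names `overall_index`, `method_1`, and `method_2` are hypothetical, and only the point values (two for teaching speed, one for teacher attention) come from the text.

```python
# Hypothetical ratings on a 0-10 scale for two reading-instruction methods.
# These numbers are invented for illustration; they are not data from any study.
ratings = {
    "method_1": {"speed": 9, "teacher_attention": 4},
    "method_2": {"speed": 6, "teacher_attention": 9},
}


def overall_index(scores, weights):
    """Combine ratings on several dimensions into one weighted index."""
    return sum(weights[dim] * scores[dim] for dim in weights)


# Point values from the text: two points for speed, one for teacher attention.
board_weights = {"speed": 2, "teacher_attention": 1}
# An assumed alternative: a board that values teacher attention more.
other_weights = {"speed": 1, "teacher_attention": 2}

for label, weights in [("board", board_weights), ("other", other_weights)]:
    index = {m: overall_index(ratings[m], weights) for m in ratings}
    preferred = max(index, key=index.get)
    print(label, index, "preferred:", preferred)
```

With these invented ratings the preferred method changes when the point values change, which is the sense in which the over-all comparison rests on the board's value judgments rather than on the research itself.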

6. Research That Tries to Find Relationships

The types of research problems we have discussed in previous sections (description, classification, measurement, and comparison) are intended to reveal what phenomena are; more broadly, all of these problems are descriptive. Description and measurement studies describe many or a few aspects of one event or one set of events taken as a single entity. Comparison problems describe two (or several) things or groups with reference to each other, and classification studies create devices for more accurate and meaningful descriptions.

Now we shall discuss the first of two types of problems in which we ask how events are related to one another. The second type, cause-and-effect problems, is really a subclass of relationship problems.

A few examples should make clear the nature of relationship problems: How well are I.Q. test scores related to future success in school? That is, how well do I.Q. scores predict future school success? How well does the behavior of groundhogs on Groundhog Day predict the end of winter? How closely related are income and other indexes of social class? Do rises and falls in the economy follow rises and falls in the stock market? We shall defer until the next chapter consideration of such problems as whether smoking causes lung cancer and whether stock-market movements cause movements in the economy.


The investigation of relationships and predictions touches ticklish philosophical arguments like those about the nature of induction and whether it is possible to generalize. We shall, however, sidestep some philosophical arguments and postpone some others until Chapter 32.

An investigation into whether there is a relationship between two occurrences or variables is an attempt to find out whether two (or more) phenomena are part of the same scheme of things, that is, whether they are closely associated with each other in nature's cobweb. The cobweb analogy is instructive. If two particles are entrapped close to each other in a cobweb, and if one of them moves, the other will move in close agreement with it. But if the particles are much farther from each other, movement in one will not be as closely accompanied by movement in the other. Furthermore, notice that movement in particle A and in particle B can be related even if neither A nor B but rather C initiates the motion.

Of course it is true that everything is related to everything else to some extent. If you drop a stone to the ground in Illinois, theoretically there will be an impact in Chile, Ghana, and every other part of the world. But the impact in Chile is so slight that we can ignore it, and no instrument will be sensitive enough to record it. In a psychological context, it is surely true that if some environmental factor causes a change in one aspect of a personality, all other aspects of the personality will be affected to some degree. And if some economic shock in Guatemala causes an inflation in the quetzal there will be some related movement in the value of the dollar. Nevertheless, relationships like those in the examples given are so insignificant that we ignore them. We shall confine our interest to important relationships between events, relationships that matter. And of course relationships must be large enough to be detected by the crude instruments at our disposal. (An atomic explosion or earthquake in Chile can probably be detected in Illinois, though the fall of a stone cannot.)

There are several types of relationships that are not the simple cause-and-effect relationships. Among them are these:

A third phenomenon, C, may cause both A and B, accounting for the apparent relationship between them. Yet A and B vary together, and we therefore say that there is a relationship between them; to illustrate, tax changes will affect both the stock market and gross national product.

Or A may cause B (or B may cause A), but we may not be able to establish which causes which. Nevertheless, we may be able to establish the extent of the relationship between them. Do you do well in a course because you like the course, or vice versa? Probably both are true. Does a young boy fall in love because a girl asks him to go steady, or vice versa?

Or it may be that A is a partial cause of B and B is a partial cause of A. One such type of interrelationship is "feedback." I hit you, which causes you to hit me. The blow to me is feedback from my blow to you. The going-steady and doing-well-in-course examples might also be examples of feedback.

There can be great value in knowing what relationships exist even if we do not know their nature in terms of cause and effect. Here are some of the uses to which such knowledge can be put:

First, one phenomenon may be used as a predictor of another phenomenon of interest. If we believe that the stock market's ups and downs occur six to twelve months earlier than ups and downs occur in the economy, we can then predict what the economy is likely to do six to twelve months hence. We can take action based on this prediction to try to change the behavior of the economy. The flight of birds close overhead may usually predict rain, even though the flying birds do not cause the rain; when we see the birds, we put up the shutters to avoid being drenched by the oncoming rain.

Second, one phenomenon may serve as a proxy measurement of the other. An advertising researcher may believe that the amount of readership of advertisements in magazines is closely related to the effect of the advertisement in creating sales. If the situation makes it difficult or impossible to measure the sales effect of the advertisement directly, the researcher may then measure the amount of readership the advertisement gets and use this measurement as an indirect index of the sales the advertisement produces.

Third, the relationship between two proxy measurements may be of interest. In the previous paragraph, we discussed the relationship between sales (which is really what the advertiser is interested in) and readership, which the advertiser might wish to use as a proxy index of sales. But the advertiser might also be interested in the relationship between two separate proxy indexes of sales, perhaps readership and the number of coupons that are clipped and returned. Neither readership nor coupon response is the "real" cause or the "real" effect of sales. But an investigation of the relationship between the two indexes can tell the advertiser whether the firm can reasonably assume that both indexes can stand for the same thing. If readership varies with sales, and coupon response varies with readership, the advertiser can expect coupon response to vary with sales.

Educators and educational researchers often look at the relationship of one aptitude test to another aptitude test. If both tests give high and low scores to the same people, then the cheaper and simpler test to administer can be used. Similarly, if the Dow-Jones index of thirty industrial stocks is closely related to the composite of all industrial stocks on the New York Stock Exchange, there is no need to compute the more complete index.

Intelligence tests pose a curious question about indexes. Should we say that a score on an I.Q. test is intelligence? This seems to be what people mean when they say "He has a high I.Q." Alternatively, we can think of an I.Q. score as simply a somewhat inaccurate index of capabilities that bring about success in the future. For most purposes, it does not matter which way we think of the I.Q. But confusion becomes apparent when we begin to say about someone "He is very bright even though his test scores don't show it." If we interpret the I.Q. score as the same thing as intelligence, then the statement would make no sense.
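The aptitude-test example above (two tests that "give high and low scores to the same people") can be checked numerically. The sketch below computes a Pearson correlation coefficient, which is one common way to do so and not necessarily the procedure the educators in the example would use. The scores are invented for illustration, and the names `pearson_r`, `long_test`, and `short_test` are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented scores of the same six people on two aptitude tests.
long_test = [52, 61, 70, 45, 88, 77]   # the costlier test
short_test = [50, 65, 72, 40, 85, 80]  # the cheaper, simpler test

r = pearson_r(long_test, short_test)
print(f"correlation between the two tests: {r:.2f}")
```

A coefficient near 1 would support using the cheaper test as a proxy for the costlier one; how near is near enough depends on the purpose at hand, which the code cannot decide.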


The strategy of research in looking for a relationship is to examine the patterns of variations in the two phenomena you seek to relate. If A generally goes up when B goes up, then A and B are related to some extent. Or, for that matter, if A generally goes down when B goes up, they are also related but inversely.

A basic way to examine the variations and test for the presence of a relationship is to arrange the data in some form like a table or a graph.⁴ For example, if we want to find out whether or not there is a relationship between a man's income and which political party he votes for in a small town, we might record the data from our sample of 200 in this way, in a two-row, two-column ("two-by-two") table, as in Table 4.1.

TABLE 4.1

                 Republican             Democrat             Subtotals
Below $10,000    Cell A = 35 (17.5%)    Cell B = 68 (34%)    103
Above $10,000    Cell C = 47 (23.5%)    Cell D = 50 (25%)     97
Subtotals        82                     118

If all the observed people had fallen into cells A and D, it would be obvious that there is a relationship. Similarly, if A and B had equal numbers of people and if C and D also had equal numbers of people, then it would be equally obvious that there is not a relationship. But in the social sciences we are rarely so lucky as to have such clear-cut results. Almost invariably the data will show a mixed pattern like that in the table. We therefore have some difficulty in deciding whether there is or is not a relationship between income and political party and, if a relationship does exist, how "strong" it is.

Casting tables into percentages (see the numbers within parentheses in the example) often clarifies whether there is a relationship. But it is always a question which percentages should be computed. In this case, each cell is shown as a percentage of the whole. If there were no relationship between income and political party, the ratio of the percentages in cells A and B would equal the ratio of the percentages in C and D; the same would be true of cells A and C compared to B and D. That is, percentage A divided by percentage B would equal percentage C divided by percentage D, and percentage A divided by percentage C would equal percentage B divided by percentage D. That these ratios are not equal suggests a relationship between income and political party, though we must later check whether such a pattern might be caused by chance. Statistical theory can help us to draw sound inferences, as we shall see in Chapters 25-30.

One common statistical way to measure the extent of a relationship is a correlation coefficient, which can be used in place of (or, better yet, in addition to) tables and graphs. It will be discussed in Chapter 30.

4. Consult H. Zeisel (1957) for a clear discussion of how to arrange data in tables and graphs.
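The percentage and ratio comparisons described for Table 4.1 can be reproduced with a short Python sketch. The cell counts are those of the table; the variable names and the "expected count" step (row subtotal times column subtotal, divided by the sample size) are additions for illustration and are not part of the text's own procedure.

```python
# Cell counts copied from Table 4.1 (rows: income; columns: party).
counts = {
    "A": 35,  # below $10,000, Republican
    "B": 68,  # below $10,000, Democrat
    "C": 47,  # above $10,000, Republican
    "D": 50,  # above $10,000, Democrat
}
total = sum(counts.values())

# Each cell as a percentage of the whole sample, as in the table.
pct = {cell: 100 * n / total for cell, n in counts.items()}

# With no relationship, A/B would equal C/D.
ratio_below = pct["A"] / pct["B"]
ratio_above = pct["C"] / pct["D"]

# Counts one would expect if income and party were unrelated:
# row subtotal * column subtotal / sample size.
row = {"below": counts["A"] + counts["B"], "above": counts["C"] + counts["D"]}
col = {"rep": counts["A"] + counts["C"], "dem": counts["B"] + counts["D"]}
expected = {
    "A": row["below"] * col["rep"] / total,
    "B": row["below"] * col["dem"] / total,
    "C": row["above"] * col["rep"] / total,
    "D": row["above"] * col["dem"] / total,
}

print(pct)
print(round(ratio_below, 2), round(ratio_above, 2))
print({cell: round(e, 1) for cell, e in expected.items()})
```

The two ratios come out at roughly 0.51 and 0.94, which is the inequality the text points to. Whether a gap of that size could arise by chance in a sample of 200 is a separate question that this sketch does not answer.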


7. Finding Causes and Effects

As we have mentioned, cause-and-effect relationships are a subclass of scientific relationships in general; I shall substantiate this claim in Chapter 32. To say how cause-and-effect relationships differ from other relationships or, to put it another way, to create a definition of "cause-and-effect relationship" is a difficult matter, indeed, and that is the job of Chapter 32.

Deciding whether to call a particular relationship between two variables, say A and B, a "causal" relationship is sometimes a straightforward matter. If the observed relationship is the result of an experiment, there is usually little argument about saying that an observed relationship is a causal relationship, implying that the artificially manipulated independent variable is the cause and the dependent variable whose change is observed the effect. Indeed, one of the best ways to reduce confusion about whether an observed nonexperimental relationship is causal is to subject it to experiment.

As we shall see in more detail later, however, an experimental relationship may not deserve to be called a "causal" relationship, usually because there is something wrong with the way that the experimenter has "specified" the independent variable. Consider, for example, the fabled gentlemen who got experimentally drunk on bourbon and soda on Monday night, Scotch and soda on Tuesday night, and brandy and soda on Wednesday night, and stayed sober Thursday night by drinking nothing. With a vast inductive leap of scientific imagination, they treated their experience as an empirical demonstration that soda, the common element each evening, was the cause of the inebriated state they had experienced.

Observed relationships that do not spring from controlled experiments are much harder to characterize as causal or noncausal. Various practical devices can, however, assist in safe classification of a relationship as causal. Here is one such device: If you observe a relationship between A and B and if you can establish that A did not cause B, the likelihood that B caused A is then greater. For example, years in which the corn price is high are followed by years in which large amounts of corn are grown. We can be quite sure that the high price is not caused by the large supplies because the high price occurs in the previous years; causes usually (but not always!; see the postscript to Chapter 32) precede the effects. (But this relationship by itself is not sufficient to establish causality from B to A; C might cause both.)

Another device that increases confidence that a relationship is causal is cross-classification analysis with tables. If you try out many of the most likely additional variables in a cross-classification and if the original observed relationship between your variables is not affected, you have made a more convincing case that your observed relationship is causal. Examples of cross-classification are given in Chapters 25 and 26.

Some scientists react to the difficulties of establishing cause and effect by withdrawing into their shells and refusing to say that the relationships they find are anything more than correlations. (Sometimes this is a safety play to avoid possible criticism, as with some scientists' findings of relationships between smoking and lung cancer.) But decision makers cannot avoid making judgments about cause and effect, even if they wish to weasel out of it. If smoking does not cause cancer, there is no point in trying to stop people from smoking. If the stock market does not cause changes in the economy, there is not much point in trying to control the action of the stock market as an aid to controlling the economy. Therefore, it is often the decision makers who frame research problems narrowly in terms of cause and effect. What causes juvenile delinquency? How can I increase the readership of my furniture advertisements? How does a longer school year affect a child's education? Do television dramas cause violence in children? The decision maker wants to know what to change so that he can achieve the effect he wants.

The decision whether to call a relationship "causal" is indeed a delicate matter. Statistical techniques alone cannot guarantee that an observed nonexperimental relationship is causal. Nor can statistics or other formal reasoning methods guarantee that even an experimental relationship is causal. Statistics would not reveal the flaw in the reasoning that soda causes inebriation. The only protection that the researcher can give himself is to saturate himself in the complicated and detailed richness of the phenomenon he is working on. An hour's elbow bending and casual talk about liquor at a neighborhood bar will save you forever from believing that soda is the active ingredient.
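The cross-classification device mentioned earlier in this section can be sketched as follows. This is a hedged illustration, not the procedure of Chapters 25 and 26: the third variable ("age group") and the way the Table 4.1 counts are split between its two categories are invented, and the names `republican_share_gap`, `younger`, and `older` are hypothetical.

```python
def republican_share_gap(table):
    """Republican share among above-$10,000 voters minus the share among
    below-$10,000 voters, using the cell labels of Table 4.1."""
    share_above = table["C"] / (table["C"] + table["D"])
    share_below = table["A"] / (table["A"] + table["B"])
    return share_above - share_below


# Invented split of the Table 4.1 counts by a hypothetical third variable.
younger = {"A": 15, "B": 40, "C": 20, "D": 28}
older = {"A": 20, "B": 28, "C": 27, "D": 22}

# Adding the two sub-tables cell by cell gives back the original table.
combined = {cell: younger[cell] + older[cell] for cell in younger}

for label, table in [("all", combined), ("younger", younger), ("older", older)]:
    print(label, round(republican_share_gap(table), 3))
```

In this made-up split the gap has the same sign and roughly the same size within each age group as in the combined table, so the original relationship "is not affected" by the added variable. Had the gap vanished within each group, the added variable would be a candidate for the C that causes both A and B.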

8. Mapping Structures

This last type of research problem will be discussed only briefly. Examples include finding the kinship structure of a group, that is, who is related to whom in what ways; determining the structure of an unwritten language; and mapping an economy. Structure (or system) mapping is a sort of description, but it is more highly organized than ordinary description. It is a good deal like putting together a jigsaw puzzle; first, one tries to find a second piece to fit a given piece, then a third piece to fit the first two, and so on until the whole system falls into place.

Unlike exploratory descriptions, structure mapping begins with a conceptual structure that the investigator tries to fill in. One knows in advance what one is looking for; for example, the linguist tackles an unknown language knowing that it must have phonemes, morphemes, nouns, verbs, and so on. His job is to identify the various members of the classes and the relationships among them. To put it another way, the structure mapper starts off knowing the generalities of the language; his job is to fill in the specifics.

Taken as a whole, structure mapping encompasses many other types of research, especially classification and comparison; for example, comparing two words that sound similar to see whether a speaker of the language distinguishes between them. Mapping a system is a sequence of trial, observation of the result, deduction of a new hypothesis, trial of the new hypothesis, and so forth.

Structure mapping is important in economics. For example, an input-output analysis is a map of the flows of materials to and from each segment of an economy: where nails from nail factories go and where raw materials for nails come from. But, even though the ultimate objective of most economists is to understand the economic system as a whole, most economic studies deal with single relationships in particular sectors of the economy. Simplification and abstraction of this kind are often necessary in science because systems are often too complex to study as wholes.

Deduction usually plays a large part in structure mapping. For example, when the linguist identifies one word as a noun and then hears another word used directly after the noun, he deduces that the second word is not a noun also, because one noun seldom follows another. And in economic input-output analysis one can deduce useful facts about the inputs of nails to various industries if one knows the total output of the nail industry.

The work of structure mapping re-enacts in microcosm the activities of the various disciplines taken as wholes. Whole series of studies follow the same pattern as do the individual steps in structure mapping. An example is a series of related animal experiments in the psychology of learning. The studies form a sequence of experiments, observations, deductions, further hypotheses, and so on. The difference is that in psychology the unit of work is defined as a single experiment, whereas in kinship studies or field linguistics the unit is defined as the mapping of the entire system.
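The kind of deduction described for input-output analysis, inferring an unobserved flow of nails from the nail industry's total output, can be sketched in Python. All figures, industry names, and variable names below are invented for illustration, and the sketch assumes that every nail produced is delivered to one of the listed industries.

```python
# Hypothetical figures in arbitrary units; not data from any real economy.
total_nail_output = 1000

# Deliveries of nails that have been observed directly.
known_nail_inputs = {
    "construction": 620,
    "furniture": 210,
    "shipbuilding": 90,
}

# The one flow that has not been observed is deduced as the remainder,
# because the deliveries in a row of the table must add up to total output.
deduced_other_industries = total_nail_output - sum(known_nail_inputs.values())

nail_row = dict(known_nail_inputs, other_industries=deduced_other_industries)
shares = {industry: amount / total_nail_output for industry, amount in nail_row.items()}

print(nail_row)
print(shares)
```

The deduction is only as good as the accounting assumption behind it; if some nails are exported or held in inventory, the remainder would overstate the deliveries to the remaining industries.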

9. E v a l u a t i o n R e s e a r c h Evaluation research is not another sort of research different from the types of research described above—despite some recent claims to the contrary. Fad, fashion, and catchwordism r u n w i l d i n science, just as elsewhere i n society, especially w h e n money is to be made b y being " w i t h i t . " A n d w i t h recent demands by the holders of grant pursestrings that social science be "relevant" and "responsive to current social needs," there emerged the label "evaluation research," w h i c h seems super relevant and responsive. Every comparison study i n the history of the study of learning is an "evaluation" of one method of learning or teaching compared to another. Every economic study of the effect of a m i n i m u m - w a g e l a w on employment and earnings of poor people is evaluation research, evaluating the effect of such legislation. Every medical and anthropological study tracing the i m ­ pact of Western culture on an isolated tribe's physical and mental hygiene is evaluation research. I n short, many comparison, measurement, and causeand-effect studies y i e l d explicit evaluations, and many of the rest y i e l d i m p l i c i t evaluations; there does not exist a distinct k i n d of research k n o w n as evaluation research. M u c h research could be made m u c h more valuable, however, i f the re-

Types of Empirical Research

59

searcher would aim to produce results yielding clear-cut evaluations that can be used to improve social judgments. But this is a distinction between a) sound, well-designed research and b) ill-conceived, sloppy work, rather than between "evaluation research" and other research.

In concluding this chapter, I want to repeat that there are several types of research problems and that, furthermore, there ought to be several types of research problems. Unfortunately, too many scientists in various disciplines assume that the type of problem they attack is all there is and all there should be to scientific research. Some economists think and act as if deduction and statistical analysis of time series is the beginning and end of social science. Some psychologists claim that all scientific research is and should be a process of testing hypotheses by experiments. The danger of such narrow claims is that they lead to stereotyped choices of research methods, which blind the researcher and cut down his effectiveness.

It seems to me that the demand that all research be thrust into any one mold, hypothesis testing for instance, is an example of a common and destructive tendency in science—overstating the merits or generality of a theory or method, or "intellectual imperialism" as I choose to call it. This imperialism results in bitter arguments about this theory or that theory, rather than agreement that each theory (and method) may do some jobs better than the other theory (or method).

The purpose of this discussion of the various types of research problems has been to lead you to ask yourself: "What type of research problem is the question that I am trying to investigate? Into which class does it fall?" This approach helps one to understand the nature of a research problem and how to go about it.

10. Summary

Each social-science discipline uses some method almost to the exclusion of others. For example, psychology specializes in experiments, sociology in surveys, and economics in statistical analysis of government-produced data. However, other methods sometimes can be more appropriate. The task of this chapter is to describe the various types of empirical research so that you may think of them when you need them.

Descriptive research is especially called for when a field is first opening, to define the problems and to produce clues for further research to follow up.

Classification research makes distinctions among the phenomena under investigation. This may lead to causal explanations of the differences among the phenomena or of the relationships among them. Or the classification may serve as a map of the territory to orient subsequent researchers.

Measurement (estimation) research is a quantitative, and therefore more precise, form of descriptive research, but it is usually less flexible and rich in variety than is qualitative descriptive research. The most common measurements are: counting, central value, proportion, distribution, and variability.

Comparison research is done to find out which alternative is bigger, faster, or perhaps better.

Relationship research attempts to determine whether there is an association between two phenomena. This may be to aid prediction, or to determine whether one variable can be used as a proxy for the other, or it may be a prelude to determining a causal relationship.

Cause-and-effect research can be done with experiments or surveys; best of all is a combination of methods, together with good judgment. Cause-and-effect research attempts to go beyond the existence of an association to determine if one of the variables can plausibly be said to cause the other.

Structure mapping is research that investigates an entire structure, whether it be a kinship group or an economic system. It encompasses other sorts of research as well.
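The five common measurements named in the summary (counting, central value, proportion, distribution, and variability) can be illustrated with a short sketch. The data below are invented for illustration only, and the function name `summarize` and the choice of "three or more children" as the proportion of interest are assumptions, not anything taken from the text. The sketch also shows why the same distribution may be summarized differently for different purposes: one large family pulls the mean above the median.

```python
from collections import Counter
from statistics import mean, median, pstdev

# Hypothetical data: number of children in each of ten surveyed families.
children = [0, 1, 1, 2, 2, 2, 3, 3, 4, 7]


def summarize(values):
    """Return the five common measurements for a list of numbers."""
    n = len(values)
    return {
        "count": n,                                    # counting
        "mean": mean(values),                          # central value (one choice)
        "median": median(values),                      # central value (another choice)
        "proportion_3_plus": sum(v >= 3 for v in values) / n,  # proportion
        "distribution": dict(sorted(Counter(values).items())),  # frequency table
        "std_dev": pstdev(values),                     # variability
    }


summary = summarize(children)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A planner estimating total school places might prefer the mean, since it multiplies back to the total number of children, whereas someone describing the "typical" family might prefer the median, which the single family of seven does not affect.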

EXERCISES

1. Within one given field, for example, psychology, economics, or anthropology, find examples of each of the seven types of research—description, classification, measurement and estimation, comparison, search for relationships, cause-and-effect, and mapping structures. Some rummaging around in the literature will be necessary to find all seven types.
2. If the class is doing individual research projects, classify each student's project by type of research.
3. Read through the first five empirical articles in a professional journal or in a book of readings of empirical research, and classify the studies by type of research.
4. Show how a particular distribution of data would be summarized one way for one purpose and another way for another purpose.

ADDITIONAL READING FOR CHAPTER 4

On exploratory and descriptive studies, see Selltiz (Chapter 4). For a fascinating firsthand account of a participant-observation descriptive study, see the Appendix to Whyte's Street Corner Society. More generally, see Denzin (Chapter 9).

Descriptive research in anthropology—that is, field methods—is covered well in Williams' short pamphlet.

Coleman discusses evaluation research for policy purposes, including a discussion of "social audits." Campbell (1969) presents research designs that are particularly appropriate for the evaluation of policy changes.

Tanur et al. contains a wide variety of studies showing how quantitative statistical methods can be used for measurement, estimation, comparison studies, and the finding of relationships. Though the book's subject is statistics, no previous knowledge of statistics is required, and the studies are a pleasure and a joy to read. They cover a wide variety of areas from anthropology to business, and from pure to applied research. The studies include both the social sciences and the biological sciences, and some natural sciences as well. Very highly recommended.

5 theory, model, hypothesis, and empirical research

1. What Is Theory?
2. Models, Theory, and Hypotheses
3. Two Views of Theory and of Science in General
4. The Relationship Between Theory and Empirical Research
5. Summary

The purpose of this chapter is to throw some light on the confusing relationships among theory, theorizing, models, hypotheses, and empirical research.

The aim of all research—theoretical and empirical—is to get new knowledge. But we can distinguish among quite different types of knowledge that one may be seeking. One may seek better understanding of the social world, that is, a better explanation of some human phenomenon such as racial discrimination or psychological depression. Or one may seek to find the best way to deal with a given class of situations, say, economic recession or psychological depression. Or one may seek to evaluate the effects of a program, say, school busing or psychotherapeutic treatment. There is some—but only some—connection between this variety of aims and the nature of various research problems described in the previous chapter.

The roles of theory and empirical research, and the relationships between them, differ with the types of knowledge one seeks. Here are a few examples to illustrate possible relationships between theory and empirical research:

1) An urban planner may want to know whether the prospective dwellers in a new area will support a shopping mall. The question comes out of the planner's background knowledge. Other than that, theory is nil. Empirical research is the main tool, and its results are the knowledge sought.

2) Malthus deduced from basic principles of economic theory the hypothesis (conjecture) that additional births cause the standard of living to fall. For many years that hypothesis—accepted as part of economic theory—was regarded as knowledge in itself, without supporting empirical research.

3) In recent years researchers have looked at the data on birth rates and standards of living for various countries, to see if the relationship between birth rate and economic growth rate is consistent with Malthus' hypothesis. Their aim has been two-fold: to elaborate on Malthus' theorizing to produce greater knowledge of the effect of births upon the economy, and to test Malthus' theoretical deduction. Their aim was not, however, to challenge the body of economic theory from which Malthus' deduction was drawn. When the data turned out not to support Malthus' theorizing, three possible conclusions were suggested: Either the Malthusian theory is insufficient and must be improved, or the data and its analysis are not good enough, or both.

1. What Is Theory?

Empirical research was easy to define (see pages 5-6). But theory is harder to nail down, partly because the concept has very different meanings in different disciplines and even at different moments in particular disciplines.

In fields such as economics and physics, where there are well-established assumptions and an apparatus for making systematic deductions, there is said to be a body of theory. The theory must cover a substantial portion of the material in a field or subfield, and it must be systematically organized, or else one should not say that there is a body of theory in a discipline. To put it another way, there is no theory unless it is a body of theory.¹ The deductions from the theory need not be correct for the theory to claim the title of "theory," but people are less likely to honor the claim of a theory that often produces crackbrained hypotheses.

Another requirement of a body of theory is that the same assumptions and the same type of deductive apparatus must be used for many of the problems in the field. One could work out a set of assumptions from which one could deduce any single hypothesis. But such a set of assumptions would not be enough to constitute a theory; a set of assumptions must support not just the one hypothesis but many other hypotheses also. The assumptions that make up microeconomics, for example, underlie an enormous body of economic analysis.

In other fields where phenomena cover a wider range of behavior than does economics—sociology, psychology, and anthropology among them—it has not yet been possible to develop an integrated body of propositions, which most writers accept as the basic underpinning for the field as a whole. In such fields the term "theory" has a looser meaning, and may refer to almost any speculative thinking offered as an explanation for some phenomena.

The key element of theory is that it abstracts a few characteristics of reality in an attempt to isolate and describe its central features. The rational profit-making firm in economic theory is such an abstraction. Everyone knows that no organization is perfectly rational or perfectly single-minded in the pursuit of a single goal. But the microeconomic theory which is built upon this abstraction, together with other abstractions such as perfect competition and complete information, is useful and hence is retained and used despite the fact that the theory departs from complete realism when it abstracts and focuses on a few key elements.

Theory can be wise in its choice of a key element or elements to focus on, or the theorizing can be foolish and wrong-headed. The test is whether the elements of the theory yield hypotheses that are important, reasonable, and relevant to one's interests. This is not the same as being right or wrong, however—some wise thinking can turn out to be wrong on the facts; yet the theorizing was worthwhile because it led us to learn something of value that we did not know before.

For example, either a sociologist-demographer or an economist-demographer interested in fertility may observe that more education among women is the strongest and most reliable factor associated with lower fertility. And either the sociologist or the economist, starting with the basics of that field, can arrive at a reasonably convincing theoretical explanation of why more education among women causes lower fertility.

Obviously there are a great many influences other than education—psychological, economic, cultural—that determine whether a given family or community has more or fewer children. To fully describe in all its richness even one family's process of making a birth decision would require much time and many words. Theoretical speculation abstracts (in this case) the single element of the women's education to examine as a contributing explanation of the decision to have another child. And empirical investigation of this phenomenon will also abstract to women's education alone, or to women's education plus a few other variables, as the explanation of fertility. (The additional variables, however, are likely to reflect whether the researcher is a sociologist, psychologist, or economist.)

1. W. Letwin makes this argument very forcefully in his discussion of theory in economics; G. Homans makes a similar point about sociological theory (pp. 11-12). And this point of view is well accepted in the physical sciences. Unfortunately, however, it is far from universal among social scientists; most social scientists continue to use the terms "theory" and "hypothesis" interchangeably. Such usage leaves no room for the important distinction between statements that are logically related to other statements within a deductive system, and statements that are not.

2. Models, Theory, and Hypotheses

Some fields, and some subfields of any given discipline, are not ready to build an all-embracing body of theory. But imaginative scientists nevertheless put together sets of abstract propositions from which one can deduce some hypotheses. Such a set of propositions that is relevant to one corner of a field or to a few related phenomena is usually called a model. A model is like a mini-theory. It has the same basic nature as a theory because it focuses on a few elements abstracted from all of reality. And the terms "model" and "theory" are frequently used interchangeably.


3. Two Views of Theory and of Science in General

There are two views of the universe that lead to two views about theorizing—and of science in general. One may view the world as a system having inherent order, and the task of the scientist being the discovery of the "true" propositions and relationships in that system. For example, one may think that the speed with which a body falls due to gravity is one of the underlying propositions that characterize our world, and if we diligently seek after such propositions and relationships we can discover them all and then have a complete understanding of the universe. To put this view another way, at the beginning the universe was created or evolved according to a set of equations, and it is our job to discern the equations.

The other view—the one which I find more helpful—is that the universe is not perfectly formless or chaotic—though it may once have been—but that the regularities and generalizations we discover result from our interests and perceptions as well as from the features of the world. That is, we invent and develop the relationships we find, rather than merely discovering them.

For example, what shape is the earth? It is round and smooth to a firm that manufactures cheap globes, but is bumpy to a manufacturer of more expensive globes. To a surveyor or farmer, it is flat. Its circumference is greatest at the equator, for some persons. To an aviator, it is an unsmooth, unround set of upcropping mountains. And so on. The earth does not have one shape but many, and the relevant "model"—flat, round, bumpy—depends upon your needs and interests. No "model" of the earth captures all its features—because then it would not be an abstracted model, but the earth itself. As Georgescu-Roegen put it:

[A]ctuality is a seamless whole we can slice . . . wherever we may please. And, Plato to the contrary, actuality has no joints to guide a carver. . . . Only our particular purpose in each case can guide us in drawing the boundary of a process. So, every scientist slices actuality in the way that suits best his own objective—an operation that cannot be performed without some intimate knowledge of the corresponding phenomenal domain. . . . No analytical boundary, no analytical process . . . (p. 3)

Just so it is with the familial process of making decisions about whether to have another child; no single statistical model of it is the correct one. Rather, different models of the fertility decision will be appropriate for answering different scientific questions, and with respect to different scientific purposes.

4. The Relationship Between Theory and Empirical Research

Now we tackle the tangled relationship between theory and empirical research.

In applied research the relationship between hypothesis (there is seldom anything that can be called theory) and empirical research is so close as to be obvious. Consider the research firm that polls the electorate to predict the winner of the presidential election. The relevant speculations are mostly obvious assumptions: the election will really be held, the candidates will stay alive, and so on. That is, there is no real theorizing involved, and the empirical research is the entire scientific task.

In "evaluation" research—such as an evaluation of whether busing affects the attainment and socialization of students—the empirical research also often seems to proceed without theory. But it is worth noting that the reason there is a busing phenomenon to investigate is the speculation some years back that school integration would affect students' attainments and socialization.

In "pure" research that seeks to explain the human world in general scientific terms, the relationship between theory and research is more complex and difficult. Theoretical speculation and empirical research are two approaches to the knowledge one seeks. And the use of two very different approaches together is much more powerful than one approach alone. If the two approaches give the same answer, then one can feel much more secure with the conclusion than with only one approach. And if the two approaches do not yield the same answer, you are alerted that the matter is not so settled and straightforward as one approach alone would suggest. This is true for the combination of theory and empirical research just as it is for two separate empirical methods.

The central task in using theoretical speculation and empirical research together is to mold them and sharpen them so that both are addressing the same question, in such manner that their results can reasonably be compared. In this respect the relationship between theory and a given piece of empirical research is no different than the relationship between two pieces of empirical research bearing on a given problem: The two research approaches must be made to deal with the same phenomena, or else their results cannot be compared.

"Operationalization" of the theoretical concepts is the task of finding appropriate empirical proxies for the theoretical variables. "Conceptualization" is the complementary process of finding appropriate theoretical constructs for interesting empirical patterns that turn up. Operationalization and conceptualization work hand-in-hand as the research work progresses. Very often the researcher passes back and forth from conceptualization to operationalization, rather than the one-way flow from theory to empirical work envisioned by the philosophers of science.

A key link in the struggle to have the theoretical statements and the empirical work deal with the same phenomena is sound definition of the variables. On the theory side, there is no special technique except general clear thinking to help define terms clearly. But on the empirical side, the operational definition is a powerful tool in working toward definitions that all researchers can understand and that therefore can be compared against the theoretical terms. This topic is dealt with in Chapter 2, especially pages 12-17.


An example of the connection of theory—assumption, deduction, and hypothesis—with empirical research whose purpose is to test the theory was given on page 34. Here is another example, again taken from economics:

a. EXAMPLE 1: ADVERTISING RATES IN NEWSPAPERS: FACTUAL PROBLEM

It is an observed fact that newspapers charge lower advertising rates to local retailers than to nationally advertised brands of goods. To explain why they do so is a research problem.

Assumptions. We assume, first, that businessmen (newspaper owners, in this case) will charge that price to each group of people that will result in maximum profit. (This is the "economic man" assumption, the same as the first assumption in the example on p. 34.) Second, we assume that businessmen know how groups of customers (retailers and national advertisers) react to various prices. (This is the "perfect knowledge" assumption, the same as the second assumption on p. 34.)

Deduction. We deduce that if one customer group is less sensitive to a price increase than is another group, it will be profitable to charge a higher price to the less sensitive customer. This can be shown with a standard logical chain of economic deduction.

Empirical Test. There are many possible ways to test this deduction empirically. One could, for example, try to find out whether local advertisers really are more sensitive to a price increase than are national advertisers. This could be investigated by examining the changes in the quantity of local and national advertising following individual price changes in a sample of newspapers. Or, one could relate the aggregate amounts of local and national advertising to the average prices of local and national advertising over a period of years. Or, one might persuade one or more newspapers to conduct controlled experiments with their advertising prices. All these methods examine changes in economic data, and then reason back to the beliefs and behavior of presumably rational decision-makers.

Another approach is to directly examine the beliefs of the newspaper executives. This is the method we shall consider, with the hypothesis and related test as follows:

Hypothesis. We hypothesize that, if the deduction is correct, the newspaper publishers believe that national advertisers are less sensitive to price changes.

Research Method. The hypothesis can be tested by finding out what the newspaper publishers believe about the relative sensitivity of local and national advertisers. A questionnaire study found that publishers do indeed believe that national advertisers are less sensitive to price changes and thus confirmed the hypothesis (Simon, 1965d).

b. LOGIC AND THE CONNECTION BETWEEN THEORY AND EMPIRICAL WORK

Philosophers have devoted much effort to analyzing the logic of the scientific process. They have outlined how science begins with a theoretical framework, deduces propositions, and tests the propositions. Scientific publications tend to follow this format, which is also shown in the examples on page 34.

The actual development of a scientific project seldom follows this logic, however. Rather, you may begin with some data, get an idea out of the data, scratch around for some relevant theory, then test the theoretical deduction, find that the theoretical deduction is not confirmed, do some theorizing, get some more data, get new ideas, and on and on. Or the scenario may begin with a casual observation in everyday life, move on to data or to theory, and so forth.

The point is that the route to valuable scientific results tends to be circuitous, unprogrammed, non-logical, intuitive, repetitious, frustrating, surprising, and hence exciting, as Harburg's map (Figure 5.1) illustrates.

A last point about the connections between theory and empirical research: They go both ways. Just as theory supports empirical research, there also can be no science without empirical research to serve as a bridge between scientific thought and reality, as the source of speculation and as a test of hypotheses. Empirical work and hypothesizing shade into one another. When you look out the window, observe rain, and announce "It's raining outside," you are extrapolating that it is raining all around the house and not just outside one window. Empirical and theoretical statements form a continuum.

. . . [N]o observation is purely empirical—that is, free of any ideational element—as no theory (in science, at any rate) is purely ideational . . . the terms of even the barest description carry us beyond the here-and-now, if only because they must be capable of more than one utterance to have a usage. When I say, "This object is red," I am inescapably relating the present occasion to others in which "red" is properly used. . . . When we see that someone is pleased or angry, we are relying on a whole framework of ideas about cultural patterns in the expression of emotion, just as we understand what is said not just on the basis of what we hear but also in terms of a whole grammar somehow brought to the hearing. (Kaplan, pp. 58-59)

The relationship of theory to research is discussed from a related point of view in the following chapter.

[Figure 5.1. Source: Ernest Harburg, Ph.D. Reprinted by permission.]

5. Summary

The relationship between theory and empirical research differs from situation to situation. Seldom does the pattern follow the philosophical-logical model beginning with a body of theoretical axioms and proceeding through deduction to empirical testing. Rather, the process may begin at any point (data, curiosity about an observation in daily life, or theoretical deduction) and then it flows back and forth in a web of ideas and empirical testing and theoretical development.

The place of theory differs from discipline to discipline. In some fields the body of theory is strong and well-integrated, whereas in other fields the best that one can hope for is a modest model to guide empirical study.

Applied work tends to proceed without theory as such, but rather on the bases of hunches and guesses relevant to a practical need for tested information.

ADDITIONAL READING FOR CHAPTER 5

Selltiz et al. (Chapter 2) cover many of the same topics as does this chapter.

Kaplan's The Conduct of Inquiry is an excellent treatise on the philosophical basis of social research, and especially the relationship of theory to empirical research.

Kuhn's theory of the development of science has been very influential recently, though somewhat controversial. It is worth getting acquainted with.

6 choosing appropriate proxies for theoretical variables

1. Dependent Variables Whose Referents Are Clearly Defined
2. Dependent Variables Whose Referents Are Not Clearly Defined
3. Choosing Independent Variables
4. Choosing a Level of Aggregation
5. Choosing a Level of Explanation
6. Summary

When the research project begins with abstract theory, you must find one or more reasonable empirical proxies (indicators) for the theoretical variables. Even when the research is "applied" rather than "pure," you must be sure that your empirical variables lend themselves well to the empirical work. This chapter discusses how to choose empirical variables. It follows naturally from the discussion in the previous chapter of the relationship of theory to empirical research and expands on it.

When the theoretical structure is well-defined, the theoretical variable may be confined to a single dimension. But in fields in which the researcher works less with a well-structured theory and more with free-ranging imagination, the theoretical variable is likely to be multidimensional. Lazarsfeld describes this part of the transition from theoretical to empirical variables:

Imagery. The flow of thought and analysis and work which ends up with a measuring instrument usually begins with something which might be called imagery. Out of the analyst's immersion in all the detail of a theoretical problem, he creates a rather vague image or construct. The creative act may begin with the perception of many disparate phenomena as having some underlying characteristic in common. Or the investigator may have observed certain regularities and is trying to account for them. In any case, the concept, when first created, is some vaguely conceived entity that makes the observed relations meaningful.


Suppose we want to study industrial firms. We naturally want to measure the management of the firm. What do we mean by management and managers? Is every foreman a manager? Somewhere the notion of management was started, within a man's writing or a man's experience. Someone noticed that, under the same conditions, sometimes a factory is well run and sometimes it is not well run. Something was being done to make men and materials more productive. This "something" was called management, and ever since students of industrial organization have tried to make this notion more concrete and precise. The same process happens in other fields. By now the development of in­ telligence tests has become a large industry. But the beginning of the idea of intelligence was that, if you look at little boys, some strike you as being alert and interesting and others as dull and uninteresting. This kind of general impression starts the wheels rolling for a measurement problem. Concept Specification. The next step is to take this original imagery and divide it into components. The concept is specified by an elaborate discussion of the phenomena out of which it emerged. We develop "aspects," "components," "dimensions," or similar specifications. They are sometimes derived logically from the over-all concept, or one aspect is deduced from another, or empirically observed correlations between them are reported. The concept is shown to con­ sist of a complex combination of phenomena, rather than a simple and directly observable item. Suppose you want to know if a production team is efficient. You have a beginning notion of efficiency. Somebody comes and says, "What do you really mean? Who are more efficient—those who work quickly and make a lot of mistakes, so that you have many rejections, or those who work slowly but make very few rejects?" You might answer, depending on the product, "Come to think of it, I really mean those who work slowly and make few mistakes." 
But do you want them to work so slowly that there are no rejects in ten years? That would not be good either. I n the end you divide the notion of efficiency into com­ ponents such as speed, good product, careful handling of the machines—and suddenly you have what measurement theory calls a set of dimensions. (Lazars­ feld in Brodbeck, pp. 610-611) N o w let us r e t u r n to one-dimensional variables; further discussion of multidimensional variables is i n Chapter 16. There are several stages i n the process of m o v i n g from a speculative question about the w o r l d to the b e g i n n i n g of the actual empirical work. T h e first necessary step is to transform the original question into the functional form y = f{xi,X2 . . . ) . T h e n y o u must translate the hypothetical (theoreti­ cal) variables (concepts) as they appear i n your functional form into empirical variables ("indicators") that y o u can w o r k w i t h . The philosophical relationship between hypothetical and empirical variables has already been discussed ( C h a p t e r 2 ) . N o w we can get d o w n to the actual process of choosing ("specifying") appropriate empirical variables. The choice of appropriate variables is perhaps the most i m p o r t a n t deci­ sion that the researcher must make. I t is at this stage that one finally pins d o w n the vague interest and turns i t into concrete operational research: the process b y w h i c h one moves from " I w a n t to find out h o w people feel

Choosing Appropriate Proxies for Theoretical Variables

73

about. . . ." to a researchable statement. L i k e every other decision i n the research process, the choice of variables must depend on just w h a t i t is you w a n t to find out and w l i y y o u w a n t to find i t out. Sometimes the general research question immediately suggests the empirical variables. For example, when one asks about the effect of pros­ perity on whether Democrats or Republicans wdn presidential elections, one can easily decide h o w to measure whether a person is a Democrat or a Repubhcan, though the measure of prosperity is not so obvious. I f one asks whether people d r i n k more liquor i n summer or winter, i t is easy to decide that the Federal Alcohol and Tobacco Tax U n i t data are appropriate. But, even i n such simple cases, definition of the variables is not automatic and requires some judgment. Are beer and wine counted as liquor? A n d w h a t about illegal moonshine? I n most research projects, however, one cannot simply plunge i n and start measuring an obvious approximation to the theoretical ( h y p o t h e t i c a l ) vari­ ables of interest, for one of t w o reasons: Direct data on the hypothetical (theoretical) variable may not be available, or the hypothetical variable may be a vague concept like love, happiness, welfare, or conformity, none of w h i c h immediately suggests empirical counterparts. T o p u t i t another way, one cannot find an empirical counterpart about w h i c h almost everyone w i l l agree that i t really "means" the same t h i n g as the hypothetical variable. Yet in b o t h cases some empirical measurement must be done, and i t must be done i n some way that has meaning. I n either case one must create a "proxy" ( o r "surrogate")—a variable that is different from but stands for the phenomenon i n w h i c h one is interested. The trick is to create good proxies, proxies that reveal something about the "real" variable i n w h i c h one is interested. 
Better than any single proxy are several different proxies. If you use several proxies, and if the results agree, you will reduce doubt about whether you have captured the theoretical variable in your empirical variables. Equally important is that several proxies allow you to draw a more general conclusion from your empirical work, and to map out the domain within which your theoretical proposition does and does not hold.
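One way to check whether several proxies agree is to correlate them pairwise across the same subjects. The sketch below is illustrative only: the scores, the proxy names, and the helper `pairwise_agreement` are hypothetical, and the correlation coefficient is merely one possible measure of agreement.

```python
from itertools import combinations
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def pairwise_agreement(table):
    """Correlate every pair of proxies measured on the same subjects."""
    return {
        (a, b): pearson(table[a], table[b])
        for a, b in combinations(sorted(table), 2)
    }


# Hypothetical scores of six subjects on three proxies (illustration only).
proxies = {
    "proxy_a": [1, 2, 3, 4, 5, 6],
    "proxy_b": [2, 1, 4, 3, 6, 5],
    "proxy_c": [1, 3, 2, 5, 4, 6],
}

agreement = pairwise_agreement(proxies)
for pair, r in agreement.items():
    print(pair, round(r, 2))
```

High correlations among the proxies would reduce doubt that they capture the same theoretical variable; a proxy that correlates weakly with the rest may be measuring something else.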

1. Dependent Variables Whose Referents Are Clearly Defined

This section deals with the type of problem Pogo faces forthrightly in Figure 6.1. Walt Kelly is kidding, but not entirely. When television was young, an engineer in the Dayton, Ohio, Water Department discovered that when the commercials came on the water pressure dropped as people went to kitchens and bathrooms. This measurement of program and commercial popularity is known in research folklore as the "Dayton Water Survey."

Archimedes provided a nice illustration of the substitution of an easily measurable quantity for a quantity that is harder to measure. Archimedes sought to measure the areas of various geometrical shapes. But could he do

that empirically? It would require cutting up a shape into tiny squares and counting the squares, which would be tedious, to say nothing of difficulties with incomplete squares with curved edges. Archimedes then hit on substituting a process of weighing figures drawn on material of uniform thickness; weight provided a good and handy proxy for area (Mason, p. 51). The same old dodge is still used for estimating the area under curves in, say, radioactivity studies in biology.

The next section discusses such unobservable concepts as beliefs, attitudes, and preferences. In this section we deal with concepts that are measurable in principle but not in practice. That is, we are talking here about concepts for which we have satisfactory operational definitions but for which we cannot carry out the operations. One can develop quite satisfactory operational definitions of a competitor's advertising expenditures, of China's rice crop, or of a jury's deliberations, even though one is blocked from actually carrying out the operations. On the other hand, there can be no satisfactory operational definitions of such inner states as beliefs and preferences, except in terms of the behavioral proxies that one creates for them.¹

There are two ways to justify the measurement of a given variable as a proxy for another variable that is clearly defined but cannot itself be measured. The first is to demonstrate an empirical association between the proxy and the hard-to-measure variable. The second is to demonstrate a logical link with a chain of reasoning between the proxy and the conceptual variable. A combination of the two methods is best. Each of the two ways will be illustrated here.

Many examples of pure empirical proxies can be found in aptitude and vocational testing.
One testing organization asks the subject to write as many words as he can, any words at all, in a one-minute period. A person who gets a high score is said to have an aptitude for those jobs that require high creativity. This prediction is based on the supposed fact that people who are successful in creative vocations score higher on this test than do other people. One might be able to find some logical rationale for such a relationship, but basically it is a purely empirical link; if people who are successful in creative jobs were found to score low on this test, a low score would then be accepted as a predictor of success in creative jobs.

Many of the projective tests used by clinical psychologists (like the Rorschach ink-blot test and the Thematic Apperception Test), as well as such personality batteries as the Minnesota Multiphasic Personality Inventory, are "validated" in a purely empirical way. Those responses or "profiles" that are found among people known to be manic-depressive rather than schizophrenic, say, or neurotic rather than psychotic, are then used as "indicators" to predict whether a given person is of one type or another.

1. One might argue that there is no difference between hypothetical (theoretical) concepts for which we have no operational definitions, except in terms of proxies, and concepts for which we have operational definitions that cannot be carried out. If you do not find the distinction useful, ignore it.

FIGURE 6.1

Source: Pogo, November 26, 1963, by Walt Kelly; © 1963 Walt Kelly.

In most of the social sciences, the relationship between a proxy and a defined but hard-to-measure conceptual variable is through a chain of reasoning. For example, C. Warburton made a fascinating determination of how much liquor was drunk in the United States during Prohibition. The usual measurements of liquor consumption (tax receipts) were not available, so he made the following estimates:

1. amounts used of various possible ingredients of liquor (corn sugar; corn syrup and corn starch, corn meal; corn, rye, and other grains; malt syrup; fruits and vegetables), plus estimated amounts of industrial alcohol, medicinal alcohol, and smuggled alcohol
2. death rates from alcoholic diseases
3. number of arrests for drunkenness.

He then combined these three types of estimate in one grand estimate. The relationship between each of the three estimates and actual consumption could not be directly validated empirically, except perhaps by a survey, and it was too late for that. Warburton's only validation of each of these estimates, and therefore of his combined estimate, was his reasoning about the link between drinking and the sources of production, death rates, and arrest rates.

It is sometimes possible to study an invisible phenomenon by finding something visible that is logically linked to it. A famous example in the physical sciences is observation of the Brownian movement of invisible molecules by the use of the "cloud chamber." In this device the molecules become attached to droplets of oil that are of visible size, and the motion of the molecules can then be inferred from the observed motion of the oil droplets.

Business researchers have used many ingenious methods, some ethical, some not so ethical, to obtain sales information that their competitors do not disclose.
Retailers who want to know a competitor's sales volume can count the number of customers who enter the competitor's store and then multiply by the estimated size and number of purchases. (Sometimes a retailer takes the more direct method of having agents watch the cash registers in the competitor's store for five minutes at a time and then projecting overall estimates from that sample evidence.) Measuring competitive sales by consumer-purchase panels and by store inventories is a large and flourishing business for many commercial firms.

Information about a competitor's advertising budget is normally obtained by keeping a count of the number and sizes of the competitor's advertisements in all advertising media that the competitor is known to use. Commercial firms and trade associations perform this service for many types of firms in many industries. But the counts are never quite complete. Each firm therefore adjusts its estimate of a competitor's budget by comparing the amount the outside agency estimates for the firm's own expenditures against

the firm's own actual expenditures, which might be 30 percent to 90 percent greater, and it then makes the appropriate adjustment in the reported figures for its competitor.

Military intelligence officers commonly estimate unavailable magnitudes from what can be observed. Content analysis is frequently used for this purpose. For example, I am told that in World War II the Allies in England counted and measured the numbers of various types of songs played on various European radio stations as an index of changes in German troop concentrations, and deduced from the serial numbers of captured weapons the extent of German war production.

Instruments can sometimes be used to see the invisible. Organisms that are too small to be seen with the naked eye reveal themselves to the microscopist. And geographical features of the moon came close with the aid of the telescope and radio telescope.

But a contrary example also springs to mind. What is the best proxy for whether it is raining outside? On first thought the best proxy is a rain-collecting device that can be read electronically inside the building. But, if you want to go outside in a moment, a better proxy might be to observe how many of the people who walk past the window with umbrellas have them open and how many of the umbrellas are closed. What other people are doing about the rain (assuming that they are acting rationally, just as economics assumes the rational economic man) is a good proxy for the variable (rain) that you want to know about. And an observation of umbrellas may be a better proxy than the automatic rain collector. You would be better prepared to know how to dress if you knew that 90 percent of the people had their bumbershoots up than if you knew that precipitation was at the rate of .005 something per something per minute.
On the other hand, if you were estimating rainfall for the U.S. Weather Service, an umbrella count would not be satisfactory. Here is another example of how purpose and cost/benefit analysis of the information must dictate the choice of method.

To learn about unobservable human behavior, one must usually depend upon the subject's statements about his behavior as proxies for the behavior itself, as A. Kinsey did. But we must always worry about how good a proxy "verbal behavior" is for actual behavior. It was not practical for Kinsey to validate the proxy variable directly by observing subjects and then comparing the observed behavior with their statements. Therefore, he used reliability checks to determine whether several types of statements by the subject jibed with each other, whether husband's and wife's statements agreed, and whether statements made in later reinterviews jibed with earlier statements. Kinsey also used the skilled intuition of the interviewer as a measure of validity.

Laboratory experimentation sometimes provides reasonable proxies when subject matter is not accessible. Researchers at the University of Chicago

who wished to study the workings of the jury system were denied access to actual jury deliberations for ethical-legal reasons. (After they had obtained permission from one judge to monitor a jury room, a member of Congress became so indignant that he called for a full Congressional investigation even though the cases were minor and the judge and lawyers all had given their permission.) To overcome this obstacle experiments were set up with the cooperation of courts in which people currently on the jury list listened to recorded mock trials.

The problem of estimating the characteristics of a group of people that does not yet exist illustrates the difference between hypothetical variables that are conceptually vague (such as attitudes and beliefs) and those that in principle are not vague. How can a university administration estimate what kind of housing the students twenty-five years hence will want when those students have not even been born yet? There is no difficulty in constructing a perfectly satisfactory operational definition of the body of students twenty-five years from now, but the operation obviously cannot be carried out now. (A sensible approximation may be to assume that present-day students are like future students.)

Whenever possible, you should try to validate your empirical variable by comparing it directly with the variable that is of ultimate interest to you. For example, I once developed a method to help marines improve their performance with the .45 caliber pistol. The teaching device was a series of holes in a sheet of metal into which the shooter placed a rod in the barrel of his pistol. The shooter's task was to pull the trigger when the gun was aimed at the target, at which time the rod was within the hole. If he jerked the gun off the target, an electric circuit would ring a buzzer.
This method did improve the marines' skills in dry-firing in the laboratory, and it seemed reasonable that skill on the live-ammunition range would also be improved. But would improvement actually show up on the firing range? That would be the ultimate validation. Unfortunately I was transferred before I could compare the empirical dependent variable (performance in the laboratory) with the variable of ultimate interest (performance on the target range).

Generally, the greater the similarity between the proxy and the actual variable, the better the proxy. For example, if you want to find out whether a small child thinks that a dollar bill is worth more than 100 pennies, you will probably do better to ask "Which one do you want for your birthday?" than to ask "Which one is bigger?" or "Which one is worth more?" The Schwerin firm exploits this principle in measuring the effect of advertising. Before and after showing commercials to special movie audiences, Schwerin holds a lottery and offers people the choice of various brands of merchandise if they win. Choice when one may actually receive the merchandise is a more realistic measure of preference than asking "Which brand do you prefer?"

The extent to which attitudes are reasonable proxies for behavior must also be considered carefully. For example, it is common to reason that a

woman who says she hates blacks is more likely to discriminate than is a woman who says she likes blacks. Maybe so, but maybe not. But it is surely foolish to pay no attention to attitudes and to argue that attitudes are always a worthless measure. For example, it was found in World War II that, among soldiers who had never been in battle, those who said they were confident of their skill in battle, less fearful of getting hurt, and aggressive toward the enemy did indeed perform better in battle later on (Kendall & Lazarsfeld in Merton & Lazarsfeld, p. 179). And, in countries where people say they want relatively few children, they actually have relatively few children (Berelson, 1966).

Just how good a proxy an attitude is depends on the situation. Advertisers have had to be sophisticated about this point. For example, consider just a few of the possible measurements that relate to the selling power of a magazine advertisement: the number of copies of the magazine sold, the number of people who look at the magazine, the number who look at a particular page, the number who notice the advertisement, the number who read part of the advertisement, the number who remember the advertisement, the number who exhibit more favorable attitudes after seeing the advertisement, the number of people who remember selling points about the product, the increase in the number of people who say they want the product, the number of people who try out the product, and, finally, the number of people who actually become purchasers.² The closer the measurement is to actual purchasing, the better the proxy should be. On the other hand, closer proxies (including actual sales) are often much harder to measure, and therefore a compromise is usually struck. Under the best of conditions the proxy is also validated empirically.
In some situations one selects an empirical variable that is related to the variable of final interest only by a long chain of reasoning. As an illustration, M. Haire wanted to find out why women did not buy instant coffee. That question is reasonably specific, but it is still a long way from being specific enough to define a piece of research. What Haire had to do next was to specify a set of dependent and independent variables and a functional relationship that he could study empirically.

What he actually did was to conduct an experiment in which he submitted two shopping lists to two groups of women. On the shopping list he gave to group A were many items of food plus "Nescafe instant coffee." On the shopping list for group B were all the same items except that "Maxwell House coffee (Drip Ground)" replaced "Nescafe instant coffee." Haire then asked the women in his groups to say what kind of woman they thought had made out the shopping list. Twenty-four of the twenty-five women in group A, whose list included instant coffee, gave such judgments as "lazy housekeeper." Only two of fifty in group B, whose list included regular coffee, said the woman was a lazy housekeeper.

2. See W. Madow, et al. (pp. 41ff.), for a similar discussion of types of television-audience measurement.
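The counts reported for the two shopping-list groups can be turned into proportions and checked against chance. The sketch below is only an illustration, not Haire's own analysis: the function names are hypothetical, and the one-sided Fisher exact test is an assumed choice of method.

```python
from math import comb


def share(count, total):
    """Proportion of a group giving a particular judgment."""
    return count / total


def fisher_one_sided(a, n_a, b, n_b):
    """Probability of at least `a` judgments landing in group A by chance.

    Assumes the a + b judgments are spread at random over all
    n_a + n_b respondents (hypergeometric upper tail).
    """
    successes = a + b
    population = n_a + n_b
    upper = min(n_a, successes)
    tail = sum(
        comb(successes, k) * comb(population - successes, n_a - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(population, n_a)


# Counts of "lazy housekeeper" judgments as reported in the text.
lazy_a, size_a = 24, 25  # group A: instant coffee on the list
lazy_b, size_b = 2, 50   # group B: regular coffee on the list

print(share(lazy_a, size_a), share(lazy_b, size_b))
p_value = fisher_one_sided(lazy_a, size_a, lazy_b, size_b)
print(p_value)
```

A very small probability here says only that the two groups judged the list maker differently; whether that difference explains why women did not buy instant coffee still rests on the chain of reasoning.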

The functional form of Haire's research design was "Judgment about shopping-list maker is a function of whether instant or regular coffee is on the shopping list." But his original purpose was to find out why women do not themselves buy instant coffee. It seems reasonable, then, to ask what the connection is between Haire's original question and the functional form he actually used.

The connection between the basic question of the research and the actual variables and functional form is a chain of reasoning. Haire assumed that women could not or would not give the real reasons that they themselves did not buy instant coffee. He further assumed that they might "project" their real reasons onto other people. Therefore, the chain of reasoning was as follows: First, if a woman thinks that she herself is lazy when she buys instant coffee, she will say that another woman who buys instant coffee is lazy; second, if a woman thinks herself lazy when she buys instant coffee, she will not buy instant coffee. Haire therefore assumed that if women did judge instant-coffee buyers lazy, he would have evidence that belief in the laziness of instant-coffee buyers really does inhibit purchase. And that is indeed what he found. (Haire then buttressed these findings by a supplementary investigation of which women did buy instant coffee, and, indeed, the women who did not say that other instant-coffee users were lazy tended to buy instant coffee themselves.)

A variable that is a good proxy at one time may not be a good proxy years later. Studies that repeated Haire's 1940s experiment in the 1960s and 1970s did not obtain the results he did, probably because of a shift in attitudes toward instant coffee (Arndt).

2. Dependent Variables Whose Referents Are Not Clearly Defined

We have just discussed empirical variables as proxies for concepts that are clearly defined but impractical to measure. Now we shall discuss proxies for which an empirical validation is not only practically impossible but also conceptually impossible. Nor are logical validations possible for these proxies. They differ from Warburton's example, for in principle he might have validated his estimates by taking surveys of how much people drank during Prohibition.

The intelligence test is an interesting example because it illustrates both types of uses of proxies. If the I.Q. test is interpreted as a prediction of future school success, it illustrates the first type, because it can be validated by comparing I.Q. scores with later school grades; a high correlation convinces us that the I.Q. test is a good proxy. But, if we interpret the I.Q. test as an empirical proxy for a generalized hypothetical notion of intelligence, we are dealing with the second type, for no empirical or logical validation is possible.

H. Ebbinghaus early gave much thought to the relationship between the inner mental state of memory, which is unobservable, and the behavior of

repeating nonsense syllables correctly, which Ebbinghaus observed and measured. It is the experimental use of this observable behavior that gives

. . . a foothold for the application of the method of the natural sciences: namely, phenomena . . . which are clearly ascertainable, which vary in accordance with the variation of conditions, and which are capable of numerical determination. Whether we possess in them correct measures for these inner differences [memory], and whether we can achieve through them correct conceptions as to the causal relations into which this hidden mental life enters, these questions cannot be answered a priori. . . . There is only one way to do this [validate the proxy], and that is to see whether it is possible to obtain, on the presupposition of the correctness of such an hypothesis, well classified, uncontradictory results, and correct anticipations of the future. (Ebbinghaus, p. 9)

There is also a close relationship between choosing proxies for vague variables and choosing measurements, as this story illustrates:

GROOMING CODE: SISTER TURNS "FINGER-MAN"

Ferndale, Mich., Feb. 20 (AP): Too bushy or not too bushy? That is the question at St. James Catholic High School in this Detroit suburb. Sister Mary Aurilla, school principal, has devised a simple test to answer the question of whether the hair styles worn by the 200 girl students are too bushy. She pushes her index finger into the girl's hair. If her finger tip touches the girl's head and the knuckle doesn't show above the hairdo, then it's too bushy. Girls who fail the finger test are sent home to rearrange the situation. (Champaign-Urbana Courier, February 21, 1965)

If bushiness of hair is understood to refer only to the single dimension of thickness, then we can view the finger test as a measurement. But, if bushiness could mean other characteristics in addition to thickness, then the finger test is a one-dimensional proxy chosen to stand for the more complex hypothetical (conceptual) variable.

When your hypothetical variable is vague, you must bull ahead and say "This empirical variable is what I shall call. . . ." When Kinsey wanted an empirical variable to stand for male sexual outlet, he simply said that orgasm would be the proxy, because it seemed reasonable to him and because no other measurable variable seemed more reasonable. If orgasm took place, the act was counted as sexual outlet; if not, not. Kinsey was not asserting that orgasm is sexual outlet; most of us would probably agree that in the loose sense there can be sexual outlet without orgasm. Orgasm was simply the best that Kinsey decided he could do for an empirical proxy in the male study. (In the female study, however, orgasm was not the proxy for sexual outlet.)

One can validate this type of variable only with judgment. The variable has been well chosen if other scientists are willing to accept that the empirical variable gets at the hypothetical variable. Not all empirical variables are well enough chosen to overcome this obstacle, and indeed some proxies are downright foolish. To measure mother love by how often a mother cuddles a child might or might not be satisfactory. To measure mother love by how much the mother says she loves a child would probably be worth little or nothing.

One must not forget the distinction between the proxy and the "real" variable, as noted for orgasm and sexual outlet. For another example, running away from home might be a good indicator of unhappiness in adolescents. But one should not therefore think that anyone who runs away from home is unhappy. To show such a relationship, a researcher would first have to show an empirical link between running away and other indicators that are generally accepted signs of unhappiness.

Can one ever observe the "real" variable directly? Consider again being inside a building and wanting to know whether it is raining outside so that you can decide what you should wear. You are likely to look out the window to see whether you can see raindrops falling. Are you then observing the "real" variable, or are you observing only a proxy variable? Common sense says that you are observing the real variable, as it is the rain itself that you are looking at. But what you observe will depend upon the time of the day and other conditions that affect your vision, so that what you observe is not just the real variable. Perhaps what you observe can better be described as the visual impression of the raindrops seen from a distance under certain visual conditions. Deciding whether you are observing the "real" variable is not likely to be a problem.
You will be on safe ground if you describe your observed variable as a close or not-so-close proxy for the phenomenon in which you are interested. If you think that what you are observing is the "real" variable, you might state that there is complete identity between the proxy and the phenomenon you are studying.

Sometimes several research techniques can be strung together to form a logical chain between a proxy and an ill-defined variable. In the library study, for example, we wished to determine the relationship between the scholarly value of given books and the rates of their withdrawal from the library (Fussler & Simon). Therefore, we first investigated the withdrawal of books from the library by studying withdrawal records. Then we investigated the relationship between book use within the library and withdrawals from the library by a questionnaire survey of the unrecorded use of books. Then we related recorded book use to the ultimate value of books, with the aid of judgments by a panel of expert scholars in various fields. Given that validating chain of reasoning, we could reasonably accept the rate of withdrawal of a book as a proxy of its value in the succeeding work.

Students of human emotions often validate by comparing several different proxies. If there is high agreement among them, faith in one or all of them is increased. One may also decide that the proxy that best agrees with all the

others is the best proxy among them. For example, if I want to measure how happy people are, I might ask them these questions, each of which, I think, tells something about how happy a person is:

1. In general, how would you say you feel most of the time, in good spirits or in low spirits? (Stouffer et al.)
   A. I am (was) usually in good spirits
   B. I am (was) in good spirits some of the time and in low spirits some of the time
   C. I am (was) usually in low spirits
2. Taking all things together, how would you say things are these days: would you say that you are (were) very happy, pretty happy, or not too happy? (Bradburn & Caplovitz)
   A. Very happy
   B. Pretty happy
   C. Not too happy
3. How good a life would you say that you yourself have? We're talking about just you, and not your family or your community.
   A. Wonderful life
   B. Good life
   C. So-so life
   D. Poor life
   E. Terrible life

If the answers that people give to the three questions are highly correlated, my confidence is increased that singly or together these questions are indeed measuring happiness. (Chapter 17 discusses the combination of separate proxies in composite indexes.)

The branch of economics called "welfare economics" offers a particularly interesting and thorny problem of relating the proxy to the theoretical variable. Most economists agree that the aim of welfare economics is to study the effects of various arrangements on the "welfare" of groups of people, and "welfare" is generally taken to be a synonym for "happiness" (Boulding, 1952). But the amount of goods and services (purchasing power) is always the proxy used in welfare economics; studies in welfare economics commonly consider what happens to the goods and income of a community under various conditions. Anyone with a grain of wisdom knows that money is not all there is to happiness, however.
And quite obviously a particular good does not have the same happiness value for all people. Here is the perennial stumbling block of "interpersonal comparisons": How can we tell the amount of happiness that a loaf of bread will give a poor person compared to the happiness it will give to a rich person?

Some economists have given up on the idea of validating welfare economics by empirical or logical methods. They feel that the reader must intuitively accept the "objective function" relationship between goods and

happiness if he is to accept welfare economics. Others simply deny that they are talking about happiness and restrict their discussion to material welfare. But then welfare economics loses its claim to speak of the ultimate human concern, and it can become arid and without content.

The situation may not be as bad as it seems; some empirical validation may be possible. "Happiness" has been investigated successfully under various other sets of conditions, and there is no a priori reason why it is not possible to relate happiness to various economic conditions also. In fact, happiness seems to be positively related to people's incomes (see J. Simon, 1974).³

Empirical investigation of the qualities of art is not dissimilar to welfare economics in this respect. In both cases the dependent variable is ephemeral and subjective. Anyone can simply state that a particular novel or painting is "better" than another novel or painting without needing any definition or proxy for "better." But to make a scientific study of what makes one novel or play better than another, an objective empirical variable is necessary. It is not difficult to find some objective empirical variables; for example, the number of copies the novel sells, and the price paid for the painting, may appeal to many businessmen as excellent proxies for excellence of art. Or one might compare how much two novels or two plays move people by measuring various physiological changes associated with emotion (like respiration or sweating) with devices similar to the lie detector. But I doubt that in the near or distant future any objective proxy will satisfy critics and artists as a "true" or "self-evident" measure of the value of art. Instead they are wont to say that to measure art is to measure the unmeasurable. (A content analysis [see Chapter 14] might, however, reveal that "good" adventure stories use higher proportions of verbs than do less exciting adventure stories.)

[3] Earlier it was noted that economics has relatively little difficulty in validating its variables, however, precisely because they can usually be expressed in single-dimension variables like dollars or tons of steel.

3. Choosing Independent Variables

Most of the proxies discussed in the previous section have stood for dependent variables. But difficulties also exist in choosing appropriate independent variables. For example, if you want to investigate the effect of training upon the performance of a rat running through a maze, you must first specify exactly what you mean by "training," and then you must train the rats in exactly that way.

This is a good place to say again that the questions in a questionnaire survey can have a major influence on what answers mean. To exaggerate the point, if you were trying to measure the dependent variable "degree of liking of Jimmy Carter," it would make quite a difference whether an interviewer says "Do you like that great American Jimmy Carter?" or "Do you like that traitor Jimmy Carter?"

In contrast to specifying dependent variables, choosing independent variables often permits performance of an objective test to see which variable is better. The researcher can often try out two or more independent variables to see which one works best in "explaining" the dependent variable or which one is more closely correlated with the dependent variable. For example, an economist who is studying the demand for autos and other goods may have a choice between using family income or family expenditures in a given year as a variable; the two may differ because families may save some income or spend some savings. The economist may have theoretical reasons for preferring expenditures, say, but he probably will try out both variables to see which gives the best results.

Another problem is how many independent variables to use at one time; this problem mostly afflicts the economists and sociologists in their nonexperimental research. We do not really know how to answer this question; at best the answer is complex and not well understood. In crude terms, one keeps on adding more variables to a cross-classification or other multivariate scheme as long as they improve the explanation appreciably. (On the other hand, one must avoid adding variables that are not relevant, because adding them introduces error; this is another difficult and little-understood problem.)

The choice of variables is a cousin to classification; in the latter you specify the dimensions or attitudes (and the categories of those dimensions) on which you will measure the events or species or groups of people in which you are interested. Classification is discussed in Chapter 15, as are the related matters of scaling and measurement.
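The idea of trying out family income and family expenditures to see which is more closely correlated with the dependent variable can be sketched in a few lines. This is only an illustration: the eight families and all of their figures are invented, and the correlation coefficient is just one of several ways a researcher might judge which variable "works best."

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical yearly figures for eight families, in thousands of dollars.
family_income = [20, 25, 30, 35, 40, 45, 50, 55]
family_expenditures = [19, 22, 31, 30, 42, 40, 47, 58]
auto_spending = [2.0, 2.1, 3.2, 3.0, 4.1, 4.1, 4.6, 5.9]  # dependent variable

# The two candidate independent variables named in the text.
candidates = {
    "family income": family_income,
    "family expenditures": family_expenditures,
}

# Correlate each candidate with the dependent variable and keep the closest.
correlations = {
    name: pearson(values, auto_spending) for name, values in candidates.items()
}
best = max(correlations, key=lambda name: abs(correlations[name]))

for name, r in correlations.items():
    print(f"{name}: r = {r:.3f}")
print("more closely correlated:", best)
```

With real data the two correlations may be close, in which case theoretical reasons for preferring one variable would probably carry more weight than a small difference in fit.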
Making sure that the supposed variables are not tautologies, and therefore not usable in empirical research, is a last obstacle in choosing variables. For example, S. Freud's pleasure-pain principle is a tautology as it is usually stated. If one says that a person does what is pleasurable to him and that he does not do what is painful, then everything he does is pleasurable by definition. And, if so, there is no way of empirically distinguishing pleasurable from painful actions or of using pleasure-pain as an empirical variable. C. Darwin's evolution principle also is often stated as a tautology.

4. Choosing a Level of Aggregation

The world is like a cobweb, I suggested before. Because every filament is ultimately related to every other filament, a movement of any filament will finally be reflected by every other filament. Nevertheless, some filaments are closer to or farther from others, and there will be stronger or weaker association between them. Also, a movement in a given filament y might be caused by the movement in filament x, where the movement is initiated; or the movements in both x and y might be reflecting movement initiated at z (the "third factor"). [4]

Now we must consider a more complex view of the world. Consider the question of why people buy automobiles. Just a few strands of this question are shown in Figure 6.2. The four different purchase rectangles represent four different "levels of aggregation." A given scientist might (and some do) try to explain any one of the following:

1. total purchases (or changes in total purchases)
2. purchases by particular groups, like northerners or blacks or Catholics (or comparisons between groups, northerners versus southerners, for example)
3. purchases by households, expressed as probabilities
4. purchases by individuals.

Economists are mostly interested in total purchases; sometimes they think about the group or individual level of aggregation but primarily as a means for understanding and predicting total purchases. Anthropologists are also mostly interested in total purchases. Sociologists are more likely to be interested in the group level of aggregation. Psychologists are mostly interested in individual purchases. Market researchers are interested in individual purchases when they study advertising appeals related to auto styles, for example; but they work at the group level of aggregation when they study where to locate dealers and where to advertise and at the level of total demand for autos when they advise on how many cars to produce. Most of the market researcher's attention is actually devoted to a level of aggregation not even shown in the diagram: individual purchase of particular brands of cars.

These points should be considered when deciding at which level of aggregation to work:

First, what level of aggregation are you interested in? That is, for which level of aggregation do you want to make decisions, for example the individual or the nation? Or what body of data do you want to explain? If you want to study the causal relationship running from per capita income to literacy or the correlation between per capita income and suicide rates, you want data for individuals and not groups. But you might want to study the relationship between national income and literacy if you are interested in the extent to which a high national income produces literacy. The level of aggregation in which you are interested is often influenced by the discipline within which you work, but you should not blindly adopt the level of aggregation usually used in your discipline.

[4] The scientist's problem is that the connections among filaments do not always seem obvious or sensible. "During World War II, the production of optical instruments was temporarily greatly hampered by a shortage of babies' diapers. The reason was that diapers were an excellent polishing material for lenses" (Morgenstern, p. 133).

FIGURE 6.2 Some Possible Variables and Levels of Aggregation for Use in Analysis of Auto Purchasing

Please Note: The assignment of concepts in this diagram to particular fields may be misleading, especially in these years of the late 1970s when the social sciences seem to be breaking down the partitions among them.

Second, which level of aggregation is it feasible to study? Sometimes data are not available at the desired level of aggregation. For example, E. Durkheim studied the relationship between religion and suicide. The religions of people who commit suicide are the relevant data, but all that was available were the numbers of people who committed suicide and the numbers who followed the various religions in countries and states as wholes. Durkheim did as well as he could at this too-high level of aggregation, showing that suicide is more common in places dominated by particular religions. But this tactic also occasionally led him into the ecological fallacy (discussed in Chapter 21).

Sometimes one must go to extra effort to study the subgroups separately so as not to be misled by the aggregate. This is especially true when comparing two or more aggregates that are composed of different proportions of subgroups. For example, the crude death rate among U.S. blacks (that is, the proportion of deaths in a year to total population of U.S. blacks) is lower than the crude death rate for U.S. whites, but the life expectancy at birth for whites is higher than for blacks, as of 1976 (Population Index). The explanation is that a larger proportion of whites are in the older age brackets, due to lower mortality and lower fertility in the past. In such a situation one must disaggregate and compare the mortality rates for each age group in order to compare blacks and whites fairly.

Third, sometimes it is easier to understand the whole than to understand all its parts and the relationships among them. For example, it will be much easier and more accurate to determine the effect of a price rise upon auto sales by studying the relationship of total auto sales in past years when different prices prevailed than to try to understand the motivations and psychological reactions to a price rise of individuals who might or might not purchase autos. It is particularly sensible to work at the higher level of aggregation (total purchases, rather than individuals, in this case) when there are many different motivations and psychological influences that bear importantly upon the individual's response (in this case, response to the price rise).

Sometimes, however, the whole does not equal the sum of the parts, in which case one cannot reason from the parts to the whole, or vice versa. One reason that the whole may behave differently from the parts is that there may be interaction among the parts. An example is R. Niebuhr's argument about "moral man and immoral society"; one cannot ascertain the likelihood of a country's starting a war from data on the amount of conflict and hostility among neighbors within the country. J. Keynes' thrift paradox is another case; societies whose citizens save a lot may thereby grow poor instead of rich. [5]

[5] Another technical reason that the whole may not equal the sum of its parts is known as the "aggregation problem" in economics. The demand functions of individual consumers, say, do not sum in any simple way to the demand function for the economy as a whole. The problem is particularly acute in economics because the basic deductive theory concerns individuals, but the data and policy decisions concern higher levels of aggregation like the nation.
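The crude death rate comparison discussed above can be made concrete with a small sketch. The two groups, the age brackets, and every number below are invented purely for illustration and are not the 1976 figures the text cites; the point is only to show how a group with higher mortality in every age bracket can still have the lower crude rate when its population is younger.

```python
# Hypothetical populations by age bracket for two groups.
population = {
    "group A": {"0-39": 8000, "40-64": 1500, "65+": 500},   # younger population
    "group B": {"0-39": 5000, "40-64": 3000, "65+": 2000},  # older population
}

# Hypothetical deaths in one year, by age bracket.
deaths = {
    "group A": {"0-39": 24, "40-64": 18, "65+": 35},
    "group B": {"0-39": 10, "40-64": 30, "65+": 120},
}


def crude_rate(group):
    """Deaths in a year divided by total population of the group."""
    return sum(deaths[group].values()) / sum(population[group].values())


def age_specific_rates(group):
    """Death rate within each age bracket of the group."""
    return {
        age: deaths[group][age] / population[group][age]
        for age in population[group]
    }


for group in population:
    print(group, "crude rate per 1,000:", f"{1000 * crude_rate(group):.1f}")
    for age, rate in age_specific_rates(group).items():
        print("   ", age, "rate per 1,000:", f"{1000 * rate:.1f}")
```

In this made-up case the aggregate comparison and the bracket-by-bracket comparison point in opposite directions, which is why the disaggregated rates are the fairer basis for comparing the two groups.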

Experience and careful thought, then, are the only aids to making correct decisions about levels of aggregation. Often the wisdom of your field will be a good guide but not always; for example, an economist can sometimes do better if he abandons the usual economist's level of aggregation (total demand) and works at the level of individuals (Katona) or subgroups (Morgan in Klein et al.). Similarly, a psychologist can often best answer a psychological question by working with groups of people rather than with individuals.

Some people have thought that the lowest level of aggregation must always be best. This view is called "reductionism." In this view, psychology is more "basic" than sociology, biology is more basic than psychology, chemistry is more fundamental than biology, and so on. The argument against reductionism is mostly not a matter of logic, except to the degree that, because there is always some possible lower level of aggregation, no level really meets the reductionist requirement. It is a simple empirical fact of the history of science, however, that in many cases it has proved more fruitful to work at higher levels of aggregation.

5. Choosing a Level of Explanation

There are many, many variables that one might say are causes of auto purchases. Just a few of these variables are shown on the left side of Figure 6.2. The question is, which of these variables should one choose when investigating the causes of auto purchasing? Deciding among possible alternatives is the central task in specifying variables.

This section deals not with how to think up variables, but rather with how to decide among variables, that is, how to choose which variables to include and exclude among those you do think up. I assume that you have already completed the thinking-up stage, which you should summarize this way:

auto purchasing = f(the invention of autos, the auto culture, per capita income, income of individual, age of present car, price of autos, style changes of cars, desire for sex mastery, desire to consume conspicuously, desire for high status, stated intentions to buy a car, condition of roads, . . .).

This equation is simply an all-inclusive listing of factors that you think might influence purchasing. Writing down the functional form is a crucial step in jogging your imagination and clearing your mind. You may find it easier to write symbols instead of words (for example, S = auto sales), but I find it safer to stick to words at the very beginning.

When you decide at which level of aggregation you will work, some variables will be excluded. For example, the sex drives of the country as a whole do not vary much from year to year, though individuals may change and there may be differences among individuals. Therefore, sex drives are not a useful variable in explaining the variation in auto purchases from year to year. On the other hand, if you are working at the level of individual purchases, the existence of our automobile culture will not help much as a variable, for all individual subjects are part of the culture. The same is true of price and the individual.

But there is not an automatic relationship between the level of aggregation on the right side and the level of abstraction on the left side. For example, intention to buy is an individual-level variable, but it can be used to predict total purchases. You must think your way through this maze, variable by variable.

Your discipline may supply some guidance on which variables to include on the causal side. For example, social status is a "sociological variable." But on occasion economists and psychologists have also found it useful.

FIGURE 6.3 The Story of an Accident

In the community nobody thought much about traffic accidents. There were traffic laws but governing bodies had never put enough money into building roads, training drivers, and supervising traffic.
[Public Apathy]

Joe was taught by his father. There was no driver training in the schools he attended.
[No Organized Driver Training]

His father stressed the mechanics of the car so that Joe would be able to take care of it himself. As soon as he learned to manipulate the controls, Joe drove by himself.
[Failure to Stress Safety in Teaching] [Funds Not Made Available] [Government] [Educators]

Some years later, Joe had a car of his own. Every night he drove to the next town to see his girl. One night he was late and had to hurry. The highway had been built many years before when cars could not go so fast as Joe's. The curve was too flat and sharp. On the way, Joe passed a road sign which had once read "Slow: Dangerous Curve" but now it was too faded and dirty to be legible. Cars of the model of Joe's had cheap window crank knobs which came off with use and left a sharp crank end exposed.
[Design] [Engineers] [Maintenance] [Design] [Vehicle]

Joe's crank knob had come off. He had meant to have it fixed but had never got around to doing it. There were too few police to make speed violations much to worry about. Moreover Joe had never been ticketed even when he knew officers had seen him violate. He thought you could talk yourself out of it. Besides, the judges usually let you off easy.
[Maintenance] [Inadequate Supervision] [Inactive Personnel] [Police] [Enforcement Agencies]

Figure 14.4. Source: "Framework for Assessment of Cause of Automobile Accidents," pp. 442-3, by J. Stannard Baker, in The Language of Social Research, edited by P. Lazarsfeld and M. Rosenberg; copyright © 1955 by The Free Press of Glencoe, a division of The Macmillan Company.

How and Why It Happened

One day Joe was late and wanted to make up time. [Hurry]
That night Joe was going faster than usual. [Speed]
There was a slight mist falling. [Weather]
He did not slow for the curve. [Neglect to reduce speed]
As he started to take the curve, Joe felt the car lean sharply and begin to slide. [Flat, sharp curve]
He realized he was going too fast. So Joe stepped on the brake to slow down. [POINT OF SURPRISE; Braking on curve]
The car slid off the pavement. [POINT OF NO ESCAPE]
It ran off the shoulder. [KEY EVENT]
And lunged into the shallow ditch where it came to a stop. [FINAL POSITION]
The car was not damaged. [Cause of DAMAGE: None]
But Joe scratched his left arm on the broken window handle. He thought nothing of it but stopped a little bleeding with his handkerchief. [Cause of INJURY: Sharp broken part of car]
Joe's arm finally swelled. He had a fever. When he got a doctor, it was too late. Infection set in. Joe died. [Cause of DEATH: Infection]

There are also causal relationships among some of the variables on the causal side. If a relationship between two independent variables is very strong, it is not sensible to use both variables because their effects will overlap strongly.

The key point is that no one level of explanation is inherently superior to other levels of explanation. Consider also the example of mental illness and the many levels at which it is fruitful to investigate it. Biologists and physicians have studied the physiology and pharmacology of mental illness, and they have discovered valuable physical and chemical treatments. Psychologists think about mental disease as a functional disorder that results from a person's early learning and life experience, and they study how to retrain the individual psychologically. Sociologists and social psychologists have studied the various types of home life and environment that lead to mental disease, and perhaps this work teaches us how to reduce mental illness by reducing the conditions under which it arises. Economists have not paid much attention to mental illness, but if they did they would assess its economic effect upon the community and then study the costs of various programs to reduce mental illness; such studies might be valuable (see Fein). Each of these levels of investigation and explanation of mental illness can be valuable, and we benefit if each person (psychologist, sociologist, or biologist) tackles the problem with those theories and methods that she knows best.

Causation of automobile accidents is still another example. As Figure 6.3 shows, there are many factors that may be considered causal variables or conditions for an accident. The variables that an investigator chooses to study depend upon his discipline and his ability to manipulate particular aspects of the situation. Engineers can manipulate the design of highway and car; therefore they study how various designs affect the accident rate. Educational psychologists study methods of teaching auto safety. And so on. There is potential value in studying many of the listed variables that enter at various points in the causal chain.

Variation in birth rates offers a last example of a phenomenon that can sensibly be studied at many levels of explanation. Aside from biological facts, why do some families have more children than do others? At the psychological level, the effects of personal inadequacy, "ego-centered interest in children," and liking for children have been examined (Kiser & Whelpton). Such social-psychological variables as marital adjustment, conformity to group patterns, and doubling-up of families have been investigated. Sociological variables have included religion, nationality, adherence to traditions, social class, education, and their associated values. Variables with an anthropological flavor include the sacred values of a culture, the extent of structural cohesion and authority in a culture, and the extent of social disorganization (Lorimer, pp. 247-251). And, of course, economic variables, including per capita income, laws governing inheritance of land, business cycles, and tax laws subsidizing or penalizing large families, have long been considered relevant to birth rates. Some of these sets of variables have proved more fruitful than have others, but each level of explanation has certainly been worth the investigation.

6. Summary

Theoretical variables must be made operational if empirical research is to take place. The choice of empirical proxies to represent the theoretical (hypothetical) variables of interest is crucial. The empirical variables must be both observable and relevant.

In some cases the choice of proxy may be straightforward because the hypothetical variable is clearly defined (for example, the number of people watching the Olympics on television). In other cases the theoretical variable may be vague and abstract (for example, happiness).

The ingredients in making a wise choice of proxy are the usual ones: research imagination, experience, and good judgment. Choosing independent variables wisely requires that the investigator understand the material and have sound hunches about the causes of the phenomenon being studied.

An important decision is the level of aggregation of the dependent variable. Should one work with individuals, or small groups, or large groups? If possible, it is best to work at several levels of aggregation to see whether the results corroborate each other.

The independent variables may also be considered at various levels of explanation. To some extent the choice depends upon your discipline and point of view, but the choice also should be illuminated by general scientific intuition.
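The advice to work at several levels of aggregation and check whether the results corroborate each other can be illustrated with a small sketch. The three regions and the (x, y) scores for each person are hypothetical numbers chosen to make the contrast stark; the variable names are placeholders, not measurements from any study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (x, y) scores for individuals, grouped by region.
regions = {
    "region 1": [(1, 5), (2, 4), (3, 3)],
    "region 2": [(4, 8), (5, 7), (6, 6)],
    "region 3": [(7, 11), (8, 10), (9, 9)],
}

# Group level of aggregation: one (mean x, mean y) point per region.
mean_x = [sum(x for x, _ in people) / len(people) for people in regions.values()]
mean_y = [sum(y for _, y in people) / len(people) for people in regions.values()]
group_level_r = pearson(mean_x, mean_y)

# Individual level, pooling every person regardless of region.
everyone = [pair for people in regions.values() for pair in people]
pooled_r = pearson([x for x, _ in everyone], [y for _, y in everyone])

# Individual level, looking inside each region separately.
within_r = {
    name: pearson([x for x, _ in people], [y for _, y in people])
    for name, people in regions.items()
}

print("group level:", round(group_level_r, 2))
print("individuals pooled:", round(pooled_r, 2))
print("within regions:", {name: round(r, 2) for name, r in within_r.items()})
```

In these invented data the group-level relationship is strongly positive while the relationship inside every region is negative, so results at one level would not corroborate results at another. That kind of disagreement is a signal to think again about which level the research question is really about.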

EXERCISES

1. Give examples from your field of five dependent variables for which direct data cannot be obtained but whose conceptual referents (hypothetical variables) are clearly defined.
2. Suggest proxies for the variables you gave in Exercise 1.
3. Give examples from your field of five hypothetical (theoretical) variables that are vague and not clearly defined.
4. Suggest proxies for the variables you gave in Exercise 3.
5. Give a realistic example of how a slight difference in a question might have a large effect on the result.
6. Give an example of a research situation in which one must decide at what level of explanation to work.
7. Give an example from your field in which the researcher would come up with an erroneous conclusion (the opposite from the right conclusion) by working at the wrong level of aggregation, that is, an example in which the whole does not equal the simple sum of the parts.

HELP STUDENTS TO REMEMBER, AND FORGET TOO

Copyright 2019. Routledge.
EBSCO Publishing : eBook Collection (EBSCOhost) - printed on 3/17/2020 11:25 AM via ST MARYS UNIV AN: 2093482 ; Wentzel, Arnold.; Teaching Complex Ideas : How to Translate Your Expertise Into Great Instruction

Table 4.4 Link the Keywords to People

Keyword | The person in my life
Customers | My students whom I teach and so provide them with a service
Intermediaries | Peter, who is like an intermediary friend in my life because through him I meet many other people
Suppliers | Welile, a friend from whom I buy products that I can’t normally find in the shops
Competitors | Stephen, an old guy who once tried (unsuccessfully) to convince my girlfriend to go on a date with him
Society | My neighbors in the building where I stay

I will choose five people I know who play roles in my life that are similar to the five components. It is important that I can clearly explain, in a way that makes sense, why I connect each person to each component, otherwise I will not form a strong connection. If I want to remember the components of the market environment, I simply think of the people in the environment around me, and they remind me of the components.

The problem with this approach is that it is very personal and is unique to every student. If you want to help all students simultaneously, you could try a more impersonal technique of relating it to something that everyone knows, like body parts. I can choose five parts that in some way play roles similar to the five components. Now if I want to remember the components of the market environment, I think of my own body, and it helps remind me of the components.

Table 4.5 Link the Keywords to Body Parts

Keyword | Body parts
Customers | Parts that have needs I have to satisfy every day, like my mouth or my stomach
Intermediaries | My hands that act as intermediaries between me and other people, and between me and objects
Suppliers | My eyes and my ears that supply me with new information all day long
Competitors | My head and my heart, which are always in competition
Society | The clothes that I wear, because they are all around my body like society is all around me

I have used this technique with many different things, like everyday objects: my car, a pen, a tree, and many more. The more similar the thing is to what you want to try to remember, the more effective this technique is.

BOX 4.2 SPATIAL MEMORY METHODS

Spatial memory methods are part of the category of mnemonic techniques described in this section. They are ancient techniques, going back millennia (see Kelly, 2017). I mention them in a box because Maguire et al. (2003, p. 90) found that, “Superior memory was not driven by exceptional intellectual ability or structural brain differences. Rather, we found that superior memorizers used a spatial learning strategy.” For more information, search for “memory palace” or “method of loci”.

Categorization

Sometimes there is already a structure that exists in the ideas that you want to remember, but the textbook does not make the structure clear. If you can find some model or categorization hidden in the ideas, you can make them easier to remember because it reduces the load on your short-term memory.

For example, let’s say someone gives you this shopping list to remember: carrots, milk, lettuce, apples, yogurt, bananas, cucumbers, cheese, and peaches. You will probably forget most of the nine products by the time you get to the shop. But you can reduce the number of items you have to remember to three, and this is much easier. In the list, there are three categories: vegetables (carrots, lettuce and cucumbers); fruit (apples, bananas and peaches); and dairy products (milk, yogurt and cheese). Because of this categorization, there are fewer items to remember at first (only three categories instead of nine products). And now the products are also easier to remember, because each category reminds you of the products.

Here is another example. Suppose, in a textbook, the authors list six kinds of strategies a business can follow:

■ Innovation: the development of new products;
■ Concentration: focusing on a single product to market;
■ Market development: existing markets or products are developed more intensively;
■ Rationalization: reducing costs by getting rid of products, assets or labor;
■ Divestiture: selling off parts of the business;
■ Liquidation: closing down the business.
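Returning to the shopping list example, the categorization step can be sketched as a small grouping routine. The names `category_of`, `shopping_list`, and `grouped` are made up for this illustration; the assignment of products to categories is the one given in the text.

```python
from collections import defaultdict

# The category each product belongs to, as described in the text.
category_of = {
    "carrots": "vegetables",
    "lettuce": "vegetables",
    "cucumbers": "vegetables",
    "apples": "fruit",
    "bananas": "fruit",
    "peaches": "fruit",
    "milk": "dairy products",
    "yogurt": "dairy products",
    "cheese": "dairy products",
}

# The nine products in the order they were given.
shopping_list = [
    "carrots", "milk", "lettuce", "apples", "yogurt",
    "bananas", "cucumbers", "cheese", "peaches",
]

# Group the flat list so that each category "reminds" us of its products.
grouped = defaultdict(list)
for product in shopping_list:
    grouped[category_of[product]].append(product)

print(len(shopping_list), "products reduce to", len(grouped), "categories")
for category, products in grouped.items():
    print(category + ":", ", ".join(products))
```

The grouped form holds the same nine products, but recall now starts from three category names rather than from nine unrelated words.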

Table 4.6 Categorize to Remember

Category | Strategy
Strategies that EXPAND business activity | Innovation: the development of new products; Market development: existing markets or products are developed more intensively
Strategies that REDUCE the scope of the business | Concentration: focusing on a single product to market; Rationalization: reducing costs by getting rid of products, assets or labor
Strategies that ELIMINATE business activity | Divestiture: selling off parts of the business; Liquidation: closing down the business

From this, I could see three categories, with two strategies each. It is much easier to remember three categories of two items each than to remember six separate items. What I saw is shown in Table 4.6.

Not all ideas belong to a single inherent category, so I don’t use categorization as often as the other techniques. I prefer visual techniques, which is what I turn to next.

Images

Regardless of memory style, most people have a near-perfect memory for images that can last for days or weeks. This has been known since Standing (1973) and was recently confirmed by Konkle et al. (2010). Julian de Freitas (2012) explains that they showed:

participants a stream of three thousand images… Then, participants were shown two hundred pairs of images – an old one they had seen in the first task, and a completely new one – and asked to indicate which was the old one. Participants were remarkably accurate at spotting differences between the new and old images – 96 percent… despite needing to remember nearly 3,000 images, they still performed almost perfectly.

Visual memory techniques exploit this natural and effortless ability by connecting facts to images. I will demonstrate one image-based mnemonic device that has never failed me since my father taught me this technique more than thirty years ago. It involves connecting ideas with related images, and then creating a visual unit from those images.

Suppose I want to remember the elements of a firm’s internal control system, which are as follows.

■ Control environment: how the business is managed, e.g. philosophy, style and values;
■ Information and communications: systems and reports related to operations and compliance, with internal and external stakeholders;
■ Control activities: policies and procedures to execute an action plan and address risks;
■ Risk assessment: identification and analysis of risk that could prevent the achievement of objectives;
■ Monitoring: checking the performance of the system through monitoring activities or audits.

As usual, the first step is to extract descriptive keywords. There is no right or wrong here – the only thing that matters is whether the keyword reminds you of what the point is about. For me, the result is shown in Table 4.7. Secondly, I attach a symbol, or a very simple picture that is easy to draw, to each keyword. This step does not require drawing ability, simply the ability to come up with a symbol or a really simple picture that makes sense to you. You need to be able to draw it fast and nobody needs to see your pictures, so do not create an artwork here. Again, there is no right or wrong here – the image should simply be able to remind you what the keyword is, or what the point is about. For me, the result was as shown in Table 4.8. Thirdly, integrate all the symbols or images into one image. To integrate the images, they have to be made part of each other and combined so they form one image. Simply putting a border around the images or

Table 4.7 Extracting Keywords

Elements to remember | Keywords
Control environment: how the business is managed, e.g. philosophy, style and values | Control
Information and communications: systems and reports related to operations and compliance; with internal and external stakeholders | Information
Control activities: policies and procedures to execute an action plan and address risks | Policies and procedures
Risk assessment: identification and analysis of risk that could prevent the achievement of objectives | Risk
Monitoring: checking the performance of the system through monitoring activities or audits | Monitoring


Table 4.8 Attach Images to Keywords

Keyword | Image
Control | An angry face of a person trying to control me
Information | A billboard on which people communicate information
Policies and procedures | A thick rulebook in which you find lots of procedures
Risk | An explosion that destroys everything is a big risk
Monitoring | You can monitor what is happening better if you have good glasses

connecting the images with arrows will not work, because the images are still separate, and many images are more difficult to remember than a single image. The images have to be integrated with each other into a single image in a way that you can explain to yourself. It is not necessary for the integrated image to make sense or even look like a nice picture because the mind best remembers pictures when they are strange. The important thing is that the images are all integrated into one image. For me, the result was Figure 4.4.

When I have to recall the ideas, I simply draw the integrated image, and the different parts of the image remind me of the ideas I have to remember. There is the angry face (control) with glasses on it (monitoring). Because he is angry, an explosion (risk) comes out of his head. His body is the billboard (information), but this is a billboard that has a book (policies and procedures) as the screen of the billboard. There is no other image like this and it is a bit strange, but what is important is that I can explain how I created it. Combine this strangeness with humans’ strong visual memory, and you have a powerful memory technique. I used to have 20-30 images per subject that I studied in school and university, and I never forgot any of them in an exam situation.

Rhymes

Rhyming (often combined with rhythm) is an age-old mnemonic device that was used to pass ideas from generation to generation in the absence of writing. With the rhyming technique, you create a structure containing words that will remind you of what you want to remember. Rhymes


Figure 4.4 My Integrated Image

are easier to remember, because if you forget something, you only need to find some words that rhyme with the other words, and that will probably prompt you to remember. The rhyme does not have to be great, it just needs to rhyme and make sense. Again, it starts by extracting keywords, and then putting them into a rhyme that uses these keywords. I created the following rhyme for me to remember the five components of the market environment:


For customers I care
But competitors I scare
And suppliers they pair
With intermediaries to get products there
While society is everywhere

The rhyme makes sense because a business should care for customers and its competitors would be scared of it. Also, suppliers help the business, often by working together with intermediaries. And society is all around the business. The rhyme is silly, but effective. If you are not good at finding words that rhyme, there are many free rhyming dictionaries on the web that can help you to find rhyming words. If you can add some rhythm to a rhyme (like a rap song), it becomes even easier to remember, because the rhythm makes the memory more automatic. Another possibility is to put the words of a rhyme into a song that the students know very well – I know of many teachers who use this technique very effectively.

Stories

With this device, you create stories that serve as a framework that you use to link ideas together in an interesting way. Humans’ memory for stories is almost unlimited. You can hear a story once and be able to retell it to someone days later. And we remember thousands of stories without effort because stories naturally make connections. This technique uses this ability. Start by extracting keywords, as always. Let’s take another example, a rather boring and intimidating list of the characteristics of the business environment, with the keywords already extracted, as shown in Table 4.9. Next, I create a story in which I use every keyword. My story eventually was:

I used to work in an office with a guy who had three heads. We were never sure what he was saying, because when he spoke all heads would talk and interrupt each other and sometimes they would even say completely different things. For a short time, we could bear it, but, after a while, we had to do something. We put him in a separate closed office where his talking heads could talk as much as they liked for as long as he wanted.


Table 4.9 Extract Keywords

Characteristics of the business environment | Keyword
It is a combination of many factors at different levels that need to be studied separately and together. | Multi-faceted
The components are interdependent. When one changes, the others are affected. | Inter-dependent
The environment is uncertain and can change very fast. | Uncertain
To cope with changes, the business needs to change too, either proactively or reactively. | Action
Businesses are affected differently by the environment – even departments in the same business will experience this. | Differences
Changes in the environment have both a short-term and long-term effect on the business. | Short- and long-term
Environmental factors have an ongoing effect that is relentless and harsh. | Ongoing

If you cannot see how this helps, here is the story again, but this time notice how different parts of the story remind me of different keywords:

I used to work in an office with a guy who had three heads (multifaceted). We were never sure what he was saying (uncertainty), because when he spoke all heads would talk and interrupt each other (interdependence) and sometimes they would even say completely different things (differences). For a short time (short-term), we could bear it, but, after a while (long-term), we had to do something (action). We put him in a separate closed office where his talking heads could talk as much as they liked for as long as he wanted (ongoing).

Sometimes there are cause-and-effect relationships in a list of facts which you can use to create a story, but, if not, creating a vivid and strange story works just as well.

Associations

Using mnemonic devices that involve creating associations between the ideas is popular among memory champions and those who specialize in teaching memory improvement. Such devices are based on the principle that we remember things that are vivid (colorful, noisy, full of movement, emotional) and very different from the ordinary. When using associations to remember, you use your imagination to create vivid and absurd


connections between ideas. If the connection is vivid and strange enough, one idea will automatically make you think of the idea to which it is connected, and that idea will make you think of the next one, and so on. The most common device here is the ‘link’ method. To illustrate it, let us try to remember the four types of unemployment. The keywords are simple to extract this time because they are simply the types of unemployment: structural, cyclical, seasonal and frictional unemployment. Next, I come up with a mental image that I associate with each keyword. The result for me was Table 4.10, but every person will have different images. Finally, I associate one word with the next word in as vivid and strange a way as possible. This is the key. There has to be color, movement, sound and funny things happening in each association – if not, you will forget it. My associations are shown in Table 4.11.

Table 4.10 Keywords Attached to Mental Images

Keyword | Mental image
Structural | The Eiffel Tower
Cyclical | A bicycle
Seasonal | Sun and clouds
Frictional | Rubbing things together

Table 4.11 Linking the Mental Images

Keyword | Mental image
Structural | The Eiffel Tower
Link between Eiffel Tower and bicycle | I imagine that the Eiffel Tower has hands and feet and it gets on to a giant bicycle and rides it, but all the time it falls off and so crashes into the buildings of Paris, sending people screaming and running away.
Cyclical | A bicycle
Link between bicycle and sun and clouds | I imagine a cloud riding on a bicycle, but this bicycle’s wheels are two suns; and so, as the cloud rides the bicycle through the streets, people collapse from heat and shield their eyes as the bicycle passes them.
Seasonal | Sun and clouds
Link between sun and clouds and rubbing | I imagine the sun grabbing two unruly clouds who are running across the sky and then hitting and rubbing them together to discipline them, and this makes the clouds cry loudly and their tears fall like rain.
Frictional | Rubbing things together


Now, if I want to remember the four types, I just need a way to link unemployment to the first keyword (structure). I could think about how unemployed people need to build structures to have jobs (but this is not vivid enough, so I will probably forget it). Much better would be to imagine lots of unemployed structures or buildings, which, instead of standing upright, are lying down on the streets and snoring because they are unemployed. So, when I think of unemployment, I think of sleeping buildings (structural unemployment), and when I then think of structures I think of the Eiffel Tower on a bicycle. The bicycle makes me think of cyclical unemployment, which reminds me of the cloud on a bicycle with sun-wheels and this makes me think of seasons and seasonal unemployment… and so on. Through vivid associations, one idea makes me think of the next idea. Because it draws so much on imagination, this method can be quite engaging, and it was the secret behind Solomon Shereshevsky’s astounding memory (Johnson, 2017).

BOX 4.3 MEMORY PEGS

A popular variation on the link method is a mnemonic device where the images, and their sequence, are pre-determined. It is extensively used in books like Master Your Memory by Tony Buzan (1998). To learn it and see some great examples, search for “memory pegs” or “memory peg system”.

Memory through Repetition

One of my teachers used to say, “Repetition is the mother of learning.” He implied that memorization is learning and that the most effective way to remember is to repeat. As explained in this chapter, this is a simplistic view that is contradicted by the research. Millions of students who spent nights rote-learning and repeating lists of facts, and then immediately forgetting them, can attest to the ineffectiveness of mere repetition and the misery it causes. Yet, there is a kind of repetition that is quite effective, and surprisingly effortless. It is called ‘spaced repetition’ and also requires a degree of forgetting in order to work well. Since it is related to the so-called ‘testing effect’, it will only be discussed in Chapter 7, where I look at the principles of assessment.

This completes a rapid tour of some highly effective memory techniques. Lastly, I just want to identify some of the things of which you need to be careful.


Take Care!

When using the memory techniques above with information that you will have to study more than once, you should make notes about the technique you used and how you used it. That way you can very quickly revise the work with less effort. For example, if you used the image technique, do as I did above: put the relevant image next to each idea and note down a short explanation of the reason for the image, and then draw the integrated image below the ideas with a short explanation. This way, you can quickly remember the ideas even if it was months or years since the last time that you looked at the image.

There are also some mnemonic devices that I would advise you to avoid or use as little as possible, because they make very weak and lifeless connections. These are techniques that involve text, and because the connections involved are not vivid or strange enough, you will often forget them under pressure. For example, if you want to remember the five components of the market environment (customers, competitors, intermediaries, suppliers, society), you can take the first letter of each word and get ‘CCISS’. This looks simple and fast to do, but it is easy to forget. If you have more than two or three of these letter-words, you start forgetting them. And worse, with too many you will also start forgetting what the letters stand for. The same applies to using the letters to make sentences like this one: “Cute Clerks Inhaled Smoke Sadly”. The letter-sentence suffers from the same problem – if you have more than two or three of them, you will forget what some letters stand for because humans don’t form natural connections to text.

MEMORY AS DISTRIBUTED COGNITION

Apart from making connections, the other feature of mnemonic devices is that they do not limit memory to the brain. These devices spread the cognitive work involved in memory to physical objects, as in analogical methods, or imagined phenomena (like pictures or stories). This is also known as ‘distributed cognition’ and explains why mnemonic devices make remembering so effortless once they are mastered (more about this in Chapter 9). In a fascinating book, Lynne Kelly (2017) explains how pre-literate cultures used distributed memory to achieve extraordinary memory feats. Just one example she mentions is that of the Matsés people in South America. When their memorized knowledge about medicinal plants alone was documented, the result was a 500-page document! Every one of the mnemonic devices in this chapter (and many more) has been used


extensively in pre-literate cultures, forming the foundation of a densely interconnected and multi-modal knowledge of plants, animals, climate, genealogy, and more. What her book shows wonderfully is how distributed memory, when it serves understanding, can be used to build flexible knowledge structures that are easy to complexify. It also accelerates learning in any area because it makes it easier for the next generation to build on the previous generation’s knowledge. Richard Feynman would have agreed. He (1963) was vehemently opposed to memorization without appropriate connections, but he acknowledged that memory plays a critical ‘time-binding’ function for humans (Feynman, 1999, p.184).

CONCLUSION

Few people have anything close to a photographic memory. Many do have astounding memories, but in the majority of these cases, they developed their memories using mnemonic devices. This chapter showed that these devices can be demonstrated and taught to students. Doing so will vastly reduce the effort required for memorization. However, this is not an excuse to expect students to memorize the many useless categorizations and bullet point lists in textbooks. Memory is tightly coupled with understanding, and should be subservient to it. Even in the rare cases when memorization is necessary, it should be like tape that holds something in place until the glue of understanding can make it stick.

BOX 4.4 STICKY TEACHING

There is another way of making ideas both memorable and interesting that straddles the techniques in this and the previous chapters. This is best explained by Chip and Dan Heath (2007) in their very useful book, Made to Stick, which I can highly recommend to teachers. Also see their free resources, which you can find by searching Heath AND “teaching that sticks”; and their methodology called “anchor and twist”, which you can find by searching for Heath AND “anchor and twist”.

Changing one’s view of memory, from store-and-recall to connect-and-forget, has a dramatic impact on one’s teaching, and especially assessment (see Chapters 7 and 8). Memorization should be tested only for those ideas that will promote future understanding. If someone with a photographic memory can pass your assessment, critically consider your view of memory.


EXERCISES

1. Analyze a chapter in a textbook for which you already identified the critical ideas.
   a) Are there any foundational ideas? What are they, and how can you help students to remember them? To which critical ideas will you connect them? (If this is not clear, reconsider your answer.)
   b) Are there any key details that need to be memorized before they are understood? If any, what are they, and how can you help students to remember them? To which critical ideas will you connect them? (If this is not clear, reconsider your answer.)
   c) Are there any over-complex ideas that need to be treated like facts for now? If any, what are they, and how can you help students to remember them? To which critical ideas will you connect them? (If this is not clear, reconsider your answer.)
   d) Which ideas are worth forgetting? This may be a substantial part of the chapter.
   e) If your assessment should include some memorization, which memorized facts are the most appropriate to assess?
2. Identify at least three lists of ideas or terminologies that students need to memorize, but find difficult to remember.
   a) Apply a different mnemonic device to each one.
   b) How would you teach and demonstrate each device to students?
3. Consult Box 4.1. Which other mnemonic techniques not explained in this chapter do you think will be effective in your course? Look specifically for techniques that can help students remember the definitions of key terms in your subject.

REFERENCES

Bain, K. 2004. What the Best College Teachers Do. Cambridge: Harvard University Press.
Bartlett, F.C. 1920. Some experiments on the reproduction of folk-stories. Folklore, 31(1):30–47.
Borges, J.L. 1964. Funes the Memorious. In: Labyrinths: Selected Stories and Other Writings. Translated by J.E. Irby. D.A. Yates & J.E. Irby (eds). New York: New Directions.
Buzan, T. 1998. Master Your Memory. London: BBC.
Caine, R.N. & Caine, G. 1991. Making Connections: Teaching and the Human Brain. Alexandria: Association for Supervision and Curriculum.
Carmichael, L.C., Hogan, H.P. & Walter, A.A. 1932. An experimental study of the effect of language on the reproduction of visually perceived form. Journal of Experimental Psychology, 15:73–86.
Chase, W.G. & Simon, H.A. 1973. Perception in chess. Cognitive Psychology, 4:55–81.
Davis, R.L. & Zhong, Y. 2017. The biology of forgetting – a perspective. Neuron, 95:490–503.
De Freitas, J. 2012. Why is memory so good and so bad? Explaining the memory paradox. Scientific American, 29 May [available at http://www.scientificamerican.com/article/why-memory-so-good-bad, accessed 18 July, 2018].
De Groot, A.D. 1965. Thought and Choice in Chess. The Hague: Mouton.
Fernández, J. 2015. What are the benefits of memory distortion? Consciousness and Cognition, 33:536–547.
Feynman, R.P. 1963. The Problem of Teaching Physics in Latin America. First Inter-American Conference on Physics Education, Rio de Janeiro [available at http://calteches.library.caltech.edu/46/2/LatinAmerica.htm, accessed 24 July, 2018].
Feynman, R.P. 1999. The Pleasure of Finding Things Out: The Best Short Works of Richard P. Feynman. Cambridge: Helix Books.
Heath, C. & Heath, D. 2007. Made to Stick. Random House.
Hemmer, P. & Steyvers, M. 2009. A Bayesian account of reconstructive memory. Topics in Cognitive Science, 1:189–202.
James, W. 2007. The Principles of Psychology, Volume 1. New York: Cosimo Books.
Johnson, R. 2017. The mystery of S., the man with an impossible memory. New Yorker, 12 August 2017.
Kelly, L. 2017. The Memory Code: The Secrets of Stonehenge, Easter Island and Other Ancient Monuments. New York: Pegasus.
Konkle, T., Brady, T.F., Alvarez, G.A. & Oliva, A. 2010. Scene memory is more detailed than you think: The role of categories in visual long-term memory. Psychological Science, 21(11):1551–1556.
Maguire, E.A., Valentine, E.R., Wilding, J.M. & Kapur, N. 2003. Routes to remembering: The brains behind superior memory. Nature Neuroscience, 6(1):90–95.
Nørby, S. 2015. Why forget? On the adaptive value of memory loss. Perspectives on Psychological Science, 10(5):551–578.
O’Loughlin, I. 2017. Learning without storing: Wittgenstein’s cognitive science of learning and memory. In: M.A. Peters & J. Stickney (eds). A Companion to Wittgenstein on Education: Pedagogical Investigations, pp. 601–614. Singapore: Springer.
Richards, B.A. & Frankland, P.W. 2017. The persistence and transience of memory. Neuron, 94:1071–1084.
Schacter, D.L. 2001. The Seven Sins of Memory: How the Mind Forgets and Remembers. New York: Houghton Mifflin.
Sloman, S. & Fernbach, P. 2017. The Knowledge Illusion: Why We Never Think Alone. New York: Riverhead Books.
Standing, L. 1973. Learning 10000 pictures. Quarterly Journal of Experimental Psychology, 25(2):207–222.
University College Cork n.d. George Boole: Computer Science [available at https://georgeboole200.ucc.ie/boole/legacy/computerscience, accessed 22 July, 2018].
Weber, B. 2009. Kim Peek, inspiration for ‘Rain Man’, dies at 58. New York Times, 26 December 2009.
Whitehead, A.N. 1948. An Introduction to Mathematics. Oxford University Press.
Willingham, D. 2009. Why Don’t Students Like School? San Francisco: Jossey-Bass.


Chapter Five

How to Make Boring and Complex Ideas Interesting

Almost fifty years ago, Elizabeth Loftus and John Palmer showed a movie to some students and changed the way we understand memory. The movie was just a few seconds long and showed a car accident. After the movie, the students, all of whom had seen the same movie at the same time, were divided into groups and asked how fast the cars were going. The differences between their answers were quite big. For example, one group estimated it was 51 km/h, while another thought it was 66 km/h (Loftus and Palmer, 1974). How could people remember the same event so differently? The answer supports what the previous chapter said about memory: that we recreate memories from the associations we make at the moment the memory is formed and at the moment that we remember. Since the students had all seen the same movie, the associations they made during the movie could not fully explain the big difference. So, what was it? There were also the associations made at the moment of remembering. This occurred when students were asked about the speed of the moving car. Maybe they were asked the same question but in different words? That is indeed what happened. The researchers changed just one or two words when they posed the question to every group. But is it really possible to manipulate a recent memory so dramatically simply by changing one word? Do you think that I would be able to change your memory of something as easy as that? Well, yes, it seems so. One group was asked, “How fast were the cars moving when they hit each other?” For another group, the word ‘hit’ was replaced with ‘collided with’, for another, it was replaced by ‘smashed into’, for another, it was ‘contacted’, and, for another, ‘bumped into’. Think for a moment which group guessed the car was going at 51 km/h and which thought it was 66 km/h. 
The groups who heard the question with the words ‘collided with’, ‘bumped into’ and ‘smashed into’ estimated faster speeds, while those who heard ‘hit’ and ‘contacted’ estimated slower speeds. The highest


estimate came from ‘smashed’ and the lowest from ‘contacted’. Clearly, the associations students made with the words influenced how they remembered the event; sometimes even remembering things that were not there.

After that, we could no longer think the same way about eyewitness testimony and the way lawyers ask questions in court. Elizabeth Loftus, in fact, became an expert witness in criminal cases. Her first case was one in which a woman had killed her abusive boyfriend, where witnesses could not agree on how much time had elapsed from the time she picked up the gun until she fired it. The difference was crucial because it would determine if it was murder or self-defense, but witnesses could not agree on whether it was seconds or several minutes. Loftus testified to the unreliability of memory, and the woman was acquitted. Her work has led to more acquittals, making her many enemies who believed that she helped guilty people go free. But her response is, “I haven’t had a situation where someone was acquitted because of my testimony and then went on to commit some awful crime” (Costandi, 2013, p.269). The possibility that innocent people may be imprisoned or receive the death penalty weighs more heavily on her, so she has been campaigning for legal reforms for decades. As a result, in some courts in the USA, jurors now have to be informed of the unreliability of eyewitness testimony.

It makes you wonder how many of your own memories have been subtly manipulated by others, and perhaps even whether you can control your students’ minds simply by using different words. Hopefully, some parts of this story were interesting to most of you. This may be because, having read the previous chapter on memory, it is relevant to you. In fact, think back to Chapter 4 and identify any other things about memory that struck you as interesting.
My guess is that most of you will choose the idea that forgetting is critical to a good memory or perhaps the statement that there is no such thing as photographic memory. But before continuing, consider these ideas, and ask yourself: “What made them interesting?” The story of Elizabeth Loftus, and whatever other ideas you identified, have a couple of things in common that make them interesting. In this chapter, we will identify them by analyzing what makes ideas interesting and applying it to teaching.

WHY SHOULD IDEAS EVEN BE INTERESTING?

On the surface, it seems that we want to make ideas interesting to ensure that students pay attention. Ideas to which nobody pays attention literally do not exist. As professors, we do not want this to happen to us


because it means not only that our students will forget the ideas we taught, but also that they will forget us. But this is a poor reason to make things interesting. As with memory, interest serves understanding by enhancing memory and motivation. If something is interesting, we think about it. The act of thinking creates new connections and strengthens existing connections, and it is only the ideas we think about that we remember (Willingham, 2009). In addition, interest often generates an emotive response, and, if thinking is accompanied by emotion, it further enhances the memory of the ideas involved (Tyng et al., 2017).

Of course, by inducing thinking, interest also makes understanding possible. Understanding is not achieved by merely storing more facts; rather it involves connecting facts and ideas, thereby changing the knowledge network structure (also called ‘conceptual change’). The problem is that conceptual change is always disruptive. All new ideas need to be connected to existing ideas, which include not only the ones students acquired during the course but also the preconceptions that they brought with them to the course. Whenever we introduce new ideas, it disrupts the structure and organization of these existing ideas (DiSessa, 2014). Disruption is often resisted because it takes energy and suggests that we are at least partly wrong. But interest can remove much of the resistance and make us more willing to incorporate new ideas into our knowledge structure. For this reason, a person who is interested in a topic will not wait for a professor to introduce disruptions, but will independently search for new patterns to make sense of incoming information. In sum, then, when something is interesting, it is more likely that students will want to broaden their understanding.

INTERESTINGNESS DESCRIBED IN THREE ‘F-WORDS’

It is difficult to completely define what makes an idea interesting because it is multi-dimensional. The three dimensions that I found to be important in teaching can be captured in three words: ‘fascination’, ‘fun’, and ‘fumbling’. The most obvious definition of ‘interesting’ is that of something that holds attention. This is best captured by the idea of ‘fascination’ that Sally Hogshead (2010) writes about. She explains that the word comes from the Latin word ‘fascinare’, which means ‘to bewitch’. When you are fascinated by something, your attention locks onto it, and you are unable to move on to something else. If we can make something relevant and attractive to the students, they are likely to find it fascinating. Fascination is an important first step in generating interest because our students will not be motivated to learn what we explain to them


unless they are motivated to pay attention. But, by itself, fascination is of limited use in learning and teaching situations. We have all watched television documentaries or attended flashy presentations that captured our attention, but, one day later, we remember little. This is because simply paying attention does not translate into learning unless we actively use that attention – unless we think. This is where the game designer, Raph Koster (2005), made a great contribution with his ‘theory of fun’, which adds the second dimension. He defines ‘fun’ as that which we experience when we find and master new patterns of thought. This is exactly what happens in games. To master a game, we have to figure out certain patterns, and once we have done this, the game is no longer as much fun. However, if we cannot see any patterns, the game is too difficult and definitely not fun. With a new game, we initially struggle as we try to figure out the game and how to play it. Over time, we figure out the patterns and eventually master the game. Games tend to be the most fun as long as they are pushing us closer to the edge of our capabilities, while simultaneously giving us a sense that mastery is a real possibility. Like games, to gain an understanding of a subject or topic, we need to find patterns. If we are fascinated with the topic, we will have fun looking for patterns. Finding these patterns helps us to see connections and chunks, compress the information and try out the ideas to see how well we can generate new information. Once we find all the patterns and become competent in using them, we have mastered the subject and it becomes easy. However, there is a problem. Once we have mastered a game, we keep on playing simply to experience the pleasure of winning, even if we are no longer learning new patterns. Similarly, in a subject, the patterns become sedimented once they are automatic and part of our long-term memory. 
At this point, we no longer need to think much and no new learning occurs. It is difficult to switch to learning a new game or a new set of ideas, so our brains resist it, preferring the easy path of just repeating old patterns. As Willingham (2009) argues, the human brain prefers following old patterns over thinking, even though we enjoy thinking once we get into it. In a learning situation, something has to force us to be open to a new idea by upsetting our old ideas and making us fumble. This means that old ideas must be disrupted so that mental cracks can appear through which new ideas can slip in. Murray Davis (1971) elaborates on the third dimension of interestingness as disruption. He recognized that ideas are interesting when they are relevant and not too obvious to your audience (even if they are

MAKE BORING AND COMPLEX IDEAS INTERESTING

obvious to you). Ideas that an audience can use are relevant to them. But if the ideas are relevant but completely obvious, they are boring. If an idea disrupts us too much, we call it ‘absurd’. Obvious ideas are patterns that we have already figured out (boring), and absurd ideas are ones whose patterns do not make sense to us (also boring). An idea is interesting when it is useful and lies somewhere between being obvious and absurd. The ‘sweet spot’ of interestingness is shown in Figure 5.1.

In a dynamic learning situation, we have to push back against the obvious and disrupt students’ understanding by upsetting what the students thought they knew. Davis makes the point that something is interesting if it denies or contradicts at least one thing that the audience assumes is true. Note the use of the word ‘audience’ – students will pay attention again if an idea disrupts something that they knew. If it disrupts your knowledge (or that of other experts), it will not necessarily be interesting to students, so it is important to know your audience before trying to disrupt them.

In summary, interestingness is a process that arises from the interaction of three things: fascination (getting attention), fun (using attention), and fumbling (disrupting attention). The three processes are interdependent, so if any one of them is deficient, it will harm the effects of the other two. This interaction can be expressed by multiplication:

Interestingness = Fun × Fascination × Fumbling = F³

The three F-words give us the F3 (F-cubed) approach to giving ideas an interesting quality. As Figure 5.2 shows, all three support each other and lead to an ever-upward spiral of learning if used together.

Figure 5.1 The ‘Sweet Spot’ of Interestingness (labels: Relevant, Irrelevant, Obvious, Absurd, Interesting) Source: Derived from Davis (1971)
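
The multiplicative form of the F3 relationship can be made concrete with a small sketch. The Python below is only an illustration: the 0-to-1 scoring scale, the function names, and the sample scores are assumptions introduced here, not measurements or a method from the text. It contrasts multiplication with a simple average to show why a deficiency in any one F drags down the whole.

```python
def interestingness(fascination: float, fun: float, fumbling: float) -> float:
    """Multiply three scores, each on a hypothetical 0-to-1 scale."""
    scores = {"fascination": fascination, "fun": fun, "fumbling": fumbling}
    for name, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {score}")
    return fascination * fun * fumbling


def additive_average(fascination: float, fun: float, fumbling: float) -> float:
    """Simple average of the same scores, shown only for contrast."""
    return (fascination + fun + fumbling) / 3


# Illustrative scores only (assumed, not taken from the text).
balanced = (0.7, 0.7, 0.7)  # all three Fs moderately present
lopsided = (1.0, 1.0, 0.1)  # strong fascination and fun, almost no fumbling

for label, scores in (("balanced", balanced), ("lopsided", lopsided)):
    product = interestingness(*scores)
    average = additive_average(*scores)
    print(f"{label}: product={product:.3f}, average={average:.3f}")
```

Under these assumed scores, both cases have the same average, yet the lopsided product is far lower than the balanced one, and any score of zero makes the product zero. That is the sense in which the three processes are interdependent.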

Figure 5.2 The Process of Interestingness (labels: Potentially boring information; Get them fascinated; Let them have fun; Make them fumble; Ideas understood)

Figure 5.3 further shows that generating interest is a continuous process. At first, fascination draws the student in to pay attention to an idea, but this does not require a large investment of cognitive energy. The student then engages with the idea, which requires a lot more cognitive energy. As the student masters the idea, she uses less cognitive energy, and a disruption is needed if the knowledge structure is to be further complexified and refined. To deal with such a disruption takes more cognitive energy, but it prepares the student to jump to a higher level of understanding.

Figure 5.3 Interestingness as a Continuous Process (axis labels: Thinking energy; Rising level of understanding; repeating phases: Fascination, Fun, Fumbling)

Returning to the story at the start of this chapter, it is possible to explain why it may (or may not) have been interesting to you. Having just finished a chapter about memory, you will hopefully have recognized

how relevant the topic is to you and your practice as a teacher. This should have generated at least some degree of fascination. If you were ever part of a court case or found yourself controlled by someone else, then even more so. The study was presented as a puzzle – both in terms of the cause of the different views and in terms of how it may be useful – and this required finding a pattern. I asked some rhetorical questions to encourage you to look for patterns yourself. If the pattern I revealed was not obvious or known to you, then reading the story would have been fun. If you understood the previous chapter, the ideas would not have surprised you so much that you would think of them as absurd. Lastly, I wanted to make you fumble a bit. Hopefully, it upset your ideas about memory by illustrating how easy it is to manipulate it. If I chose to continue with how one can go about fixing this problem, or how you can use this to your advantage, you would have been more open to it than before.

USING THE THREE F-WORDS TO GENERATE INTEREST

Taking each F-word in turn, let us see how to use them individually before considering how they work together.

Fascination with Patterns

When I first started job hunting after university, I briefly considered a profession that specializes in fascination: sales. During this period, and later, I was exposed to some really good salesmen. They taught me something simple about fascination – that if you want to get people’s attention when selling something, you have to “sell the sizzle, not the steak”. When someone wants you to eat in their steakhouse, they won’t entice you by showing you the raw steak and explaining to you how nutritious it is, or by telling you from what kind of quality cow it is, or by showing you how juicy it is. That is more likely to put you off, even if you are not a vegetarian. Rather, this person will simply throw the steak on the griddle, so that you hear the sizzling sound and have the smell of grilled meat reach your nose as you suddenly realize how hungry you are. If you want to sell something, do not sell the thing itself. It is the benefit of the thing that gets attention, or even better, the benefits of the benefit.

When we teach, we are also selling our ideas in exchange for students ‘paying’ attention. We need this ‘payment’ for the learning transaction to begin, and by offering opportunities for fascination, we make students willing to pay attention.

Fascination does not happen by talking about the steak or the idea itself, no matter how fascinating it is to us. It happens when we are able to connect the idea to our audience in a way that they can see that it is relevant to them. This means that you need to get to know your audience, what their goals are and what they care about. For example, if I teach someone about investing in the stock market, I will not fascinate people by telling them what shares are, how to buy and sell them, about financial statement analysis or the role of brokers – they will lose interest. I am more likely to succeed if I talk about the benefits of investing, such as increasing one’s wealth. But I am even more likely to succeed if I connect to something deeper inside of them, by showing them the benefit of the benefit (or the meta-benefit). The benefit is gaining wealth, but what is the benefit of gaining more wealth? Maybe, if you gain wealth, you can increase your status or help your friends and family. Maybe it simply means you can finally be independent, doing what you want when you want and never having to ask anyone for anything. The deeper the idea connects to something that matters to a person, the more fascinating it will be to them. Sally Hogshead (2010) explains that there are seven meta-benefits, which she calls ‘triggers’ of fascination and they are summarized in Table 5.1. You pull a trigger by making students feel the emotion described by that trigger and showing how the idea you want to explain can help to relieve or intensify the emotion. Knowing your audience’s values or unquestioned assumptions about life will help you to select the best trigger. If you are addressing a group of people with different values, it means that you often need to pull more than one trigger. 
Table 5.1 Seven Triggers of Fascination

Trigger     Description
Pleasure    Good feelings, sensual experiences, anticipation of pleasure
Mystique    Puzzling, unanswered questions, being part of a secret
Alarm       Fear, loss of possibilities, respond now
Prestige    Achievement, getting ahead of others, respect, admiration
Power       Control, command over others or over the environment
Vice        Rebel against rules, being different
Trust       Comfort, certainty, predictability

Source: Derived from Hogshead (2010)

For example, when I used to teach Economics to privileged students in South Africa and I got to the topic of poverty, I could pull any one of the above seven triggers and get them fascinated with varying degrees of

success. I summarize them below (rather crudely) in order of effectiveness, as I perceived it, for this particular audience.

■ Alarm: Poverty is rising and has led to revolutions worldwide where wealthy people have become targets and lost their possessions and position. This is likely to happen here unless we start doing something, and for that, we need to understand the dynamics of poverty.
■ Trust: We don’t need to worry about poverty as long as we know how to address it.
■ Power: The best way to gain power is to appeal to the majority, and the majority in the country is poor. If you understand poverty better, you have a better chance of gaining political power.
■ Prestige: People admire those who don’t just look out for themselves, but who are seen to care about issues of human suffering, as many celebrities appear to. To know how to do this, you need to understand how the poor live and what they really need.
■ Vice: Your parents want you to live the same life as theirs by getting you to think as they do and remain cocooned in your world. But you need to learn to think for yourself too – there is a bigger world out there where people are not rich, and where you can live a more meaningful life by getting involved with real issues like poverty alleviation.
■ Mystique: Any number of puzzling questions from the news at the time relating to poverty were useful here. For example, why has the poverty rate risen even though the country became increasingly democratic? Or why did poverty rates fall in some South Asian countries that clamped down on democratic rights?
■ Pleasure: Though I never pulled this trigger, I could probably have done it by means of a pleasant field trip to a really poor neighborhood and have them experience poverty in a limited way.

If my audience had been a different one, I would have pulled different triggers in a different way. Not all of the triggers I pulled were socially desirable (the power trigger perhaps), but my purpose was to attract their attention so that I could get them thinking beyond the triggers. The other two Fs helped with that. Looking at the story at the beginning of the chapter, you may see how I tried to pull a few triggers. Initially, by not revealing the reason for the different answers immediately, I used the mystique trigger. By making you think about the possibility of this knowledge

being abused, I also pulled the alarm trigger; and then, by mentioning that you might employ this knowledge to your own benefit, I pulled the power trigger.

Fun Figuring Out Patterns

Ideas are patterns that need to be figured out, but fortunately, the human brain is naturally good at seeking and detecting patterns (Caine & Caine, 1991). The experience of fun is evolution’s way to get us to enjoy making sense of apparently random incoming information and compress it into something more useful. Fascination draws us into this process and motivates us to engage in figuring out patterns. Part of the fun is figuring out the idea itself, and another part is figuring out how to use it in order to make sense of a reality that seems to be random, noise or a bunch of disconnected facts. People find history less interesting if it is “just one damn thing after another” and more interesting if they can see a story, a repeating pattern or a conspiracy theory that holds it all together. Similarly, once you understand a theory, such as elite theory in political science and sociology, the news is no longer just a series of random events but will start to cohere around an organizing principle.

Each idea is a new pattern that allows us to see the world differently and organize our understanding differently. For example, before I learned about the market mechanism (demand and supply) in economics, I was aware of price changes. But, until then, it seemed to me to be just a lot of things happening with no clear meaning. Learning about the market mechanism allowed me to see patterns in these changes, understand what caused them and even make some accurate predictions. With this knowledge, I could even understand things with which I had no direct experience, like the price changes of crude oil, gold and currencies.

However, as with games, all of this is only fun for those engaged in the process, not for passive observers. This implies that one needs to encourage participation. This can be done inside a lecture (e.g. through discussions, quizzes) or outside (e.g. performing authentic tasks). It can be done in groups (e.g. through various kinds of cooperative learning), by talking, individual writing or even just by thinking (e.g. through using advance organizers or simply pausing for a few seconds after asking a question). It is not surprising then that Bain (2004) found that the best professors do not rely solely on lectures, but encourage active learning as well. Box 5.1 contains some information about active learning methods.

BOX 5.1 ACTIVE LEARNING

To find more information about active learning, simply enter in the search box “active learning methods” AND classroom, or, to be more specific, “active learning” AND classroom or even “active learning” AND “college classroom”. An extreme method is the flipped classroom, in which there is no lecturing during lectures, only activities. For information, just search for “flipped classroom”.

Since ideas change the way we look at and experience the world, they are not always easy to figure out, which is why we need explanations that enable us to think about, and with, these ideas. Explanation speeds up the mastering of a pattern as long as it leads to new insights. Ideas, such as Pavlovian conditioning or Bayesian probability, are fun as long as you discover new ways of using them to explain things. Eventually, they become obvious to you, and using them will be automatic and much less fun. For something to remain fun, it has to push you to the limits of your ability.

This is similar to Mihaly Csikszentmihalyi’s (1990) idea of ‘flow’. We are in a state of flow when we are fully engaged with what we are doing. It is a dynamic process that can only be maintained if the level of challenge rises as we become more skilled (see Figure 5.4). We remain in the flow channel as long as our skill level and the challenge we perceive are more or less in balance. When our skill level exceeds the perceived degree of challenge, we get bored, and when the challenge exceeds our skill, we become anxious or frustrated and give up.

Figure 5.4 Keeping It Fun (axis labels: Sense of challenge; Level of skill; regions: Anxiety, Boredom) Source: Adapted from Csikszentmihalyi (1990, p. 74)

So, once people are fascinated by an idea, we can make it fun by presenting a series of varied challenges that match their ability. These

challenges ideally involve active learning and may be as simple as making sense of recent news events or an everyday phenomenon. For this, authentic scenarios or case studies work well. It is important to start out with challenges that are easy – maybe simple scenarios where the application of the idea is obvious, and then progress to more challenging tasks – like generating new applications of the ideas themselves. For example, when I teach Behavioral Finance (psychology applied to financial markets) I present ideas separately with simple applications to financial markets and progress to more surprising applications like religion and romantic relationships. Then, I integrate the ideas by showing them examples from my own life where someone fooled me by using those ideas in clever combination. Below is such an example:

The other day I went to do some shopping when an attractive young lady walked up to me and asked if I knew about the Dead Sea. She talked about how the Dead Sea is associated with all sorts of health benefits. She then proceeded to sell me beauty products made of minerals found in the Dead Sea, even though I had no use for them. This is how she did it. As I answered her questions about whether I have a wife, girlfriend or mother, she gently took my hand and started to demonstrate the nail care product. She was done in three minutes and the result was a surprisingly smooth and shiny thumbnail. I was still admiring my nail when she started showing me all the products in the nail care kit, and every time she showed it to me, she asked me to hold it. Very soon I was holding all the different products in the kit. She told me that the kit would last me for eighteen months. I was informed that this kit would sell in a nearby shopping mall for the equivalent of $100, but that she was selling it at a special price of $37. As she showed me her invoice book I could see that most of the pages were filled out by previous customers. She pointed out how the majority of them actually bought more than one nail kit and added that I would get a bottle of some lotion for free with every kit I bought. While she was filling out my invoice for the nail kit, she kept on repeating further reasons for why I made a good decision. She asked who she should make out the invoice to: “Mr. Handsome, Sexy or Gorgeous?” and made me feel like I was clearly a well-informed buyer (which I obviously was not). But if I thought she was done with me, I was mistaken. She simply assumed that I would pay by credit card (this way she could sell me even more products) and started demonstrating yet another

product. If it were not for the fact that I was late for an appointment, I probably would have bought that too. It took much effort for me to directly contradict her several times and point out that she was making assumptions that were not correct. After a few minutes, I extricated myself, paid the $37 cash and left. I felt I did quite well, but later realized I could have bought the same nail care kit for less than half the price elsewhere. I gave the nail kit to my girlfriend and it didn’t even last one day.

I ask my students to spot all the behavioral biases to which I fell prey (quite a few). Finally, I ask them to find such examples in their own life and elsewhere (for example, by investigating companies’ annual shareholder reports). Their own examples then become case studies in the following years’ courses.

While I will address this in more detail in later chapters, I want to touch on the use of formative assessments in making and maintaining interest, because it is one of the easiest ways to keep students within the flow channel. Formative assessments consist of tasks that enable students to develop their understanding – such tasks can be informal in-class activities or challenging homework assignments, and they count very little toward students’ grades. In contrast, summative assessments aim to evaluate students’ performance and are usually high-stakes tests or exams that determine if a student passes a course or not. Since fun requires relaxation, summative assessments are much less fun.

An authentic task is a problematic real-life scenario with enough detail so students can imagine it vividly, and in which they have to use their understanding to achieve something. The outcome is presented (not necessarily in writing) as some kind of performance or product. Authentic tasks encourage students to apply the patterns they are learning and see the relevance of the ideas, and are commonly used by good professors (Ambrose, Bridges & DiPietro, 2010; Bain, 2004).
Compare the non-authentic and authentic tasks in Table 5.2. While students might not like to do any of the tasks, they will find the authentic one more interesting and think more deeply about the ideas. Also, it is easier to add layers of increasing difficulty to authentic tasks as students’ abilities improve. The non-authentic tasks (with the exception of the calculation question) do not require thinking – only mentally copying and pasting from the textbook. My own experience over the years has been that students learn far more from authentic tasks than from my explanations.

Table 5.2 Two Kinds of Tasks

Non-authentic tasks
■ Define inflation.
■ Where do we find information on the inflation rate?
■ If the CPI in 2015 was 105 and the CPI in 2016 is 116, calculate the inflation rate in 2016.
■ Explain why the expected inflation rate is important to companies.
■ List and discuss the negative effects of inflation.

Authentic task
It is that time again – wage negotiations for ACME Chairs, a company in Cape Town specializing in making chairs (using mainly unskilled labor). Last year, the company’s wage negotiators really messed up – they offered a wage increase that was too low. As a result, the company suffered a damaging month-long strike. This year, they don’t want to make the same mistake again, so they called you in to offer expert advice. Find out what is the current and expected inflation rate in South Africa and make a recommendation to the negotiators. Your recommendation should be written in less than one page. In it, you should suggest the proper wage increase and explain two arguments in support that the negotiators can use when negotiating with the labor union.

To construct an authentic task, a simple procedure is to imagine a scenario with the five elements, as shown in Table 5.3 (see Wiggins & McTighe, 2011, Module G), or to take an actual event or adapt it so that it contains the five elements. In the example, my topic was inflation, where I imagined a company that had a problem because they didn’t understand inflation. In this scenario, the student plays the role of an advisor who does understand inflation and presents his advice in the form of a report. However, a report is not the only way to have presented it – it could have been a role play, case study, verbal presentation, and many more.

Table 5.3 Elements of an Authentic Task

Elements                         From the example (Table 5.2)
Topic                            Inflation
A detailed real-life scenario    Details about the company and its mistakes
Problem for someone              Impending strike due to bad economic advice
Role for the student             Economic advisor to make a good recommendation
Product or performance           In the form of a written recommendation

With enough challenges, one’s skills will grow, but unfortunately, we tend to prefer the boredom in the zone of mastery and may even resist new ideas that challenge us further. However, in a changing environment,

resisting new ideas is bound to cause stagnation. A student’s knowledge structure needs to become more complex, so change is required. However, for students to get, and remain, sufficiently interested in such change, something has to regularly overcome their resistance, and this is where fumbling comes in.

Fumbling Toward New Patterns

To continue learning, students have to periodically open up to new ideas, which happens when their existing ideas are disrupted. Disruption makes students realize that their old patterns do not work as well as expected, and this causes them to fumble (in the sense of struggling to use old ideas and trying to clumsily reach for anything that may help). Active learning helps here too, especially if students are required to respond to provocative questions or make predictions that make them realize they need better ideas. This creates the kind of doubt that Feynman believed was the essence of learning (Gleick, 2011).

I really understood the value of getting people to fumble when I encountered Stephen Thaler’s (1997) ‘imagination engines’ – computer programs that are creative. Thaler’s computer program is a type of neural network, so called because it mimics the workings of a human brain. A neural network can learn new skills because, like the human brain, it has brain cells (neurons) and constantly adjusts the connections (or ‘weights’) between these neurons. A neural network resembles a simple expert knowledge structure with many nodes and connections between them. As students learn more, their initial hub-and-spoke structure gradually comes closer to a simple expert structure (see Figure 2.5). But, as the structure becomes gradually more complex, new ideas and connections are needed for further learning; and this involves the personal creation of new knowledge. Unless there is some disruption, this will not occur, as Stephen Thaler demonstrated.
Twenty years ago, Thaler wondered what would happen if he disrupted one of his neural networks. First, he taught it several well-known Christmas carols. Then he started cutting some of its connections. To his surprise, at first, the network somehow reconstructed memories along other neural pathways to repeat the carols perfectly. But after cutting more connections, it sang something quite creative: “All men go to good earth in one eternal silent night” (Thaler, 2013, p.452). The disruptions caused it to create new carols it never sang before. However, there came a point where the disruptions were too severe and it no longer made sense. Thaler realized that he had developed a truly creative machine, and with further refinements, his neural networks generated several inventions that

were sufficiently original to be patented. What he realized was that there needed to be an optimal level of disruption. If he disrupted the neural network too little, it would simply repeat the old sedimented patterns (the Christmas carols in this case) because it did not really have to ‘think’ (find new neural pathways). But if he disrupted it too much, the neural network sang complete nonsense and the result was random utterances. There was a small range of disruption that led to new thinking by the neural network, and the same is true of humans. If an idea disrupts us too little, we think it is obvious and hardly pay attention. Disrupt us too much and we think the idea is crazy and we resist it. However, with just enough disruption, we become interested in changing. We start to think, and even if we don’t agree with the idea, our minds have been permanently stretched.

Davis (1971) explained that the easiest way to disrupt an audience is to identify something they take for granted, and then to show that it is, in fact, the opposite. For example, in neuroscience, we used to think that emotion made us less rational; but, due to the work of Antonio Damasio, we now know that without emotion, we become irrational. Or we used to think that time was constant; but, because of Einstein, we now know that it changes depending on our movement through space. Davis’s strategy is a little too blunt for actual teaching situations. In lectures, I see disruptions as being subtler and occurring in different degrees of ‘spiciness’ – like Indian food, where:

■ Mild disruptions show students that there is something important that they never considered or thought about;
■ Medium disruptions show students that what they thought up to this point is incomplete or wrong;
■ Hot disruptions show students that things are, in fact, the exact opposite of what they previously thought.

It does not matter if, in the end, my audience agrees with me or not, or even if I agree with my own disruption. What matters is that my disruption is so well-reasoned that the audience is forced to think and argue about it. At that point, they are fumbling and willing to consider the new ideas I want them to engage with. There are usually three situations in which disruption becomes necessary.

■ Complexification: Making a knowledge structure more complex involves some degree of disruption. This is because it involves broadening ideas, breaking them up, adding new connections or breaking connections. For example, suppose I teach the history of World War 2. I will, at some point, have to complicate matters by explaining a new cause – perhaps that the Allied powers were not completely innocent and contributed to the events leading up to the war. Or, when teaching about government, I might, at first, use the metaphor of a family, but later, I have to show that this metaphor can be misleading.
■ Change of context: When applying an idea to a different context, there is usually some disruption if some ideas need to be adapted. For example, when teaching financial literacy, I would explain that you should not be dependent on debt and not borrow money to pay expenses if you have emergency savings available. But when I explain the kind of financial literacy needed to survive under extreme poverty, I would explain that in this context it does not make sense to be debt-free and that you should, in fact, borrow money rather than draw on your emergency savings. This leads to a broader idea about what it means to be financially literate and is definitely disruptive.
■ Common beliefs or preconceptions: When teaching new ideas, one often runs up against preconceived ideas that block further learning and have to be disrupted so that learning can progress. For example, if you teach Biology, you need to disrupt the widely held belief that everything needs a designer; or in Engineering, you need to disrupt the idea that models need to be realistic; or in Economics, I cannot teach international trade unless I disrupt the idea that imports are bad for a country.

My own experience suggests that whether you use mild, medium or hot disruptions depends on the situation, as Table 5.4 shows. Most gradual complexifications of the knowledge structure do not change the structure dramatically, so the disruption is usually mild or medium. A change of context has the potential for more severe disruptions; while overcoming preconceptions very often requires the hot version of disruption.

Table 5.4 Choosing the Level of Disruption

                     Mild    Medium    Hot
Complexification      X        X
Change of context              X
Common sense                   X        X

One has to take care when disrupting students’ knowledge structures. Disruption is useless if done for its own sake. As always, you should focus on only a few ideas: those that are the next step in continuing the learning process. Too much disruption will not allow students to consolidate changes in their understanding or think deeply about them. Once a person has been disrupted and they are fumbling, they are open to being fascinated by a new idea, and the cycle starts again. Let’s put it all together now.

PUTTING IT ALL TOGETHER

There is no recipe for using the F3 approach – it depends on the audience and the presenter’s own style. Though ideal, it might not always be possible to use all three Fs, and their actual use may only be a small part of a presentation. The best I can offer is possible steps, as laid out in the exercises at the end of this chapter, and some guidelines derived from my own experience.

I sometimes teach Economics to MBA students. This is challenging because it is often a mixed audience, from people who know virtually nothing to people who keep up with the daily economic and financial news. I find fumbling most useful here in two ways. Firstly, those who know a great deal come to my classes ready to be bored, and they can quickly become disruptive unless I disrupt them first. Because they keep up with the news, they have many more unquestioned assumptions about Economics than those who know little. As a result, I have no problem disrupting them and keeping them engaged. Secondly, this helps me to deal with the mixed audience: I can now explain the basic ideas to those who know little, while the knowledgeable listeners are kept attentive with the disruptions I regularly direct at them. When I then introduce the ‘fun’ component in the form of an authentic task, I find that both groups enjoy it and learn from interacting with each other.

I employ fascination as early as possible when teaching skills and ideas that appear useless and boring on the surface – especially when I teach research writing skills to (post)graduate students. In the first lecture, I usually pull the alarm, mystique, power, vice, and prestige triggers. When pulling the alarm trigger, I usually throw in a bit of disruption as well. I explain to them, with examples, how almost everything they have learned in their undergraduate studies has set them up to fail when researching and writing their thesis. Obviously, I then present my course as the solution.
With the mystique trigger, I appeal to their innate desire to find out more about the things they care about; and a touch of vice may enter as I show them that, in research, you finally have some freedom to question authority. It is not difficult to pull the power and prestige triggers as I talk about how


MAKE BORING AND COMPLEX IDEAS INTERESTING

it feels to see your ideas have an impact and how it advances your reputation and career. By the time I am done, hopefully, none of them leave thinking research writing is boring. I follow this up throughout the semester with a series of cumulative authentic research activities that leave them with a sense of achievement at the end. The aspect of fun is indispensable – unless there is an active engagement with the ideas, the work of fumbling or fascination is wasted. So, fun is usually my follow-up after I have addressed fumbling or fascination. Fun is sometimes a way to make an apparently threatening topic (perhaps due to perceived difficulty) less so. When teaching Finance, I may sometimes start with a simple investment game that shows my students that they already know some useful ideas and so build up their confidence. During the course of the semester, the game becomes more challenging and I use their experiences in the game to set them up for disruptions. In conclusion, let me take you on a quick tour of an actual lecture I give on the business cycle to business executives, so you can see how I apply the F3 approach in totality. I walk into the room with many skeptical listeners who think I have nothing to teach them, a few who are nervous because they know very little, and some in-between. I have a range of introductions designed to soothe the fears of the less knowledgeable and disrupt the rest enough for them to pay attention. One of my favorites is to ask them why they think a useless subject like Economics is even taught at this level. I would explain that it is useless because just about everything in the subject is obvious if you get beyond the jargon – and if it is not obvious it is usually of no practical consequence to a business. It does not matter where this discussion takes us except to conclude by saying that they should have an answer by the end of the session. 
Since my topic is the business cycle, I explain to them that there are few concepts in economics that are so useful in timing the exploitation of business opportunities. I spend a minute or two pulling the power and mystique triggers and may even throw in some true and alarming stories of firms going bankrupt by ignoring the business cycle. I then launch into a brief explanation of the business cycle for those who do not have sufficient prior knowledge so they can benefit from the lecture. I keep it short and simple, focusing on a few critical ideas. At this point, I trade accuracy in the details for broad understanding, especially since the details will only become meaningful later. Now they are ready to be disrupted, and just in time… before the knowledgeable ones become bored. I ask them: “Which part of the business cycle is the best for a business?” Most will answer the upswing phase when the economy is expanding. I then tell them this is mistaken and the reason some firms went bankrupt, and that the best part in the cycle for a


successful business is, in fact, the downswing, when the economy is stagnating. At this point, I no longer need to push my explanations onto them; they will pull everything out of me through their arguments and questions. This is the longest part of the lecture, but sometimes it feels like the shortest. Finally, I end the lecture with an authentic task where, given the expected direction of the business cycle, they criticize or commend their own company’s existing strategies and devise new or adapted strategies. We may even look at the cases of bankrupted firms I mentioned and see if they could have done things differently. At this point, there is no need to return to the question of why a useless subject like Economics is taught. They can see that, even though the ideas I taught were obvious in hindsight, the application of these ideas is not always obvious when we are blinded by unquestioned assumptions.

CONCLUSION

Fun, fumbling and fascination are not entirely separate concepts, and they overlap quite a bit. By extension, interestingness is not so much a product as a continuous process, as shown in Figure 5.2. The process is summarized in Table 5.5.

To end, I return to the ‘great explainer’. It is easy to regard teaching as a boring but necessary activity that takes time away from research. But Feynman showed that teaching is sometimes the most interesting thing a professor can do. For him, teaching was a way to test and improve his own understanding. More importantly, teaching disrupts us. He believed that because “teaching is an interruption” (Feynman & Leighton, 1984, p.5) it presents us with opportunities to question the assumptions of our discipline, gain new perspectives and find new questions for research.

Table 5.5 Summary of the F3 Process

Fascination
Purpose: Attracts us to some ideas…
Action: …by pulling one of our fascination triggers…

Fun
Purpose: makes us want to take up the challenge…
Action: …of finding and mastering new ideas…

Fumbling
Purpose: …but once we master it, it becomes boring, so something has to create the need for new ideas…
Action: …by disrupting our old ideas…


EXERCISES

1. Identify a topic that students find boring. Follow this three-step process to create fascination:
a) Consider each of the seven triggers of fascination and write down as many ideas as you can about how you can use them to get students’ attention.
b) Given what you know about your students, which trigger/s are they most likely to find relevant? Eliminate the others for now.
c) Which of the triggers are easiest to connect to the critical idea/s of the topic? Find at least one trigger for the topic, and, if appropriate, one for every critical idea.

2. Identify an idea that students find boring. Follow this process to create opportunities for fun:
a) Work out how you will get students fascinated with the idea, or at least the topic of which it is part.
b) How will you reveal the pattern of connections that create the idea? Consider active learning methods (see Box 5.1). Or will it simply be an explanation accompanied by a series of thought-provoking rhetorical questions with pauses? The point is not to talk non-stop, but to give students the chance to participate, even if they do so just mentally, in trying to figure out the pattern.
c) Design an authentic task, based on the idea or overall topic, that can be done individually or in groups.

3. Identify a topic that students find boring. Follow this three-step process to create fumbling:
a) Identify the points at which the knowledge structure needs to change due to complexification, preconceptions or changing context.
b) Choose one of these for which to introduce a disruption. It is best not to disrupt too much, so eliminate potential disruptions that are not related to critical ideas. Students should fumble only to open their minds to important changes in their knowledge structure.
c) What degree of disruption will you use (mild, medium or hot)? Work out how you will do it.

REFERENCES

Ambrose, S.A., Bridges, M.W. & DiPietro, M. 2010. How Learning Works: Seven Research-Based Principles for Smart Teaching. San Francisco: Jossey-Bass.
Bain, K. 2004. What the Best College Teachers Do. Cambridge: Harvard University Press.
Caine, R.N. & Caine, G. 1991. Making Connections: Teaching and the Human Brain. Alexandria: Association for Supervision and Curriculum.
Costandi, M. 2013. Evidence-based justice: Corrupted memory. Nature, 500:268–270.
Csikszentmihalyi, M. 1990. Flow: The Psychology of Optimal Experience. New York: HarperCollins.
Davis, M.S. 1971. That’s interesting! Philosophy of the Social Sciences, 1(4):309–344.
diSessa, A.A. 2014. A history of conceptual change research: Threads and fault lines, 2nd edition. In: R. K. Sawyer (ed.), The Cambridge Handbook of the Learning Sciences, 88–108. New York: Cambridge University Press.
Feynman, R.P. & Leighton, R. 1984. The dignified professor. Engineering & Science, November:4–10.
Gleick, J. 2011. Genius: The Life and Science of Richard Feynman. New York: Open Road.
Hogshead, S. 2010. Fascinate. New York: HarperCollins.
Koster, R. 2005. A theory of fun for game design. Scottsdale: Paraglyph Press.
Loftus, E.F. & Palmer, J.C. 1974. Reconstruction of automobile destruction: An example of the interaction between language and memory. Journal of Verbal Learning and Verbal Behavior, 13(5):585–589.
Thaler, S.L. 1997. A quantitative model of seminal cognition: The creativity machine paradigm (US Patent 5,659,666). Paper available at www.imagination-engines.com, accessed 24 April, 2007.
Thaler, S. 2013. Creativity Machine® Paradigm. In: Carayannis, E.G. (ed.), Encyclopedia of Creativity, Invention, Innovation, and Entrepreneurship. New York: Springer-Verlag, pp. 447–456.
Tyng, C.M., Amin, H.U., Saad, M.N.M. & Malik, A.S. 2017. The influences of emotion on learning and memory. Frontiers in Psychology, 8:1454.
Wiggins, G. & McTighe, J. 2011. The Understanding by Design Guide to Creating High-Quality Units. Alexandria: Association for Supervision and Curriculum.
Willingham, D. 2009. Why Don’t Students Like School? San Francisco: Jossey-Bass.


Chapter Six

If You Want Students to Reason Like Experts, Don’t Teach Them How to Reason

What does it mean when you give a group of professors and some bright high school students the same quiz and some of the students outperform the professors? Does it mean that the professors are not really experts and have little to teach the students? This is exactly what Sam Wineburg (1991) did when he gave some History professors and students a test on the American Revolution. Some of the professors did not specialize in this topic, and a few students did better than them on the quiz. But for Wineburg, this was neither surprising nor embarrassing, because he did not accept the conventional definition of ‘expert’.

What is an expert really? In a subject like History, is it a person who can answer the most questions about history, or is it a person who knows how to be a historian? Being a historian, sociologist, geographer or botanist is not a quiz contest. Instead, experts are people who know how to make sense of conflicting, incomplete, and vague ideas and use them to improve not only their own knowledge but also create knowledge in the discipline. They know how to be an expert, as mentioned in Chapter 1.

This is why, after the quiz, Wineburg gave them the real test. Students and professors were given the task of drawing conclusions from real historical documents, which contained, as one would expect, many gaps, inconsistencies, and conflicting views. For example, one of the tasks was, after having been given a variety of source documents, to select from a group of paintings, the one that best reflected the actual Battle of Lexington.

Here is where the professors excelled. They thought that the task was very difficult and it pushed them to their limits. Wineburg observed how they: tried to interpret the evidence in different ways; endeavored to make sense of the conflicts and the gaps; went back-and-forth; generated


DEVELOP EXPERTISE WITHOUT TEACHING REASONING

alternative explanations, with provisions and qualifications; and finally identified the illustration they thought was the least wrong, and hence most trustworthy. They did this by going back to the first principles of studying history in order to distinguish between the critical and non-critical information and reasoning from that toward a tentative answer that approximated the likely truth. In other words, they really struggled. So, how did this show that the professors were experts? Consider what the students did. For them, the task was much easier, and they treated it like a multiple-choice quiz. They thought that there had to be one right answer and that their job was simply to select the one that was pre-determined as correct. There was no attempt to construct knowledge or arrive at a sophisticated understanding, because they thought of knowledge as given and constant, not as something that is imperfect, changeable, contested and constructed. If knowledge were something that is given and constant, the difference between those who understand and those who do not, would not be so clear. It only becomes obvious when experts and novices are thrown into situations with no well-defined questions or answers, where knowledge has to be reasoned out. Referring to one of the professors in the study, Wineburg (1991, p.84) put it like this: “Her expertise lay not in what she knew, but in what she was able to do when she did not know.” This confirms everything from the previous chapters. To remind you: understanding is the ability to compress information into a much smaller number of connected ideas; and then, by using reasoning and a bit of imagination, generate new information, even in situations not experienced before. In the study, the professors had compressed their knowledge into a few deep principles. Through reasoning, they unpacked those principles and inferred new connections, helping them make sense of a challenge they had never faced before. 
In Chapter 1, we looked at the concept of compression of a subject into a few critical ideas; then, in Chapters 2 to 5, we investigated how connections generate understanding, improve memory, and evoke interest in those ideas. This chapter completes the set of five critical skills a professor should have: enabling students to expand their understanding through reasoning. This ability is also the biggest differentiator between novices and experts.

REASONING UNLOCKS THE POWER OF UNDERSTANDING

Richard Feynman (2011, p.4) explained: “There is an enormous amount of information about the world if just a little imagination and thinking


are applied” to a few critical ideas. What he called ‘imagination and thinking’ I will simply refer to as ‘reasoning’. Reasoning causes the seeds of a few critical ideas to bloom into a forest of smaller, connected ideas – it unlocks the full power of understanding. Reasoning is commonly defined as the process of drawing new conclusions from a pre-existing set of information. It usually involves deductive logic, but in the cases of inductive and abductive reasoning, it may also require some imagination. Mercier et al. (2016) point out that this is more accurately described as ‘inference’. They explain that reasoning is, in fact, a more specialized form of inference: it is inference guided by the process of finding, using, and evaluating reasons. It is not simply making claims (like “It is going to rain later.”), but making arguments: making a claim, backed by evidence, warrants, counterarguments, and qualifications. Conversations make reasoning particularly effective, because, when we make arguments, we reveal our reasoning, so that our reasons for claiming something can then be questioned and improved by others. As we are exposed to new arguments in this way, we start seeing weaknesses in our own thinking, and our understanding deepens. Mercier et al. (2016) call this public exchange of reasons ‘argumentation’. To be accepted by others, all ideas have to be argued. Without having any reasons, it would be impossible to respond to ideas, and it would also be difficult to know how to connect them to other ideas and build on them. As the connections are uncovered, it becomes possible to elaborate on the ideas. The imaginary conversation technique, encountered in Chapter 1, and which is itself a form of reasoning, showed how this happens. Reasoning, but specifically argumentation, also overcomes a big problem that occurs when people feel they understand something completely – the tendency to absolutism. 
When compressing information to a few critical ideas, it is easy to get a sense that one now possesses the ultimate set of ideas that can explain anything, anywhere, at any time. Because reasoning exposes our thinking to others, it creates the opportunity for others to question us and for doubt to enter our minds. This keeps us open to the possibility that our understanding is imperfect and that there is still more knowledge to construct. Like the History professors, when we reason, we are made to realize that much of what we know is wrong, or at the very least, conditional, and that it often depends on context or perspective. It is this realization that makes us understand as experts do. Experts do not simply see the connections, they also recognize the conditions under which the connections may or may not be valid, and this provokes them to always search for ways to improve their understanding. Finally, because argumentation forces us to explain ourselves publicly, it taps into the ‘self-explanation effect’ (see Box 2.3). It has now been


well established that, by making our thinking explicit and articulating ideas and reasons, we become aware of imperfections in our own understanding, and are thus encouraged to improve it and fill in the gaps. Clearly then, active promotion of student reasoning is an important part of great professors’ courses (Bain, 2004). The temptation is then to teach it. But that would be a mistake.

DO NOT TEACH STUDENTS HOW TO REASON

Human reasoning is an ability that evolved over millennia to solve the problem of creating and evaluating knowledge in groups. While not perfect, it has left almost every human with excellent reasoning skills, as long as these skills are exercised under the right conditions, specifically in settings where there is a public exchange of diverse ideas – that is, in argumentative settings. Teaching reasoning skills through things like logical fallacies, for example, not only distorts natural reasoning skills but also wastes valuable time, because one would be teaching an ability that is already innate. It is better to simply create the conditions that get students to actually use the natural abilities they already possess and guide them in applying these skills to the discipline. A detailed and well-researched defense of this view can be found in Mercier et al. (2016) and Mercier and Sperber’s (2017) insightful book. For the purpose of this section, I review only some of the arguments with educational implications.

Kahneman (2011) and other researchers, especially in economics and psychology, create the impression that human reasoning is flawed and biased. They seem to ignore the vast body of literature that finds that people are actually naturally competent reasoners. Humans can easily spot fallacies and biases in the arguments of other people, are quite good at evaluating arguments that they care about, and become critical thinkers when confronting arguments with which they disagree. The reason for this is that our ability to evaluate arguments mainly depends on whether we are exposed to counterarguments and also how easy it is for us to produce or anticipate such counterarguments. It is true that, as individuals, we are lazy reasoners and prone to reaching biased conclusions. However, in a conversation, the arguments I generate for my case are the counterarguments against your case.
So, when we get together, your (counter)arguments cause me to change the initial evaluation of my own arguments or cause me to produce better arguments, which, in turn, causes you to re-evaluate your arguments. If we are bad at reasoning, it is not because of biases or a lack of knowledge, but because we have not been able to think of sufficient


counterarguments. This is something easily fixed in groups containing people who have different perspectives. In fact, Mercier et al. (2016) highlight that the most persuasive evidence that humans are good at reasoning comes from studies of group discussion. The research agrees that a group can out-reason individuals, but only if it contains people with different views who are willing to voice their opinions and criticize the opinions of others. Due to its ability to generate more counterarguments, group reasoning has been found to consistently outperform individual reasoning. Groups containing a variety of views are more likely to produce counterarguments, and, furthermore, in cases of inquiry-based reasoning, groups are up to five times better at reasoning than individuals.

Very similar findings exist with regard to the use of evidence. People fail to use evidence only when they find it difficult to think of any, which is often the case in informal discussions. However, when evidence is available, people are likely to use it and use it intelligently (Brem & Rips, 2000).

Studies on collaborative learning agree. Students’ discussions are effective when they are exposed to alternative views and feel free to state their views and change their minds. Such discussions improve student performance and deepen conceptual understanding (Johnson & Johnson, 2009; Nussbaum, 2008). This assumes that groups are small and that certain rules are in place, such as avoiding personal attacks, staying on the topic and fair representation of arguments. It is here that templates – such as those of Graff and Birkenstein (2006), the sentence openers of AcademicTalk (McAlister, Ravenscroft & Scanlon, 2004) and the prompt and response frames of Zwiers & Crawford (2011) – are useful.

When reasoning skills are taught as something that is separate from the discipline, students rarely transfer those skills to other ideas.
However, when reasoning is integrated into the teaching of one topic, students not only apply it to that topic but also transfer it to new topics as they learn to anticipate counterarguments and improve their own arguments (Kuhn & Crowell, 2011). The more students talk to each other in argumentative settings, the more they learn (Resnick, Asterhan & Clarke, 2013; Henderson et al., 2015).

UNDER THE RIGHT CONDITIONS, EVERYONE REASONS WELL

From the above, we can derive the conditions under which reasoning is naturally effective. If we can create these conditions, not only will students develop and extend their natural reasoning skills, but they will also remember and understand more.


Most importantly, reasoning is best in groups consisting of individuals who exhibit different perspectives. In contrast to the directives of brainstorming proponents, people in such groups should feel free to criticize each other, because the quality of our reasoning is significantly enhanced as we are exposed to more counterarguments. After a while, the realization that our arguments are easily refuted causes us to adopt a more critical attitude toward our own reasoning and anticipate counterarguments on our own. This interplay of argument and counterargument produces optimal results in small groups of four to five people. If the group is too small, the variety of views may be too limited, and if it is too large, not everyone gets an opportunity to share their arguments. Argumentation works best when there is an exploration of differences rather than an attempt to eliminate those differences. Individuals in a group should care less about being right and more about making sense of ideas. To achieve this, the reasoning process should ideally start with an idea about which, even among experts, there is a knowledge gap or conflict. If students know that even the experts are unsure, they are less likely to be dogmatic and more motivated to pursue new understanding. Finally, reasoning is better practiced in the context of knowledge evaluation and production within the discipline, as opposed to teaching it as a separate topic or course. This applies even in cases where the disciplinary aspects of reasoning need to be taught.

IF YOU HAVE TO TEACH, THEN TEACH MOVES AND VARIATIONS

Reasoning is a skill that is expected in all disciplines and is fairly generic. In fact, when Arnold Arons shares his list of reasoning abilities that comprise critical thinking, many professors think that because the list is so accurate, he must know their discipline (see Bain, 2004, pp. 85–87). But these generic skills usually need not be taught. What should be taught are the “disciplinary ways of thinking and knowing” (Middendorf & Pace, 2004, p.1) and the discipline-specific variations of reasoning. Variations are found in the way in which different disciplines search for evidence and in the different sources used. Another variation lies in drawing inferences from data, because not only is the quality of data judged differently, but different systematic methods of data analysis are also used in different disciplines. These, and other disciplinary differences, are not innate to human beings, and therefore have to be taught.

There is one general aspect of reasoning that needs explicit instruction, and that is the language used in talking about it. This includes the


vocabulary used in describing different elements and structure of arguments and the different argumentative moves. The Toulmin model is an excellent way to acquire this vocabulary, while argumentative moves can be learned from various sources, such as the templates mentioned earlier, or from theoretical categorizations, such as appraisal theory (Martin & White, 2005) or dialog theory (Walton, 2000).

BOX 6.1 ELEMENTS OF ARGUMENT AND ARGUMENTATIVE MOVES

The most commonly used model to learn about the elements and structure of individual arguments is that of Stephen Toulmin. To find information on that, search for “Toulmin model”. There are some good summaries of Graff and Birkenstein’s (2006) book of argumentative moves, which you can find by searching for: summary “moves that matter”.

HERE IS ONE WAY TO PROMOTE LEARNING THROUGH ARGUMENTATION

While there are countless books and courses that aim at learning-to-reason, there are far fewer reasoning-to-learn approaches. Some of the best approaches I have come across are discussed by Andriessen and Baker (2014) and Scardamalia and Bereiter (2014). To this valuable literature, I will add one more approach that is suited to controversial topics without obvious answers. It draws quite heavily on the techniques described in Wentzel (2017) and Chapters 3 and 4 of Wentzel (2018).

Step 1: Select a Topic

Select a topic within the syllabus that contains ideas that may lead to very different, even conflicting, conclusions. This will not always be communicated in the textbook, so you often have to draw on your understanding of the controversies in the discipline. For example, when teaching strategic management, one issue is which approach to developing strategy is superior. While most textbooks support a prescriptive approach, where strategy is developed and defined by a small group at the top and then communicated downwards for implementation, in practice, one finds that strategy is not pre-determined, but rather emerges as the organization continuously adapts to changes.


Step 2: Generate Uncertainty in Students’ Minds

Avoid adversarial reasoning, where students become invested in a position and defend it at all costs; for this reason, stay clear of religious and politically sensitive topics. Instead, encourage inquiry-based reasoning, where students become less interested in being right and more interested in making sense of the information that generates controversy. Make a convincing case for the different views (ideally limited to two) without hints as to which one you agree with. Make it as clear as possible why these views are in conflict or inconsistent with each other, why it is important to take a position and why it is difficult to agree with both. If there are gaps in experts’ knowledge, strong disagreements among them or puzzles they have not yet solved, these will help to generate further uncertainty. If the textbook takes one view, you need to draw on history, practice or alternative views to show that there is no clearly correct view and that even the experts are unsure or in disagreement. You don’t want students to approach reasoning as a multiple-choice test or as a competition, but rather to see other arguments, even opposing ones, as useful in helping them to improve their understanding. Uncertainty is also related to some of the triggers of interest from Chapter 4 and gets students to care about the arguments. This helps because humans become more competent reasoners when dealing with topics they care about.

Step 3: Derive an Incomplete Summary

Based on your explanations of the views, summarize the result (as shown in Figure 6.1). This summary is only a start and gets completed in step 5. To create the diagram, write the opposing views in the boxes V1 and V2. Then ask yourself: If we take this view or put it into practice, how will it help those involved? For the system in question, what need does it satisfy? The need satisfied by V1 is written in N1, and the need satisfied by V2 goes in N2.
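For readers who like to see a structure written out, the two branches of the diagram described above can be pictured as a pair of small records. What follows is only a minimal sketch in Python and is not part of the approach itself: the names Branch and if_then are hypothetical, chosen for illustration, and the wording in the boxes is borrowed from this chapter's strategic-management example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """One side of the summary diagram (hypothetical helper, for illustration)."""
    view: str  # the text written in box V1 or V2
    need: str  # the text written in box N1 or N2

    def if_then(self) -> str:
        """Read the branch as a single 'if-then' sentence."""
        return f"If we {self.view}, then {self.need}."


# Box wording taken from the chapter's strategic-management example.
v1 = Branch(
    view="pre-determine strategy at the top",
    need="we will have fast implementation and consistent evaluation",
)
v2 = Branch(
    view="allow strategy to emerge over time",
    need="the organization will adapt to changes in the environment",
)

# Print each branch as a sentence so it can be read aloud and sense-checked.
for branch in (v1, v2):
    print(branch.if_then())
```

Reading each printed sentence aloud is a quick way to notice whether a view and its need have been written in the right boxes.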
For example, those who agree that strategy must be pre-determined at the top, believe this is necessary in order to execute and evaluate strategy as fast and as consistently as possible. The diagram is most likely correct if: (1) you can read ‘if-then’ statements from both branches in a way that makes sense; (2) it is very difficult to agree with both views (V1 and V2); and (3) both sides are consistent with credible literature or practice. Figure 6.1 seems to work because an ‘if-then’ statement that makes sense can be derived from both branches. The first statement is: “If we


Figure 6.1 An Incomplete Summary (Completed in Figure 6.3) [Diagram boxes: Pre-determine strategy at the top (V1); Fast implementation and consistent evaluation (N1); Allow strategy to emerge over time (V2); Organization adapts to changes in the environment (N2). Arrows are labelled A, B and C.]

pre-determine strategy at the top, then we will have fast implementation and consistent evaluation (of strategy)”; and the second: “If we allow strategy to emerge over time, then the organization will adapt to changes in the environment.” Based on the literature, the two views are definitely in opposition. Once you have generated this summary from your explanation, keep it up where it is visible.

Steps 4–7 build on this using a technique called ‘think-pair-square-share’ (Bain, 2004, p.130). It works because, as mentioned earlier, humans seem to reason best in small groups (consisting of 4–5 people). In such groups, the counterarguments of everyone are heard, and this is important because it is the availability of counterarguments that makes reasoning effective.

Step 4: Individual Thinking with Evidence

Ask individuals to take a provisional position on the different views and to find reasons for their position. If they agree with V1, they will find the reasons in arrow B of the incomplete summary, but if they agree with V2, they will find these in arrow C. To find the reasons, they can simply ask themselves questions such as: Why do I agree with this ‘if-then’ statement? What needs to be true for this statement to be true? What is the statement assuming? For example, the reasons underlying V1→N1 may be that top managers have better knowledge of the company’s challenges and should,


therefore, decide on strategy, or that the participation of other employees in the strategy-making process would slow it down. Reasons underlying V2→N2 may include: that strategy does not have to be a formal document; that strategy should be influenced by those who have to implement it; or that the pace of change is accelerating, so strategy should remain open to change. These reasons should then be transformed into arguments, using something like the Toulmin model (see Box 6.1). This means that reasons have to be supported by evidence, by explanations of how the evidence supports the reason (warrants), and by qualifications. If there is time, also ask students to think of possible counterarguments to their position, and how they would rebut such arguments.

People will reason with evidence if the evidence is available. So, for this step, prepare a sheet containing relevant evidence in advance, and hand it out to students before continuing. The evidence should not be skewed in favor of one view. By anticipating as many of the prospective reasons as possible, you can gather, in advance, the evidence that confirms or contradicts each of these. The result should be a fact sheet that contains information that both confirms and questions the validity of the reasons on both sides of the diagram. Alternatively, you can allow students to search for evidence during the lecture.

Step 5: Pair for Maximal Difference

Have students position themselves physically on a line such as the one in Figure 6.2. Place them in pairs in such a way that an equal difference (more or less) exists within every pair. The dotted lines in the figure indicate how this would be done: match students F and L, students G and M, and so on. Matching students from the outside in (for example, F-S, G-P, H-N) would not work because some pairs would have very large differences, while others

[Figure 6.2 Students Take a Position: students F, G, H, J, K, L, M, N, P and S stand along a line running from "Fully agree with V1" to "Fully agree with V2".]

would have small differences; in such a case, not all pairs will exhibit the same quality of reasoning. Reasoning naturally works better when there are differences that are large enough to expose people to counterarguments they would not have thought of by themselves. Of course, it is ideal if one could get fewer students to initially take a middle position, but this is not always possible, and it is best not to artificially force students to be extreme. The first thing that pairs should do is get into the right mindset to pursue inquiry-based reasoning (as opposed to adversarial). Ask students to find common ground by finding their common goal and extending Figure 6.1 into Figure 6.3. Figure 6.3 shows that this is done by recognizing that both sides, even though they disagree, are trying to achieve the same goal. The common goal is usually quite broad: the broader and more general, the easier it is for both sides to agree with it. In the figure, for example, the common goal may be something like: “An organization that is profitable in the long-run”. Test the common goal by checking if it generates if-then statements that make sense. In this case, both make sense. N1→G makes sense: “If there is fast implementation and consistent evaluation of strategy, then the organization will be profitable in the long-run”; and so does N2→G: “If the organization adapts to changes in the environment, then it will be profitable in the long-run”. Recognizing that both sides have a common goal will go a long way to evoke a more collaborative attitude between both sides, as captured by the “principle of charity” (Wilson, 1959, p.532). The principle of charity encourages us not to assume that those who disagree with us are

[Figure 6.3 Find Common Ground: boxes for Pre-determine strategy at the top (V1), Allow strategy to emerge over time (V2), Fast implementation and consistent evaluation (N1), Organization adapts to changes in the environment (N2), and Common goal (G); arrows labeled A, B (V1→N1), C (V2→N2), D (N1→G) and E (N2→G).]

incompetent or ignorant, but rather to assume that they have good reasons for their view. Instead of nit-picking on tiny inconsistencies in an opponent’s argument, we should try to understand its overall intent and construct the best possible version of such an argument with which to argue.

BOX 6.2 PRINCIPLE OF CHARITY

The Wikipedia entry for the ‘principle of charity’ is not so easy to understand, but you can find many clearer explanations when you search for “principle of charity” on the web. Several of them explain how this principle enables us to understand more, how it makes our arguments more likely to produce valuable knowledge, and how it stretches and improves our ability to formulate stronger arguments.

Give students some time to individually find a couple of reasons for why arrows D and E could be valid. A possible reason underlying D may be that strategy has to be implemented fast, so the organization can see if it is doing the right things; and a reason for why arrow E could be valid may be that when the environment changes fast, organizations can only survive if they adapt quickly. While doing this, students will find that some of the reasons underlying D and E are sources of disagreement. Ask them to take the reasons for the side they are closest to, and transform them into arguments, using the fact sheet or other sources of evidence. Once they have done this, ask them to exchange these arguments, in addition to the arguments for their side from step 4. So, the student closest to V1 should share her arguments based on the reasons underlying arrows B and D; and the one closest to V2 should share his arguments underlying arrows C and E. Before they engage in argumentation, have the students simply ask each other the following questions on the arguments they wish to challenge:

■ Is this reason true for all time?
■ Is this reason true under all conditions?
■ Is there really evidence for this reason? If so, is the evidence credible?

Then, let them respond to each other with additional counterarguments using the language, pre-defined rules, and templates of argumentation. If the two sides are inconsistent or in conflict, the counterarguments


should be easy to find. The reasons underlying arrow B usually evoke counterarguments to those underlying arrow C, and vice versa, and the same applies to arrows D and E. Next, ask each side to refine their position based on the counterarguments of the other side, especially those counterarguments they did not consider in step 4. This may involve shifting their position, filling gaps in their reasoning, making concessions to the other side and, most importantly, more clearly specifying the context or conditions under which their argument would be true. The diagram that results from steps 2–5 is based on the work of Goldratt (1994). It has many other uses, including finding original research contributions (Wentzel, 2018); designing productive questions; and systematically identifying assumptions (Wentzel, 2017). Step 6: Square for Greater Understanding Now ask any two pairs to come together. If the class number is not divisible by four, try to keep the groups between four and six students. In the case of Figure 6.2, where there are 10 students, I would probably ask students F, G, N, P and S to join together in one group; and H, J, K, L and M in another. In this case, the size of the differences between groups is less important because students will have already opened their minds to a wider variety of arguments as a result of step 5. The purpose of step 6 is to use the arguments to prepare students for the creative synthesis in step 7. Let students briefly exchange their positions and main arguments so that everyone is familiar with each other’s views. Now ask them to identify what both sides agree on. 
For example, if the argument on one side is that: “Strategy has to be implemented fast so the organization can see if it is doing the right things”; while the other side argues that “When the environment changes fast, organizations can only survive if they adapt quickly”; then they should be able to see that both sides acknowledge that strategy deals with change – even if they disagree on the kind of change. If there is time, ask students to define what information would be necessary to decide between the most critical opposing arguments. Based on this, let them design potential research projects or experiments that would be able to generate this information. Step 7: Share to Prepare for Synthesis Bring the whole class together. Allow a little bit of time for people to share the most interesting arguments they encountered. But this can go even further.


Explain to them that, even when we disagree with a view, if that view exists in practice or in the minds of other experts, it contains valuable information. By synthesizing opposing views, we can take advantage of the information on both sides, instead of trying to destroy the opposing side and the information it holds. Many creative breakthroughs are the result of new ideas that emerge from synthesis. Here, you can mention some examples from the discipline; for example, some best-selling books on strategy, like Ries (2011), have found ways to synthesize the opposing sides found in Figure 6.3. Figure 6.4 shows the three questions that guide the class discussion towards synthesizing the two sides and generating new knowledge from their understanding at this point:

■ How would it be possible to have fast implementation and consistent evaluation of strategy by allowing strategy to emerge over time? (N1→V2)
■ How would it be possible for the organization to adapt to changes in the environment by pre-determining strategy at the top? (N2→V1)
■ How would it be possible to achieve both: allow strategy to emerge over time and also pre-determine strategy at the top? (V1↔V2)

Produce a record of the ideas that emerge. Guide the class to develop arguments for the most promising new ideas, and to establish where

[Figure 6.4 Creative Synthesis: boxes for Pre-determine strategy at the top (V1), Allow strategy to emerge over time (V2), Fast implementation and consistent evaluation (N1), Organization adapts to changes in the environment (N2), and Common goal (G); arrows labeled B, C, D and E.]

these ideas fit within the conversation between the experts in the discipline. Gaipa (2004) suggests some ways to think about this. Also, when facilitating a class discussion, it is useful to mainly use Socratic questions such as those discussed by Paul and Elder (2016).

Step 8: Debriefing

Using questions, debrief the class. Review how they may have modified their original views, and what they learned. Get a sense of how many people shifted their position – maybe get them to take a position on the line again. Ask them to identify what surprised them during this process, and to draw some overall conclusions. Help them to see how their thinking is starting to approach the kind that is found among experts.

A Summary of the Procedure

This procedure is summarized in Table 6.1.

Table 6.1 A Procedure to Promote Learning through Reasoning

Step   Description of action                                                        Minutes
1      Announce a controversial topic
2      Explain the views, generate uncertainty, show the difficulty of a middle
       position and why taking a position is important
3      Generate an incomplete summary and keep it visible
4      Hand out a fact sheet and let students review it
       Students take a provisional position
       Students engage in individual reasoning in order to formulate arguments
       underlying the arrow closest to their position (and possible
       counterarguments*)
5      Students position themselves on a line
       Divide them into pairs that contain different views
       Pairs define their common ground
       Individually formulate arguments underlying arrows D and E
       Exchange arguments within each pair
       Students ask each other challenging questions and try to answer them
       Engage in further counterarguments
       Students individually refine their argument and change their positions
6      Pairs combine into groups of 4–6 people
       Groups familiarize themselves with each other’s arguments (no debating)
       Groups identify points or principles that both sides agree on

[Box C. Interval Since Training: Short, Long]

If the track coach discovers that there is an interactive effect, such as in Box C, this might lead to a best-of-all alternative, say, no recent training and much sex (but please don't generalize, or believe, this hypothetical conclusion). Similarly, if the statistics-teaching experiment were to be done in both junior and senior years with several teachers, it might show that the Monte Carlo method works better for juniors and worse for seniors. This could be an important piece of knowledge that would not appear if the experiments were done only with students in one year (a piece of information that would be lost and would only blur the overall results, if the experiment were

Designing Experiments


done on both junior and senior students without planning and without checking the effect of the school-year variable). Searching for such interactive effects is an important function of a multivariate experimental design.
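To make the idea of an interactive effect concrete, here is a minimal Python sketch for the statistics-teaching example. The mean test scores are invented for illustration only, and the function and variable names are hypothetical; nothing here is data from an actual experiment. The sketch shows how the effect of the teaching method can be computed separately for each school year, and how pooling the years can blur the result.

```python
# Hypothetical mean test scores for a 2 x 2 design (method x school year).
# These numbers are made up purely to illustrate an interaction.
HYPOTHETICAL_MEANS = {
    ("monte_carlo", "junior"): 78.0,
    ("conventional", "junior"): 70.0,
    ("monte_carlo", "senior"): 68.0,
    ("conventional", "senior"): 74.0,
}


def simple_effects(means, methods, years):
    """Effect of the first method relative to the second, within each year."""
    new_method, old_method = methods
    return {
        year: means[(new_method, year)] - means[(old_method, year)]
        for year in years
    }


def interaction_contrast(effects, years):
    """Difference between the method effect in the two school years."""
    first_year, second_year = years
    return effects[first_year] - effects[second_year]


METHODS = ("monte_carlo", "conventional")
YEARS = ("junior", "senior")

effects = simple_effects(HYPOTHETICAL_MEANS, METHODS, YEARS)
interaction = interaction_contrast(effects, YEARS)

# Averaging over years hides the fact that the method helps one group
# and hurts the other.
pooled_effect = sum(effects.values()) / len(effects)

print("Method effect by year:", effects)
print("Interaction contrast:", interaction)
print("Pooled effect ignoring year:", pooled_effect)
```

With these invented numbers the pooled effect is small even though the within-year effects are sizable and opposite in sign, which is the blurring that an unplanned, unchecked experiment would suffer.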

7. Experimental Designs to Study Delayed Effects

Implicit in the experimental designs previously described is that all the effects occur immediately. But sometimes the effects of the independent variables occur slowly and are spread out over time. Furthermore, one may want to know about the effects of various amounts of the stimulus given over time. This often is the case in medical research and in education and advertising. For example, Random House may wish to know the effects on the sales of the second edition of this book if it advertises in the fall of 1977, or if it advertises in fall, 1977 plus advertising in spring, 1978, or if it does not advertise at all.

The appropriate design experimentally treats the different samples different numbers of times (in separate time periods) and then observes the results over several periods. As an example of just one of the many possible designs, Table 11.9 shows that group S1 (a sample of areas) receives advertisements in Fall, 1977, and Spring, 1978; S2 (another group of areas) gets advertisements in Fall, 1977, and S3 gets no advertisements. The sales in all three groups of areas are measured not just in the advertising period but afterward as well, to see how much of the effect of the advertising continues into the future. This may be seen, for example, in the sales in Period III of groups S2 and S3, both of whom receive no advertising in Period II, though S2 received advertising in Period I. And by comparison of S2 sales in Periods II and III, one can determine how the effect of Period I's advertising "decays" over time in group S2. Comparison of results for S1 and S2 will show how much effect Period II's advertising has in addition to the effect of advertising in Period I.

TABLE 11.9 Illustrative Delayed-Effects Design

Group   Period I (Fall, 1977)   Period II (Spring, 1978)   Period III (Fall, 1978)
S1      X  O                    X  O                       O
S2      X  O                    O                          O
S3      O                       O                          O

X = advertise; O = measure sales
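The comparisons described for this delayed-effects design can be sketched in a few lines of Python. The sales figures below are hypothetical, chosen only so the arithmetic is easy to follow, and the function names are my own labels rather than anything from the study design itself. The sketch assumes the three groups of areas are otherwise comparable, as random assignment is meant to ensure.

```python
PERIODS = ("I", "II", "III")

# Hypothetical sales per group and period (invented for illustration).
# S1 is advertised to in Periods I and II, S2 in Period I only, S3 never.
HYPOTHETICAL_SALES = {
    "S1": {"I": 130, "II": 145, "III": 120},
    "S2": {"I": 130, "II": 115, "III": 105},
    "S3": {"I": 100, "II": 100, "III": 100},
}


def lift_over_control(sales, group, control="S3"):
    """Sales of a group minus sales of the unadvertised group, by period."""
    return {p: sales[group][p] - sales[control][p] for p in PERIODS}


def added_effect(sales, group="S1", baseline="S2"):
    """Extra sales in `group` relative to `baseline`, by period.

    S1 and S2 differ only in the Period II advertising, so this difference
    estimates the added effect of the second round of advertising.
    """
    return {p: sales[group][p] - sales[baseline][p] for p in PERIODS}


# How the effect of Period I advertising decays in S2 over time.
s2_lift = lift_over_control(HYPOTHETICAL_SALES, "S2")

# What Period II advertising adds on top of Period I advertising.
second_wave = added_effect(HYPOTHETICAL_SALES)

print("S2 lift over S3 by period:", s2_lift)
print("S1 minus S2 by period:", second_wave)
```

In this invented example the lift from Period I advertising shrinks from period to period, and the second round of advertising adds sales in Period II that partly carry over into Period III.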

8. Summary

It makes sense to take the time and effort at the beginning of a study to create a good sampling or experimental design. This investment can pay off handsomely in the course of your work. The aim of the experimental design


Research Decisions and Procedures

is to produce data from which you may derive sound conclusions, and to do so as efficiently as possible. Perhaps the most important element of the experimental design is that any differences in results shown among the various experimental and control groups should arise from differences among the stimuli given to the experimental groups rather than from differences in original composition among the groups. But it is often impractical to achieve the ideal in equalizing the groups—random selection—either because the research is at an early exploratory stage, or because of cost, or because of ethical considerations. The chapter discusses randomized-group designs as well as designs that may be appropriate where random assignment of subjects to groups, or random selection of groups, is not feasible. Designs that vary more than one experimental element at the same time often have advantages over one-variable designs. Multivariate designs can be cheap, and they can produce more information and greater generality of conclusions, by showing the effect of each variable under a variety of conditions of the other variables; they also can reveal interactions among variables. A variety of designs can measure delayed effects; most important is to be aware of possible delayed effects and to build them into the design rather than to forget them.

EXERCISES

1. Assume you prepared a list of a sample of people in a midwestern city of 80,000 people, which you considered an excellent random sample of the population, and that you interviewed the people on the list about their voting intentions in a coming election. Several months later another researcher wants to do a market-research study on the entire population of the town. Can he use your old list? Is it or is it not a random sample?
2. Varsity football players at state universities receive lower grades than does the average student. Is this fact a good indication that high athletic ability goes with low scholastic ability in American boys?
3. To show how difficult it is to achieve randomness, have each person in the class pick a number between 0 and 9, and write it down. Then do it again, and a third time, and several more. What patterns do you see? Might they indicate a departure from randomness? (But beware of finding false patterns; see Chapter 30.)
4. How would you compile a random sample of students of Chinese extraction at a university that keeps no records of ethnic background?
5. Specify a purpose for which the telephone book is not a good sampling frame.
6. Give an example in which stratified sampling would be a relatively inexpensive way to attain a given level of accuracy. Is there any case in which stratified sampling would be more expensive?


7. Show how an estimate can be unbiased, even though sampling units have different chances of getting into the sample. (Hint: Remember that probabilities can be known even if they are different.)
8. Give an example in which cluster sampling provides more accuracy for a given cost than does simple random sampling.
9. Give an example in which sequential sampling is feasible and helpful.
10. Give an example in which matching samples is an appropriate sampling technique.
11. Give an example from your field in which the experimenter could reasonably expect to find interaction between two independent variables.

ADDITIONAL READING FOR CHAPTER 11

Campbell and Stanley's monograph is an outstanding treatment of the subject of experimental validity. The first part of this chapter was inspired by their treatment of the subject. Aronson and Carlsmith also provide useful information on experimental design.

Hyman and Wright, and Greenwood discuss the special problems of experimentation in sociology.

Experimental designs, especially in psychology, are described at length by Wood (Chapters 5-9). Another useful reference concerning experimental design in psychology is Underwood and Shaughnessy.

Boyd et al. (Chapter 3) discuss experimental designs in the context of market research.

Hovland et al. (Volume 3) is a classic experimental study in social psychology.

Wright and Hyman provide an interesting narrative description of how they went about selecting their design for an evaluation of attitude change in a Summer Camp.

12 Non-experimental Designs for Studying Relationships

1. Time Series—The Long View
2. The Cross Section—The Wide View
3. Causes of Differences in Results from Time-Series and Cross-Sectional Studies
4. Designs for Studying Changes Over Time
5. The Panel
6. Summary

An experiment has great advantages for studying relationships. But sometimes you cannot experiment, or choose not to. Then you must turn to examining data as nature throws them up to you.

There are two basic strategies for collecting naturally occurring data for the purpose of examining relationships. You may compare data from various periods in the past for the given person or group; this is called a "time series" in economics and sociology, "longitudinal method" in psychology and education, and the "historical method" in anthropology and sociology. Or you may compare different individuals or groups at the same time (a "cross section").

The essential ingredient to obtaining valid results with either strategy is that the independent variable(s) in which you are interested must have varied due to reasons unrelated to the nature of the sample periods or sample individuals. For example, perhaps you are interested in the relationship of income to suicide. You might compare income and suicide in (a) the various U.S. states, or (b) various years in the past in the U.S. But it may be that there is some important element in people's makeup that is responsible for both low income and low suicide, say, education. This would vitiate both the time series and the cross-sectional approaches—unless you somehow allow for this and other factors that account for changes in the independent variable (income) and that are also related to the dependent variable (suicide). Chapter 23 discusses this obstacle to the use of these non-experimental designs, and how to overcome it.

Non-experimental

Designs for Studying Relationships


Though the time series and the cross section sometimes are alternative designs, by far the most compelling conclusions emerge if you are able to use both the time series and the cross section, and if their results agree. If the results do not agree, this may be a clue to important underlying processes that it will pay to investigate. More about this in Section 4.

In some cases, time series and cross sections may be conceptually similar and measure the same phenomenon. For example, one may use a time series and a cross section interchangeably to find out how much taller boys are at ten years of age than at five years of age. We can either measure a group of boys when they are five years old and measure them again when they are ten years of age, or we can measure two different groups of boys at the same time, one "cohort" of ten-year-olds and one "cohort" of five-year-olds. But in some cases time series and cross sections may capture quite different phenomena, as when political cross-section polls reveal a different relationship between income and political-party membership than do time-series data (Brunner and Leipelt).

We shall consider the nature of time series and cross sections in that order. Then we shall take up the special places of the two methods for studying processes that develop only over a period of time. Last, we shall discuss the panel method as a device for studying changes over time.

1. Time Series—The Long View

The usual reason for using the time-series method is that the historical record provides a set of varied observations of the phenomenon in question. That is, past periods constitute a bank of data.

The main drawback of the time-series method is that it is vulnerable to changes in general conditions that may be relevant to the phenomenon you want to study. For example, there have been steady increases in the average height of Americans from generation to generation, and therefore a comparison of the heights of people of different ages from different generations may be blurred by the long-term shifts.

As another example, researchers wanted to learn how the age of a book in a research library affects how much people read it, in order to know which books should be kept in expensive locations where people can find them easily. A card in the back of each book revealed how many times it had been used each year for the past fifty years. But fifty years ago there were many fewer people at universities who might have withdrawn any book, and therefore the most-read books fifty years ago were read much less than the most-read books now. This increase in university population distorts the picture to make it seem that the difference in the use of a given book now and when it was younger is much less than it really is. (That is, there are more potential readers now to offset the decline in interest.) There were also many fewer books to compete for attention fifty years ago. Though these two major changes affect the use of books in opposite directions, we have no


reason to think that they cancel out. Either one of them could distort the picture badly.

In such a predicament the "cross-sectional" method can help. Instead of comparing the use of a given book when it was one year old with its use when it was fifty years old, it was possible to examine the use during one year of fifty-year-old books and of one-year-old books. We thus look for the effect of age but under the same conditions of equal numbers of university students and books in the library.

The time-series method can often be statistically efficient because the people being observed serve as their own controls. For a crude example, assume that you want to know whether people lose weight during the night. It is intuitively obvious that, if you weigh a sample of people both night and morning and compare each person's night and morning weights, you can obtain the same accuracy with a smaller sample than if you were to weigh one group of people at night and another in the morning. If you weigh the same people twice, the extraneous variables are mostly held constant, whereas different morning and night samples may differ in many ways that require a large sample to "smooth out" the results. The same principle holds if you compare the use of a sample of books in one year to the use of the same books five years later, if you compare the I.Q.s of a group of children at five years of age with the I.Q.s of the same children at ten years of age, or if you compare the sales in a group of supermarkets in January to the sales in the same supermarkets in February. In each case a much larger sample would be needed if you sampled two different groups of books, children, or supermarkets, because of the much greater variability that would be introduced.
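The night-and-morning weighing example can be checked with a small simulation. The Python sketch below is only an illustration under assumed numbers: body weights are drawn with a spread of 12 kilograms between people, and the overnight loss averages half a kilogram with a spread of 0.2. Those figures, and all the names in the code, are hypothetical. The sketch compares the standard error of the estimated overnight change when the same people are weighed twice with the standard error when two separate groups are weighed.

```python
import math
import random
import statistics


def simulate_person(rng):
    """Return (night weight, morning weight) for one simulated person.

    Assumed, illustrative distributions: weights vary widely between
    people, while the overnight loss varies little.
    """
    night = rng.gauss(70.0, 12.0)
    overnight_loss = rng.gauss(0.5, 0.2)
    return night, night - overnight_loss


def paired_standard_error(n, rng):
    """Same n people weighed night and morning; analyze the differences."""
    differences = []
    for _ in range(n):
        night, morning = simulate_person(rng)
        differences.append(night - morning)
    return statistics.stdev(differences) / math.sqrt(n)


def independent_standard_error(n, rng):
    """One group of n weighed at night, a different group of n in the morning."""
    night_weights = [simulate_person(rng)[0] for _ in range(n)]
    morning_weights = [simulate_person(rng)[1] for _ in range(n)]
    return math.sqrt(
        statistics.variance(night_weights) / n
        + statistics.variance(morning_weights) / n
    )


rng = random.Random(0)  # fixed seed so the run is repeatable
paired_se = paired_standard_error(50, rng)
independent_se = independent_standard_error(50, rng)

print("Standard error, same people weighed twice:", round(paired_se, 3))
print("Standard error, two separate groups:", round(independent_se, 3))
```

Because each person serves as his own control, the person-to-person spread in weight drops out of the paired comparison, so under these assumptions the paired standard error comes out far smaller for the same number of people.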
For simplicity, I have been talking of a time series as a comparison of only two observations of the same universe at different times. Most often a time series is a much larger number of observations—say, monthly observations of store sales over a period of five years, or yearly observations of the U.S. economy over sixty years, or daily observations of people's moods over a period of several months, or hourly observations over a period of weeks. Thus, one has the basis to search for cyclical effects of various lengths as well as trend effects.

2. The Cross Section—The Wide View

Three advantages of the wide view are:

1. The overwhelming advantage of the wide view over the long view is that, because all observations are made at the same time, there is no problem of changes in conditions.

2. Data are often much easier to obtain for a wide view. If you want to find out how the price of a Honda motorcycle depreciates over the life of the motorcycle, you might not be able to obtain data for the prices that particular cycles sold for in previous years or even for the prices of any Hondas in


previous years. But it would be easy to obtain price quotations on Hondas of all ages at the present moment. Or, if you are a demographer, or an actuary working for a life-insurance company, and you wish to construct a "life table" that shows the proportions of a group of people who will die at various ages, it is impracticable to follow the course of a given group of people all their lives. Rather, you examine the proportions of various age groups that die in a given year and construct a "synthetic life table." Again, even if it were decided to compare the use of the same books when they were one year old with the use now, when they are fifty years old, most libraries do not have records of the use of particular books that go back fifty years. In that case, unless one were willing to wait another fifty years, there would be no choice but to use the wide view and to compare the use of one-year-old books this year with fifty-year-old books this year.

3. The wide view can be employed even when the subject matter can be observed only once. For example, you cannot shoot the same rifle cartridge twice in different rifles, to compare the accuracy of the rifles. Instead, you must compare the results of two different samples of cartridges. Similarly, asking a person her voting intention in a campaign may alter her subsequent behavior, in which case it will not be safe to ask her her intentions a second time.¹ This danger is probably less with purchase panels, however; there is little theoretical or empirical reason to believe that membership in a panel will eventually bias a person toward brand A and against brand B.

The major disadvantage of the cross-sectional method is that there is likely to be considerable variation among the sampling units that has no connection with the variables of interest. Such variation does not bias the results, but it does require a larger sample size to achieve any given level of accuracy than if there were less irrelevant variation among the subjects. For example, you might want to study the effect of liquor prices on liquor consumption, and you consider relating the price of liquor in various states to per capita consumption in the states. It is very difficult to study the effect of liquor prices in the United States as a whole because the level of liquor prices changes infrequently and raggedly. And one cannot experimentally raise or lower the prices of liquor in the country as a whole, or even in individual states or communities, just for the sake of research. The various states already have different price levels, however, because state legislatures have different revenue policies affecting liquor prices. But it is not possible simply to compare the consumption of liquor in various states that have various liquor prices, because there are differences in consumption from state to state, independent of price. For example, Table 12.1 shows the per capita consumption in a group of states that have identical prices.

There is an analogy between this disadvantage of the wide view and the major disadvantage of the long view. With the long view, conditions change

1. But P. Lazarsfeld et al. conclude that this danger does not arise in voting studies.


over the length of the study; with the wide view, conditions are different over the width of the study. Such change may not matter if plenty of data are available at low cost. But sometimes there is not enough data available to compensate for this defect of the wide view. The variability in liquor consumption among the states from such other causes as religion is so great that a cross-sectional sample of perhaps 500 states would be necessary to obtain a reasonably reliable answer—and there are not that many states in the Union.

A solution (J. Simon, 1966a) was to study those instances in which the state changed its revenue policy (and hence the prices of liquor) on a given date and then to compare the consumption patterns before and after the price changes in each state separately. The difference between the before and after consumption could then be attributed to the price change. (And, to take account of possible differences over time in each state, I standardized by subtracting the changes in consumption over the same periods of time in states in which price remained the same from those in the price-change states. This device allowed for increases or decreases in consumption resulting from religious and cultural changes, tension-level changes, and other society-wide effects.) This design really is a combination of the time-series and cross-section methods, being a cross-sectional sample of changes over time.

TABLE 12.1

State           January 1962 Price Index   Per Capita Liquor     Gallons of Liquor Consumed
                (Seagrams 7 Crown)         Consumption in 1961   per $1 Million of Income
Indiana         4.85                       1.26                  350
Massachusetts   4.85                       2.76                  681
Minnesota       4.85                       2.22                  613
New Mexico      4.85                       1.75                  506

SOURCE: Liquor Handbook, 1962.

The cross-section method is often used successfully in connection with natural geographical differences.

For example, the amount of smoking per capita varies greatly from country to country. Researchers therefore collected data from the health records of various countries to see whether the incidence of various diseases is related to the extent of smoking. But this device creates its own new obstacle: A high level of tension in a country might account for both the high smoking and the high heart-disease rates in the country, rather than the smoking causing the heart disease.
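To return briefly to the liquor-price design described earlier in this section: the standardizing step (subtracting the change in states where price stayed the same from the change in the price-change states) is simple arithmetic, and a minimal sketch may make it concrete. The consumption figures and the function name below are hypothetical, chosen only for illustration; they are not data from the study.

```python
from statistics import mean

def mean_change(before_after_pairs):
    """Average (after - before) change across a group of states."""
    return mean(after - before for before, after in before_after_pairs)

# Hypothetical per capita consumption (before, after) for each state.
# These numbers are invented for illustration only.
price_change_states = [(2.00, 1.80), (1.50, 1.38), (2.40, 2.22)]
comparison_states = [(1.90, 1.94), (2.10, 2.16)]  # price stayed the same

# Change observed where the price changed.
raw_change = mean_change(price_change_states)

# Change over the same period where nothing happened to price:
# a stand-in for society-wide effects.
background_change = mean_change(comparison_states)

# Standardized estimate: what is left after removing the background change.
standardized_effect = raw_change - background_change

print(f"raw change:          {raw_change:+.3f}")
print(f"background change:   {background_change:+.3f}")
print(f"standardized effect: {standardized_effect:+.3f}")
```

In this invented example the comparison states drift upward slightly, so the raw before-and-after difference understates the drop associated with the price change; the subtraction corrects for that drift.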


Geographical differences were the core of R. Cavan's cross-sectional study of suicide and anomie. Cavan compared the suicide rates in the various rings of neighborhoods at various distances from the center of Chicago; the closer to the center of the city, the higher the suicide rate.

Communities in the suicide belt. Chicago has four suicidal areas: the "Loop" or central business district and its periphery of cheap hotels for men and sooty flats over stores (No. 1 on Map VI); the Lower North Side, particularly the central part of this district, which includes a shifting population of unattached men and an equally shifting population of young men and women in the roominghouse areas (No. 64 on the map); the Near South Side linking the Loop on the north with the Negro area to the south and having one-fourth of its population Negro (No. 2 on the map); and the West Madison area, with its womanless street of flophouses, missions, cheap restaurants, and hundreds of men who drift in aimless, bleary-eyed abandon (No. 40 on the map). (See Figure 12.1)

D. Schwartzman made clever use of a geographical comparison of the United States and Canada in assessing the effects of monopoly on prices and wages. He compared wages and prices in the United States with those in Canada for those industries that differed in the degree of monopoly in the United States and Canada. He was able to estimate the extent to which a larger share of the market being held by a few firms pushes prices up and wages down. (The effect on prices is economically significant, but not for wages.)

The previous examples dealt with the use of geographical differences. More common is the use of geography on the assumption that the various areas are the same. For example, market researchers often take advantage of geographical similarities to test marketing tactics. A company will advertise or distribute coupons in one town, not advertise or place no coupons in another town, and then compare sales in the two towns. Distributing coupons to some people but not to others within a single town would avoid geographical differences, but with such a design it would not be easy to measure the sales for the coupon group versus the no-coupon group.

Sometimes you can match geographic areas on all the relevant dimensions. For example, General Foods Corporation matched one city against another or one "test" city against the "normal rest of the United States" on these characteristics:

The test city should contain a cross section of age, sex, education, family size, income, ethnic and religious groups that approximate those of the nation as a whole. No city matches precisely, but some come pretty close. For instance: Columbus, Ohio; Syracuse, N.Y.

Characteristics of the test city's economy are vital. If a city's economy fluctuates with the season, or it is a resort area, an accurate sales picture cannot be obtained for food products.


FIGURE 12.1 Comparison of Suicide and Other Indications of Disorganization in Chicago, by Communities

SUICIDE (1919-1921)
Communities in highest 5 per cent of rates (35-87 suicides per 100,000 of the population).
Communities in upper quartile and below highest 5 per cent (17-25 suicides per 100,000 of the population).
Communities below upper quartile (less than 17 suicides per 100,000 of the population).

OTHER INDICATIONS OF DISORGANIZATION
Rooming-House Area, 1923.
Vice Area, 1922.
Pawn Shop Centers, 1926.
Drug-Peddling Centers, 1926.

Source: From Ruth S. Cavan, Suicide, 1928, p. 81. Reprinted by permission of The University of Chicago Press.

Characteristics of the food trade in the test city are a factor. The ratio of food chains to independent grocery stores is crucial. Some chains feature their own brands; some will not permit auditing. A balance of chains and independents is necessary. Also crucial: Whether the big wholesalers serve several cities from one warehouse. If so, it is hard to enforce limited distribution.


Geographical isolation is also important for advertising considerations. To advertise where there is no distribution is wasteful, and irritates potential customers.

Media availability requirements include at least two television outlets, two newspapers and a Sunday supplement that can handle color ads.

Individual characteristics of a city are equally important. A city that is a good test market for one kind of product may be a poor city for another. Examples: Salt Lake City and St. Louis are unsuitable for testing coffee. Reasons: Salt Lake City is the center of the Mormon religion, which forbids coffee drinking; St. Louis housewives have a strong loyalty to a regional brand. Others: Southern cities are a good place to test new coconut products because home baking is customary; Californians are adventurous about trying new salad dressings, but the state is a bad place to test an artificial orange drink. (Printers' Ink, August 27, 1965, p. 30)

Matching cities may be quite a satisfactory tactic. But it always leaves disquieting doubt that perhaps all the important dimensions have not been matched. This doubt gnaws hardest when the experimental differences are relatively small. For example, in one published study (Stewart) the market area that received no advertising purchased more of the product than the matched market area that received substantial advertising. It is possible that the conclusion is sound; however, I continue to think that it was the lack of match between the test areas that accounted for such a strange result.² (It would be possible to check this by examining the changes in sales in the various areas over the experimental period, if data were collected before as well as after the experimental period.)

2. J. Gold showed data on the extent of variation among test markets; the extent is either frightening or reassuring, depending on what you are doing.

An improvement is to compare groups of geographical areas with one another, selecting areas to be in each group on a random basis. Instead of comparing Rochester with Syracuse, experiment with half a dozen randomly selected towns or with neighborhoods in half a dozen cities, and compare them with another half a dozen towns or with neighborhoods in other cities. The larger the sample, the stronger the protection against geographical bias.

Another tactic is to use each geographical area as its own control. The "criss-cross" is a simple design for situations in which you can control the experiment. For example, first distribute coupons in Syracuse and not in Rochester; then switch. Then you can fairly compare the total redeemed coupons in Rochester with the total from Syracuse.

Studies of matched geographical areas can sometimes be conducted even when experimentation is impossible. C. Mills made a fascinating study of three small cities that had each come to be dominated by one or a few big businesses and of three matched small cities in which smaller businesses mainly supported the local economies (Kefauver, p. 167).
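The logic of the criss-cross design described above can be sketched with a little arithmetic. The sales figures below are hypothetical, invented so that Rochester is simply a bigger market than Syracuse and the second period is a little better than the first in both towns. The sketch compares each town with itself and then averages the two within-town differences; this is one reasonable way to use each area as its own control, not necessarily the exact computation a particular study used.

```python
from statistics import mean

# Hypothetical sales by town and period. Each entry is
# (units sold, whether coupons were distributed in that period).
sales = {
    "Syracuse": {1: (120, True), 2: (110, False)},
    "Rochester": {1: (140, False), 2: (170, True)},
}

def own_control_difference(periods):
    """Sales with coupons minus sales without, within a single town."""
    with_coupons = mean(s for s, coupon in periods.values() if coupon)
    without_coupons = mean(s for s, coupon in periods.values() if not coupon)
    return with_coupons - without_coupons

# Each town compared with itself, so town-to-town differences drop out.
per_town = {town: own_control_difference(p) for town, p in sales.items()}

# Averaging the two towns cancels the period-to-period drift, because
# each period contains exactly one coupon town.
crisscross_estimate = mean(per_town.values())

# A single-period matched comparison, for contrast: coupon town minus
# no-coupon town in period 1 only.
matched_pair_estimate = sales["Syracuse"][1][0] - sales["Rochester"][1][0]

print(per_town)
print("criss-cross estimate:", crisscross_estimate)
print("single-period matched comparison:", matched_pair_estimate)
```

With these invented numbers the single-period matched comparison makes coupons look harmful, simply because the no-coupon town is the larger market, whereas the criss-cross comparison is not affected by that mismatch.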


3. Causes of Differences in Results from Time-Series and Cross-Sectional Studies

Sometimes the results produced by time-series and cross-sectional studies differ greatly, producing apparent paradoxes. For example, within various industrialized countries most cross-sectional studies of the relationship between income and fertility show that lower-income families have more children. On the other hand, time-series studies over the period of a few business cycles show that the higher per capita income is, the more children people have. And to confuse things further, time-series studies over long periods of time (say 100 years) show drops in fertility as per capita income rises.

The fact that the subjects observed sequentially in a time series serve as their own controls, whereas this mechanism is not present in a cross section, sometimes explains a discrepancy between cross-section and time-series results. For example, a simple time series relating national per capita income in various years to fertility (and also to suicide) in those years shows a positive relationship, whereas the same simple relationship is negative in a cross section of states in the U.S. The explanation is that education is systematically and strongly related to income in the various states, whereas in the time series, average education is not related to per capita income. (Of course one may hold education constant statistically in the cross section, and this does indeed change the apparent influence of income or reconcile the difference between time series and cross section [Simon, 1969, 1974; Barnes]; this strengthens the basic point.)

Another cause of discrepancy between time-series and cross-sectional results, as seen in studies of the relationship between fertility or suicide and income, is that the cross-sectional studies, and usually the time-series studies too, do not take proper account of the past effects of the relevant variables (income, in this case). The cross section includes data only on people's present incomes, but their past incomes also influence their present behavior, sometimes in ways that counteract the effects of present income. For example, higher-income families give their children more education, which may teach them to want, and how to achieve, smaller families. But the statistical natures of the short- and long-time series and the cross sections are quite different, and they are affected differently by the past. Therefore one must be particularly thoughtful when choosing a wide-view or a long-view method whenever the past may still have influence on the present values of the dependent variable.³

3. This argument is developed formally and at length by J. Simon and D. Aigner.

Still another cause of discrepancy between time-series and cross-sectional results may be a difference in the "level of aggregation." That is, time series are likely to use national averages as observations whereas cross sections are likely to observe individuals or small groups. For various reasons discussed in Chapter 6, such a difference in units of observation can lead to differences in results.


4. Designs for Studying Changes Over Time

People change. A few examples: First, the percentage results of a presidential election would be somewhat different if the election were held a month earlier or later. The results would even differ slightly if the election were a day earlier or later. (Even if during the campaign few people change their minds about which candidate they prefer, the proportion of people who are prepared to vote may well change.) Second, a question asked at the beginning of an interview may get a different answer than if it were asked at the end of the interview, either because of fatigue or because an earlier question may alter the interviewee's knowledge or attitude toward the subject of a later question. Third, at the end of the day chimpanzees may take longer to learn how to reach the banana than at the beginning, or they may learn more slowly during one season of the year than during another. Fourth, in 1890 practically no one smoked cigarettes. Fifty years later half the nation smoked cigarettes. These changes over time are breaches of ceteris paribus for studies that use several time periods as samples.⁴

4. Whether one can predict the future from the past depends upon whether one can assume ceteris paribus from the observed period to the future. This issue is discussed in detail in Chapter 3.

We say that changes occur with time. True enough. But it is not time itself that brings the changes to pass; rather, various processes that occur over time bring change. If people were frozen in such a way as to slow all their bodily processes but not to kill them (as can be done with small cells and human sperm), the passage of time would have no effect. Rather, it is fatigue that affects the chimpanzees' performance late in the day. It might be growing annoyance that produces a different answer to a question late in an interview. It is changes in the world situation that alter the election-poll results over time. And it was a shift in tastes as well as improvements in tobacco and cigarette-making technology that caused the increase in United States cigarette consumption in the twentieth century. The word "time" itself is nothing but shorthand for other changes that occur.

Change over time comes in several varieties. A change can be a long-run trend; for example, we believe that the long-run change in our economy will be upward. Change can be cyclical: Each morning the chimpanzees' responses may be faster than in the evening. Or change can be totally unpredictable and without apparent regularity, as is the case with the interest rates on the bond market from day to day (Cootner). (If you do find regularities in the bond market that you can predict in advance, be sure to let me know immediately.) Any or all of these kinds of change over time can throw a monkey (chimp) wrench into your research.

Even if a change seems to occur over time, or along with some other change in condition, you often cannot be sure which of the other things is not being held constant and is therefore to blame. Early in his interviewing, A. Kinsey found that the incidence of premarital intercourse was 44.9 percent at age nineteen for single males whose educational level was thirteen or more years, but the incidence was 32.5 percent in later interviews (Cochran, et al., p. 88). This change might result from changes in people's activities over time. But it might also reflect changes in interviewing technique, differences in the sampling process, or simply sampling variation (Kinsey, et al., p. 146). The experts who audited the Kinsey work emphasize that the researcher's good judgment is essential in interpreting the result.

The obvious way to study changes over time is to obtain a sample of the subject matter and actually to watch its change. Some famous studies have followed phenomena over a long time. L. Terman, for example, tracked the growth patterns and maturity of a group of exceptional children for almost thirty years. C. Seltzer studied the health and habits of a group of Harvard undergraduates for many years after graduation. And economists have studied the course of some series of agricultural prices for hundreds of years.

But there are many obstacles to the straightforward application of the long-view method, notably that, unless the data from the past already exist, few of us have the patience to wait around for slow-moving events to unfold themselves. Therefore we must also consider another method as a substitute; sometimes we can study, in the present, "cohorts" of people or objects that are of different ages.

What the researcher does about the change over time should depend upon what he is trying to find out and the nature of the situation. Consider the matter of the date on which a presidential election is held. If the date were not automatically set in advance, the government in power might alter the date to suit its own convenience, that is, set a time when it was especially strong, as the government in England sometimes does.
In the United States system, the date is fixed long in advance; whether the date is good or bad for the party in power is therefore left to chance.

The political pollster's situation is different. She cannot just pick a day at random to take her poll and then blithely use the result as a prediction of the election results. She knows that the closer to the election the day of the poll is (other things being equal), the better will be the prediction. On the other hand, a poll taken the day before the election does not yield information that is of much benefit to the candidates or of much interest to anyone else. No matter when she takes her poll, however, the pollster takes all her interviews for a given poll on the same day (or during a single week) so that she can at least describe the situation correctly at some time, and she offers the results as measures of election sentiment at that time, rather than as predictions of how the election will turn out. If the purpose of the poll is to aid the decisions of candidates about what to say and where to say it, that interpretation of the results may be quite valid.

The pollster may also poll at several different times to see whether there is a "trend." If she thinks she sees a trend, her prediction may contain an adjustment of her latest results to allow for the assumed trend. But one should adjust with great caution, if at all, because there is no sure way of distinguishing a true trend from a wave that rises and recedes or from random fluctuations.

H. Ebbinghaus faced several kinds of time-change problems in his classic study of learning nonsense syllables:

. . . [C]are was taken that the objective conditions of life during the period of the tests were so controlled as to eliminate too great changes or irregularities. Of course, since the tests extended over many months, this was possible only to a limited extent. But even so, the attempt was made to conduct, under as similar conditions of life as possible, those tests the results of which were to be directly compared. In particular the activity immediately preceding the test was kept as constant in character as was possible. Since the mental as well as the physical condition of man is subject to an evident periodicity of 24 hours, it was taken for granted that like experimental conditions are obtainable only at like times of day. However, in order to carry out more than one test in a given day, different experiments were occasionally carried on together at different times of day. When too great changes in the outer and inner life occurred, the tests were discontinued for a length of time. Their resumption was preceded by some days of renewed training varying according to the length of the interruption. (Ebbinghaus, pp. 25-26)

Sponsors of television rating services generally want to know how many people are watching the program on the average throughout the program, rather than at one particular time, because the commercial may come on at any time within the show.⁵ And, because the audience of a show varies during the show, it is not sensible to find out how many are watching at any one particular time. The television rating firms therefore take samples of the viewers throughout the program, and the ratings are expressed as averages (and totals). The average is an estimate of how many people are watching at any single time within the program.

5. More strictly, sponsors want to know how many people are actually watching when the commercial comes on. But we shall not consider that part of the problem except to note that many more interviewers would be needed to measure the audience at any given moment than are needed to measure the average throughout the show.

5. The Panel

The panel is a special type of time-series technique; it measures some attributes of a given sample of people at several moments. But it differs from other long-view studies in two ways. First, the panel study is more likely to have truly historical interest than are other long-view studies; it is usually concerned with what has happened at particular times; for this aim it is conceptually impossible to substitute a wide-view study. For example, a wide-view study at a single moment cannot be used to find out how voters shift from candidate to candidate during the campaign or how the market share of a particular brand is faring. There is no substitute for data on September 1, October 1, and so forth.


A panel is not the only way to obtain this type of historical information, however. It is possible to take separate samples at various points in time instead of collecting data on the single panel sample. For example, if you want to know in absolute terms what proportion of the vote your candidate has at different times before the election, then it really does not matter, conceptually, whether you poll 1,000 people two months before the election and another 1,000 people one month before or whether you poll a 1,000-person panel twice. There may be practical differences between these two strategies, such as the possibility of spoiling the panel by polling them once. But there are not likely to be major cost differences between these two strategies (except perhaps higher costs in finding the same people twice). Similarly, separate samples give conceptually the same television ratings, but for some rating methods, especially those that require placing electronic meters in homes, it is enormously cheaper to use panels of the same people for repeated measurements.

A second major difference between the panel and nonpanel long-view techniques is that the panel is much more efficient when you want to measure changes from period to period rather than the absolute levels. A campaign manager wants to know whether people are shifting toward his candidate, or the soap firm wants to know whether more people are shifting to its brand than away from it. The television executive wants to know whether one episode of a show is better or worse than another episode in terms of listenership. For such comparative problems the panel method offers great statistical efficiency; there is much less sampling error in the panel, because individuals can be compared to themselves at different moments.
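The efficiency claim above can be given rough numbers. The sketch below uses the standard result that the variance of a difference between two measurements is reduced by twice their covariance. It assumes, purely for illustration, the same spread (sigma) in both waves, the same number of respondents in each wave, and a correlation (rho) between a panel member's first and second answers; the function name and all the values are hypothetical.

```python
import math

def se_of_change(sigma, n, rho=0.0):
    """Standard error of the change in a sample mean between two waves.

    sigma: standard deviation of the measurement in each wave (assumed equal)
    n:     number of respondents in each wave
    rho:   correlation between a respondent's two measurements;
           0.0 corresponds to two separate, independent samples
    """
    return math.sqrt(2 * sigma**2 * (1 - rho) / n)

# Two separate samples of 1,000 people: nothing links the two waves.
separate_samples = se_of_change(sigma=0.5, n=1000)

# One panel of 1,000 people interviewed twice, with answers that are
# fairly stable from wave to wave (rho = 0.8 is an assumed value).
panel = se_of_change(sigma=0.5, n=1000, rho=0.8)

print(f"separate samples: {separate_samples:.4f}")
print(f"panel:            {panel:.4f}")
```

The more stable individuals are from one wave to the next, the larger the panel's advantage; if answers in the two waves were unrelated (rho of zero), the panel would be no more precise than separate samples for measuring change.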
The third, and perhaps most important, difference between the panel and nonpanel long-view methods is that the panel method can reveal much back-and-forth shifting behavior that is otherwise hidden from view. The first important use of the panel method in voting studies was in the Erie County survey in 1940. Lazarsfeld, et al., noted these advantages of the panel method in studying the process by which people decide how they will vote:

The full effect of a campaign cannot be investigated through a sequence of polls conducted with different people. They show only majority tendencies which are actually the residual result of various sorts of changes: to or from indecision and from one party to the other. They conceal minor changes which cancel out one another and even major changes if they are countered by opposing trends. And most of all, they do not show who is changing. They do not follow the vagaries of the individual voter along the path to his vote, to discover the relative effect of various influential factors upon his final vote. (p. 2)

                          Vote Intention in October
Actual Vote      Republican   Democratic   Don't Know   Don't Expect to Vote   Total
Republican          215            7            4                 6             232
Democrat              4          144           12                 0             160
Didn't vote          10           16            6                59              91
Total persons       229          167           22                65             483

This simple table has a surprising number of implications. Let us assume for a moment that the interviews in October and November had been conducted with different people, rather than with the same people, as was actually the case. Then, the findings would have read as follows: in October 42 percent (167 out of 396) of those who had a vote intention meant to vote for the Democratic Party; in November 41 percent (160 out of 392) voted for it. This would have given the impression of great constancy in political attitudes. Actually, however, only the people in the major diagonal of the table remained unchanged: 418 out of 483 respondents did what they intended to do; 13 percent changed their minds one way or another. (p. ix)

A panel study is not always feasible, however. One difficulty is that the events or thoughts may already be long past by the time the researcher begins. Occasionally this difficulty can be overcome by the use of retrospective questions, as illustrated in the question block on happiness in Chapter 22. Memory is not always reliable, however, and indeed one of the panel's major charms is that it avoids memory loss because questions are asked about contemporary behavior and thoughts.

Still a third difficulty of the panel method is "mortality," the loss of respondents from wave to wave of interviewing. And there can also be statistically dangerous mixups in the identities of interviewees from wave to wave.

Another panel difficulty is that the panel questions may affect people's behavior, one of the repetition effects discussed in Chapter 22. And sometimes the cost is high. Nevertheless, the panel can sometimes provide data that cannot be obtained in any other way.
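The arithmetic in the quoted passage can be retraced from the turnover table itself. The sketch below assumes the layout of the table as reconstructed here (rows are the actual vote, columns are the October intention); the row and column totals it produces agree with the totals printed in the table, which supports that reading. The variable names are illustrative only.

```python
# Columns: vote intention in October.
intentions = ["Republican", "Democratic", "Don't know", "Don't expect to vote"]

# Rows: actual vote. Counts are taken from the turnover table.
turnover = {
    "Republican": [215, 7, 4, 6],
    "Democrat": [4, 144, 12, 0],
    "Didn't vote": [10, 16, 6, 59],
}

# Marginal totals: all that separate samples in October and November would show.
row_totals = {vote: sum(counts) for vote, counts in turnover.items()}
col_totals = {
    name: sum(counts[i] for counts in turnover.values())
    for i, name in enumerate(intentions)
}
total = sum(row_totals.values())

# Democratic share among those with a party intention (October)
# and among those who actually voted for a party (November).
october_dem_share = col_totals["Democratic"] / (
    col_totals["Republican"] + col_totals["Democratic"]
)
november_dem_share = row_totals["Democrat"] / (
    row_totals["Republican"] + row_totals["Democrat"]
)

# Major diagonal: people who did what they said they would do.
# Only the panel, which follows the same individuals, can show this.
unchanged = (
    turnover["Republican"][0]      # intended Republican, voted Republican
    + turnover["Democrat"][1]      # intended Democratic, voted Democrat
    + turnover["Didn't vote"][3]   # did not expect to vote, did not vote
)
changed_share = (total - unchanged) / total

print(f"October Democratic share:  {october_dem_share:.1%}")
print(f"November Democratic share: {november_dem_share:.1%}")
print(f"Unchanged: {unchanged} of {total}; changed: {changed_share:.1%}")
```

The two marginal shares differ by little more than a percentage point, while the diagonal shows that roughly one respondent in eight ended up doing something other than what he or she reported in October.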

6. Summary

When experimentation is not feasible, one turns to the data that occur without researcher interference. Two such basic research designs are the cross section and the time series.

In some situations, cross sections and time series measure the same phenomena, but in other situations they capture quite different processes. The differences can arise from changes over time, from slow-acting influences, or from differences in the unit of observation and level of aggregation.

Wherever possible, one should try to use both a cross section and a time series. If the results agree, the conclusion is strongly supported. If the results disagree, they may point to important underlying processes.


Many of the most important phenomena in social science unfold slowly over a period of years. Changes over time are not caused by abstract time itself but rather by various processes that occur over time. And such changes may be trends or cycles. In some cases you may grapple with changes over time by taking measurements appropriately spaced over the time period. In other cases you may be able to work at a given moment with samples of various ages. As usual, the appropriate tactic must depend upon the purpose of the research.

As a substitute for observing the change over its full period, one may instead examine and compare at a given moment a cross section of people (the "wide view") who are at different stages of the unfolding of the phenomena. The wide view also protects against irrelevant but large changes in general social and economic conditions that always occur over long periods of time.

A very different situation is that in which the researcher wishes to know how people will behave when subjected to a single different condition found in different areas. But so much else also varies from place to place that a reasonable comparison cannot be made. As a substitute for the wide-view comparison, the researcher may turn to a long view from historical records of given people who were exposed to different treatments at different times in their lives.

The panel method offers the best advantages of the wide view and the long view combined, and hence it has enormous research power. It has great statistical efficiency because the same individuals can be compared with themselves at different times, hence reducing extraneous variability, and they can also be compared to each other. Panels require forethought and much organization, as well as expensive observation over the study period. But they are nevertheless the most appropriate method in many research situations.

EXERCISES

1. Give three examples in your field of use of the long-view approach.
2. Give three examples in your field of use of the wide-view approach (that is, the cross section).
3. Give an example of the use of the panel in your field and why it was preferred to other methods.
4. Give an example in which data are easier to obtain for a wide view than for a long view; give a contrary example.
5. Explain why the long-view technique used in one of the examples in Exercise 1 was preferable to the wide-view approach. Do the same for one wide-view example in Exercise 2, in preference to the long-view approach.
6. A university runs a high school for gifted students. You wonder whether the admission system is biased in favor of the children of parents who work in the school of education at the university. How would you determine whether such a bias shows in the composition of the student body? (Don't forget children whose parents don't work at the university but who are accepted at the university high school.)
7. How would you go about checking whether there is a bias in favor of brothers and sisters who are already at the university high school? That is, does a student with an older sibling have a better than average chance of being accepted?

ADDITIONAL READING FOR CHAPTER 12

Zeisel (Chapter 10) gives a full but simple account of the panel method in public opinion and advertising research.

Ferber and Verdoorn (pp. 267-277) discuss consumer-panel use in economics and business. They present a great deal of information on the time-series method and the cross-sectional approach (Chapters 5-9).

For another useful reference on panels, see the man who is the father of panel research, Lazarsfeld (1948).

13 Surveys: Pro, Con, and How to Do Them

1. The Nature of Surveys
2. Advantages of the Survey Method for Relationship Research
3. Disadvantages of the Survey Method for Relationship Research
4. Descriptive Surveys
5. The Steps in Executing a Survey
6. Summary

1. The Nature of Surveys

A survey gathers data about variables as they are found in the world. The survey can observe behavior, as for example whether people are athletes, whether they smoke, whether the money supply is high in some years, and whether there is prosperity in those years. The survey can also collect data on what people say; for example, researchers can ask people of various backgrounds for whom they will vote or how much liquor they drink.

The important distinction between the survey and the experiment is that the survey takes the world as it comes, without trying to alter it, whereas the experiment systematically alters some aspects of the world in order to see what changes follow. For example, a mother might want to learn the causes of her baby's food rash. She might keep a diary of what foods the baby eats each day and whether he has a rash that day and the next day; that would be nonexperimental observation. Or she could systematically vary the foods that she gives the baby each day, trying first one food alone and then another food alone, noting the days on which the rash occurs; that would be an experiment.

The data for a survey may already exist in the form of records such as the national census or the questionnaire data in the Roper Center repository (Williamstown, Massachusetts) that were collected in the past; or you may need to collect new data especially for your purposes. The logic of the survey method is much the same either way.


The term "survey research" is applied to two very different sorts of investigation. The first aims to learn about relationships between variables, especially causal relationships. Causal-analysis survey research is quite analogous to experimentation, with the single (but overwhelmingly important) difference that the independent variable(s) is not controlled and manipulated by the researcher. Instead the researcher seeks out groups of people that have already been exposed to different levels of the independent variable. For example, instead of subjecting randomly selected groups of people to different amounts of cigarette smoke, the researcher finds people who have smoked various numbers of cigarettes. Or a researcher who wants to study the effect of family income on juvenile delinquency does not choose various groups of families to receive various incomes; rather, he finds and assesses the amount of delinquency in families with different incomes.

The steps in pursuing causal-analysis survey research are much the same as those set forth for an experiment. Furthermore, the obstacles to studying causal relationships with nonexperimental survey research and the methods of overcoming these obstacles contribute much of the subject matter of this book. Therefore, we shall not pursue the matter further in this chapter, except to summarize the advantages and disadvantages of the survey method for causal analysis. After that, we shall focus on surveys that aim to provide quantitative descriptions of some aspects of a universe rather than to discover relationships.

2. Advantages of the Survey Method for Relationship Research

First, with a survey you can get closer to the "real" hypothetical (theoretical) variables than with a laboratory experiment. You can actually inspect the variables in their real-world setting; for instance, you can examine real cases of lung cancer and real movements of the economy without having to abstract from the real variables to a mock-up laboratory situation. This is the preeminent advantage of a survey over an experiment in those cases in which you want to investigate relationships but in which real-world experiments are impossible.

Second, a survey is often quite cheap, especially if you can use already existing records and data. If data exist for the prices and amounts of onions sold each month for several years, using them to explore the relationship between price and quantity is obviously cheaper than setting up a laboratory situation in which people are given quantities of money and opportunities to purchase onions and other foods at varying onion prices.

Third, huge masses of data are often already available or can be culled from existing records (voter-registration lists, for example). This is a major statistical advantage, because the large samples provide high internal reliability. Such huge samples are seldom available in experimentation.

Fourth, surveys can yield a very rich understanding of people, both in breadth by collecting a wealth of information, and in depth by probing people's motives.
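
The statistical point in the third advantage can be made concrete. The following is a minimal sketch, not part of the original text: it reads "reliability" as sampling precision, assumes simple random sampling, and uses a hypothetical true proportion of 0.5. The function name and the two sample sizes are illustrative choices.

```python
import math


def standard_error_of_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical sizes: a typical experiment versus a records-based survey.
small_sample = standard_error_of_proportion(0.5, 100)
large_sample = standard_error_of_proportion(0.5, 10_000)

print(f"n = 100:    standard error = {small_sample:.3f}")
print(f"n = 10000:  standard error = {large_sample:.3f}")
```

Under these assumptions a hundredfold increase in sample size cuts the standard error tenfold. Larger samples reduce only sampling error; they do nothing about the biases discussed in the next section.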

3. Disadvantages of the Survey Method for Relationship Research

The major disadvantages described here apply only to causal and noncausal relationship research and not to census-type research.

First, the crucial disadvantage of the survey method in causal analysis is the lack of manipulation of the independent variable. Because there is no "controlled" variation in the independent variable, it is always possible that the correlation between the independent and dependent variables is not "causal" (see Chapter 23). But it is a mistake to say that survey results never show causation. Whether the results of a survey are causal depends upon many things (see Chapter 32). One short example here: Changes in state liquor taxes are accompanied by changes in the prices of liquor. The effects of these changes in price upon liquor consumption can be studied. The changes in consumption may reasonably be said to be caused by the changes in price because there is no likely connection between consumption and the moment when the legislators decide to raise the tax; the states act in much the same way that an experimenter would if he were randomly selecting when to raise taxes. There is no other likely relationship between the tax raise (and the price change) and the change in consumption, and therefore it is reasonable to say that the price change causes the change in liquor consumption. To repeat the main point, a survey lacks the almost clinching proof of actually trying out the relationship by varying the independent variable to see whether it is indeed followed by changes in the dependent variable.

A second disadvantage of the survey is that one cannot progressively investigate one aspect after another of the independent variable to get closer to the "real" cause. One cannot first try out the cigarette, then the cigarette paper and the tobacco separately, and so forth until the ingredient that really causes cancer is isolated.

Third, statistical devices are not always able to separate the effects of several independent variables when there is multivariable causation, especially when two independent variables are themselves highly associated. For example, the same people tend to have high incomes and high education; therefore, it is very difficult to tell from survey results whether, say, it is education or income that causes the purchase of books and "high class" magazines.

My final comment on the choice of survey or experiment for causal analysis is old stuff to you by now. Several methods are better than one. If you can seek the knowledge you want with both a survey and an experiment and if the results jibe reasonably well, you have a much stronger basis for belief in your results than if your conclusions were based on just one of the techniques.
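
The third disadvantage above (two independent variables that are themselves highly associated) can be illustrated with a deliberately extreme sketch. Everything in it is hypothetical: the four respondents, the two rival models, and their coefficients are invented for illustration, and real survey data would be associated less perfectly.

```python
# Hypothetical survey records: (years of education, income in thousands).
# In this made-up sample income is always exactly twice education.
respondents = [(8, 16), (12, 24), (16, 32), (20, 40)]


def books_if_education_matters(education: float, income: float) -> float:
    """Rival model A (assumed): only education drives book purchases."""
    return 0.5 * education


def books_if_income_matters(education: float, income: float) -> float:
    """Rival model B (assumed): only income drives book purchases."""
    return 0.25 * income


predictions_a = [books_if_education_matters(e, i) for e, i in respondents]
predictions_b = [books_if_income_matters(e, i) for e, i in respondents]

# Both models predict the same purchases for every surveyed person, so the
# survey data cannot favor one explanation over the other.
indistinguishable = predictions_a == predictions_b

# An experiment could break the link, for example by giving a high income
# to a person with little education. Here the two models disagree.
experimental_case = (8, 40)
model_a_says = books_if_education_matters(*experimental_case)
model_b_says = books_if_income_matters(*experimental_case)

print(indistinguishable, model_a_says, model_b_says)
```

The design choice is to make the association perfect so that the problem is visible without any statistics. With a strong but imperfect association the two explanations can in principle be separated, but only with large samples and wide margins of uncertainty.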


4. Descriptive Surveys

Now let us discuss surveys that are not intended to discover causal relationships but rather aim to survey. That is, we might call them survey surveys or, more conventionally, descriptive surveys, in contradistinction to causal-research surveys. They are surveys whose purpose is to provide true quantitative descriptions of aspects of a universe of people or things.

Because the purpose of the descriptive survey is to obtain an accurate picture of the universe, random sampling is particularly important. If the sample is biased in some way, so that it does not cover an important segment of the universe, and if each segment is not sampled in proportion to the relative size of the segment, then the picture of the universe will be distorted and misleading (unless the nonproportional sampling is done purposely and with full knowledge). It is obvious that you cannot find out who will win the next election by asking only Republicans for whom they will vote; yet The Literary Digest did almost precisely that when it predicted that Landon would win in 1936. Usually the bias is more subtle, however, and therefore more dangerous. For example, a local civic association decided that it would survey what the people in one city think about the educational system, the work opportunities, and other aspects of the community. The survey was worked out in very nice detail with but one flaw: The sampling plan originally omitted all streets north of the tracks, where most blacks live, because the person running the survey thought that "it would be dangerous for student interviewers." Of course, it is just those blacks who would most likely be dissatisfied with the educational system and other city services.
The bias introduced by not taking a random sample must have distorted the results to make the picture seem rosier than it is, though perhaps that was what the civic association really wanted.

There are several dimensions of classification that tell us something about the nature of a survey:

1) As already noted, a survey can aim to discover causal relationships or to create accurate quantitative descriptions of one or more aspects of a universe. All subsequent discussion applies only to descriptive surveys.

2) A survey can be a complete census of the universe, or it can be a sample survey of the universe in microcosm. The advantage of the complete census is accuracy; the advantage of the sample survey is lower cost.

3) A survey need not be a survey of people. You can survey either people or things. The library study alluded to earlier surveyed the use of books in libraries. Or the survey can be of animals (how many cattle are there in Texas?) or of plants (how many acres of corn were planted in Illinois last year?). Most social-scientific surveys are of people, however. And one may survey groups as well as individuals. Families are the smallest groups commonly studied. The largest groups are nations; one can survey the nations of the world to discover their policies toward population control by sending questionnaires to the relevant bureaucrats in each nation. Voluntary organizations are forever surveying their local groups as, for example, when they ask for yearly reports on membership and activities undertaken throughout the year.

4) A survey can either observe or ask questions. All surveys of nonhuman material use the observational method, at least until we find a horse that really does talk. But many surveys of human beings also observe behavior rather than asking questions, as, for example, when we count the passengers who ride buses, observe how many people buy a product at different price levels, or meter the number of television sets tuned to a given program.

5) One borderline technique between observing and questioning is to ask people to observe themselves (the diary technique). Another is to ask them what they have done in the past (A. Kinsey relied heavily on this retrospective technique). Observation by the researcher, his assistants, or mechanical devices is generally preferable to self-observation, but often it is too costly or otherwise impractical, as, for example, in sex surveys. Self-observations can have severe limitations; H. Cantril found that only 86 percent of people who were interviewed twice at a three-week interval gave the same answer both times about whether they owned a car, and only 87 percent gave the same answer about how they voted in the 1940 presidential election (Cantril, pp. 102-103). The diary technique can avoid such memory losses, but memory failure is not the only cause of the type of discrepancy found in the second interviews.

Instruments enhance the power of the researcher to observe human behavior. One-way glass enables the psychologist to see without affecting the subject. Fingerprint paper enables the magazine researcher to count the number of people who thumb pages. Infrared dust on people's shoes leaves detectable traces where the subjects walk.
The eye camera records eye movements of people as they walk through supermarkets. The camera and tape recorder are invaluable additions to the anthropologist's armamentarium, as M. Mead never ceases to remind us.¹ Instruments can also be used to observe physiological states like blood pressure and galvanic skin response; they are useful in studying levels of emotional responses to various stimuli.

Often the choice between observing and questioning is a matter of convenience and feasibility. But sometimes the types of data that may be obtained by observing or questioning are very different. For example, people's answers to questions about how happy they are constitute one possible proxy for happiness. And observed rates of suicide and lynching constitute another possible proxy for happiness (actually for the opposite of happiness). But the concepts of happiness for which the proxies stand might be considerably different.

1. E. Webb et al. have collected a great many of what they call "unobtrusive measures" or "oddball measures" of human behavior.

This book gives but little treatment to questionnaire surveys (page 195).


There are several reasons for what may seem a cavalier neglect of the technique that constitutes so much of research in sociology and market research and social psychology. For one, the questionnaire survey is very well covered in an extensive literature. Furthermore, a special treatment of questionnaire surveys leads readers to think that questionnaire research has very special properties that make it entirely different from other kinds of research, which is not so. Also, I want to decrease the likelihood that students will rush blindly to use the questionnaire survey. It is not that I want to discourage its use when it really is appropriate, but too often people do questionnaire research just because they do not realize that there may indeed be much better methods for getting the knowledge they want.

6) Another way of classifying surveys is by the several types of information that they can obtain or, to put it another way, by the several sorts of purposes they may achieve. Any given survey may have more than one purpose and may therefore obtain more than one sort of information; most descriptive surveys do. We shall now consider these sorts of information one by one.

A survey may obtain such demographic data as population, age, weight, income, and so forth. The U.S. Census is the major illustration of a demographic-data survey. But most other surveys collect some demographic data also, often for purposes of cross-classification to establish different patterns of behavior and attitudes for different groups of people.

Demographic data have been collected as long as there have been governments; information about population and property has always been important to rulers so that they could levy and collect taxes. A. Toynbee (the uncle) quoted this interesting speech about a proposed English census survey, delivered in Parliament in 1753, by Mr.
Thornton, Member for the City of York:

    I did not believe that there was any set of men, or indeed any individual of the human species, so presumptuous and so abandoned as to make the proposal we have just heard. . . . I hold this project [a census of population] to be totally subversive of the last remains of English liberty. . . . The new bill will direct the imposition of new taxes, and indeed the addition of a very few words will make it the most effective engine of rapacity and oppression which was ever used against an injured people. . . . Moreover, an annual register of our people will acquaint our enemies abroad with our weakness. (Toynbee, pp. 7, 127)

Luckily this dire prediction has not come to pass.

Information about people's behavior may also be obtained by surveys. Knowledge of behavior is the main subject and the final goal of much of the behavioral sciences and all of economics; therefore, behavior surveys are of obvious use. Furthermore, information about behavior may also be of interest to us if we are interested in what people think, because we can often infer attitudes and beliefs from people's behavior. The behaviorist psychologists go very far in saying that nothing except behavior can be meaningful data in psychology (and speech is therefore known as "verbal behavior"). This matter is discussed in a brief note at the end of the chapter.

People's intentions about future behavior can sometimes be ascertained by asking them what they plan to do in the future. Important economic data about consumers' intentions to buy durable goods and about businessmen's intentions to invest in plant and equipment are regularly obtained by surveys of intentions. Naturally enough, people do not always do what they have earlier said they intended to do. How well the intentions jibe with the behavior depends upon such factors as the length of time between the survey and the behavior and whether there are unusual occurrences (like a recession) in between. To some extent the errors wash out; some people who intended do not do, and some who did not intend do.

The amount of information that people have about various aspects of the world is sometimes the subject of surveys. For government- or business-policy purposes, it may be important to know how many people know that a new recreation area has been opened or that all males must register for the draft or that the income tax has been changed or that Argonaut is now president of the United States. Gallup polls often seek this kind of information.

Opinions, attitudes, beliefs, and interests are often the subject of surveys. I lump all these categories together and call them thoughts or mental contents because the type of method that is used is quite similar for all of them. This is tricky knowledge to obtain because practically every obstacle enumerated in earlier chapters crops up.

As I have noted, sometimes we survey behavior as a means of inferring what people think. The converse is also true; sometimes we survey people's thoughts in order to learn something about their behavior.
The justification given by many social psychologists for surveying attitudes about race relations is that we can infer something about people's behavior from their attitudes. Such inference is hazardous at best, as advertising research tells us with great authority; there is a very tenuous relationship between what people say about a product and their actual purchasing. There must be some relationship between the contents of people's minds and what they do, but the relationship is very often not straightforward. Part of the trouble is that there is no one-to-one relationship between what is in someone's mind and what he tells an interviewer.

The reasons people give for their actions are still another subject of surveys: the surveys that ask "why?" Why did you buy a Ford this year? Why did you not go fishing in your neighbor's canoe? Why did you vote for Carter? Many surveys of other kinds are also intended to discover the causes of human behavior, but in some cases the simple question "why?" can unravel a complex matter with dispatch. This is especially true when one is inquiring into rational actions that are under conscious control, like asking a professor why she gives Jones "F" and Smith "A." The question "why" is also very effective when one is trying to find out a set of social rules; for example, if you ask a Japanese why he takes off his shoes when he visits a house, the answer will probably be useful and accurate. But when investigating motivations that are less rational and that depend upon a person's tastes, loyalties, and education, "why" questions are not so likely to produce useful answers. For example, think how confusing it would be if someone asked you why you bought the car you did. To start with, you would not know whether you were being asked why you bought any car or why you bought this particular make of auto. And if the latter, you would have difficulty in conveying just why you bought a Ford. For another example, refer back to the discussion of M. Haire's study of why women did not buy instant coffee. The simple question "why?" in that case just could not do the job.

7) Questionnaire surveys are classified by whether they are done by mail, by telephone, or by personal interviewing. The mail survey is generally cheapest, though sometimes telephone interviewing within a local area can rival it for cost. The main disadvantage of the mail survey is the difficulty of obtaining a satisfactory random sample because some people do not return the questionnaires. Furthermore, those people may well be very different and might give different answers from those people who do respond. (Nonresponse bias is discussed on p. 316.) Sometimes it is possible to increase the response by mailing repeated questionnaires to people.
And occasionally the types of information that you seek will not be in danger of bias from nonresponse; a trivial example would be a poll of a professional association's membership about whether the convention should be in New York or Baltimore; there is no reason to believe that the nonresponders would have different preferences from those of the people who do respond. But usually you must investigate the extent and nature of the bias with auxiliary techniques, perhaps with telephone or personal interviews of a sample of the nonresponders, so that you can allow for the bias by adjusting the results of the mail sample.

The rate of response to a mail questionnaire depends very much on who is sending it out, its subject, who receives it, and how easy it is to answer. The response rate can also be influenced by the cover letter sent with the questionnaire and by the inducements offered. Earlier I reported that a 3-cent ballpoint pen doubled responses in a library study (Fussler & Simon). Another report, slightly unbelievable, indicates that 75 percent of a ballpoint-pen group responded, whereas the control-group response was 18 percent (Klein). Sometimes it is worthwhile to purchase people's answers by sending a nickel, dime, quarter, or even dollar with each questionnaire.

The U.S. Bureau of the Census has employed mail questionnaires as the primary method of collecting data on population and housing since the 1970 Census. A test produced a response rate of 83-89 percent for the first mailing, and the Bureau hopes to do better in the future. The biggest problem is creating an accurate mailing list (Cohen, p. 22).

Telephone interviewing can be a remarkably efficient survey method. Until recently it was used only locally because of the cost of long-distance calls, though in some cases long-distance telephone interviewing was efficient. Recently the phone companies have made available various flat-rate plans under which unlimited long-distance calling is possible within wide areas; in some plans the state is the limit, in others the entire United States. These deals are in the process of giving telephone interviewing a vast new importance, I think. Telephone interviewing has been too little used in the past to replace personal interviewing, but now it must come into its own.

There is little difficulty with nonresponse in telephone interviewing, and therefore the sample obtained from a telephone survey is sufficiently random for many purposes. Furthermore, the sample can be taken sequentially; you just keep making more calls until your sample is big enough, which relieves you of having to decide in advance how large it must be. One snag is that the telephone book is not a very random sample; it excludes unlisted phones and people who have moved recently, which together total about 20 percent of the total phones in service (Cooper, p. 45; Classen and Metzger, p. 60). But random-dialing techniques that avoid this difficulty have been developed; they were discussed on page 128.

The main disadvantages of the telephone interview are these: First, the interview must usually be short; unless you prearrange the interview, it is seldom practical to ask more than a handful of questions. Second, you cannot observe the subject visually; in personal interviews observation can reduce lying. On the other hand, people may not lie or exaggerate as much in a phone interview as in a personal interview because they are not as personally involved with the telephone interviewer.
Third, some people do not have phones, but in some areas of the United States such high proportions of people have phones that the worst possible bias from this source cannot be very dangerous.

Personal interviews suffer only from the disadvantage of high cost in money and time. At 1968 costs, personal interviews of a sample of people chosen randomly throughout the United States may cost $25 or more per interview, even when carried out by organizations that already have staff facilities set up for interviewing. If you had to start from scratch, the cost would be even higher. Local interviews can be much cheaper, of course.

But personal interviews have some important advantages over mail and phone interviews, which is why they are used despite the high costs. The interview can be long, sometimes several hours; people often enjoy being interviewed. The interviewer can check some information with his own eyes, which may reduce exaggeration; for example, not many people who live in shacks dare report high incomes to interviewers in person. (On the other hand, the personal relationship with the interviewer leads some people to want to impress the interviewer.) Another major advantage of the personal interview is that the interviewer can probe for further information, by asking "What do you mean by that?" and so on, and he can also explain questions that the subject cannot understand; he can even translate into a foreign language for a subject who does not speak English.
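
The sequential sampling described above for telephone surveys (keep calling until the sample is big enough) can be sketched as a stopping rule. This is one possible reading, not the author's procedure: it assumes a yes/no question, a normal-approximation margin of error, and a hypothetical stream of answers. The function name, the 0.10 target, and the minimum of 30 calls are all illustrative assumptions.

```python
import itertools
import math
from typing import Iterable, Tuple


def sample_sequentially(answers: Iterable[bool], target_margin: float,
                        min_calls: int = 30) -> Tuple[int, float, float]:
    """Take answers one call at a time and stop once the approximate
    95 percent margin of error for the "yes" share is small enough.

    Returns (number of calls, estimated proportion, final margin of error).
    """
    calls = 0
    yes = 0
    proportion = 0.0
    margin = float("inf")
    for answer in answers:
        calls += 1
        yes += int(answer)
        proportion = yes / calls
        # Normal-approximation margin of error for a proportion.
        margin = 1.96 * math.sqrt(proportion * (1.0 - proportion) / calls)
        # min_calls guards against stopping on a tiny, lopsided sample.
        if calls >= min_calls and margin <= target_margin:
            break
    return calls, proportion, margin


# Hypothetical stream of answers in which 60 percent say yes.
hypothetical_answers = itertools.cycle([True, False, False, True, True])
calls_made, estimate, final_margin = sample_sequentially(
    hypothetical_answers, target_margin=0.10)
print(calls_made, round(estimate, 3), round(final_margin, 3))
```

With this made-up stream the rule stops after somewhat more than ninety calls. A stopping rule that depends on the data collected so far can itself affect the estimate slightly, so a sketch like this is a planning aid rather than a substitute for a proper sequential design.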


The choice among mail, telephone, and personal interviews is delicate and calls for good judgment on the part of the researcher. You must consider all the advantages and disadvantages of the various techniques (summarized in Table 13.1) as they apply to your particular research project. The trick is to balance the advantages against the disadvantages to arrive at the best possible technique for the expenditure of time and money.

TABLE 13.1

Relative Merits of Principal Methods of Data Collection

ADVANTAGES

Personal Interview
  - Most flexible means of obtaining data
  - Identity of respondent known
  - Nonresponse generally very low
  - Distribution of sample controllable in all respects

Mail
  - Wider and more representative distribution of sample possible
  - No field staff
  - Cost per questionnaire relatively low
  - People may be more frank on certain issues, e.g., sex
  - No interviewer bias; answers in respondent's own words
  - Respondent can answer at his leisure, has time to "think things over"
  - Certain segments of population more easily approachable

Telephone
  - Representative and wider distribution of sample possible
  - No field staff
  - Cost per response relatively low
  - Control over interviewer bias easier; supervisor present essentially at interview
  - Quick way of obtaining information
  - Nonresponse generally very low
  - Callbacks simple and economical

DISADVANTAGES

Personal Interview
  - Likely to be most expensive of all
  - Headaches of interviewer supervision and control
  - Dangers of interviewer bias and cheating

Mail
  - Bias due to nonresponse often indeterminate
  - Control over questionnaire may be lost
  - Interpretation of omissions difficult
  - Cost per return may be high if nonresponse very large
  - Certain questions, such as extensive probes, can not be asked
  - Only those interested in subject may reply
  - Not always clear who replies
  - Certain segments of population not approachable, e.g., illiterates
  - Likely to be slowest of all

Telephone
  - Interview period not likely to exceed five minutes
  - Questions must be short and to the point; probes difficult to handle
  - Certain types of questions can not be used, e.g., thematic apperception
  - Nontelephone owners as well as those without listed numbers can not be reached

SOURCE: Research Methods in Economics and Business, p. 210, by Robert Ferber and P. J. Verdoorn; © The Macmillan Company 1962; reprinted by permission of The Macmillan Company.
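
Nonresponse bias appears among the mail survey's disadvantages, and the text above suggests interviewing a sample of the nonresponders so that the mail results can be adjusted. The sketch below shows one simple way such an adjustment might be computed. It is an illustration under stated assumptions, not the author's method: the function name and all the figures are hypothetical, and the follow-up interviews are assumed to represent all nonresponders.

```python
def adjusted_proportion(mailed: int, returned: int, yes_in_returns: int,
                        followup_sample: int, yes_in_followup: int) -> float:
    """Weight responders and nonresponders by their shares of the mailing."""
    response_rate = returned / mailed
    p_responders = yes_in_returns / returned
    # The follow-up interviews stand in for all nonresponders (an assumption).
    p_nonresponders = yes_in_followup / followup_sample
    return (response_rate * p_responders
            + (1.0 - response_rate) * p_nonresponders)


# Hypothetical figures: 1,000 questionnaires mailed, 400 returned with
# 240 "yes" answers, and 50 nonresponders reached by telephone, 15 "yes".
naive = 240 / 400
adjusted = adjusted_proportion(1000, 400, 240, 50, 15)
print(f"mail returns only: {naive:.2f}; adjusted: {adjusted:.2f}")
```

In this invented case the mail returns alone suggest 60 percent "yes," while the adjusted figure is 42 percent, because the nonresponders, who make up most of the mailing, answer quite differently.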


5. The Steps in Executing a Survey

Step 1. Follow the procedures outlined in Chapter 7, to the point at which you are ready to decide on a method.

Step 2. Find out if the data you want to collect already exist in some published study or in one of the "data banks" of some major university research organizations. It is a shame to find out too late that you could have saved hours and dollars with a simple phone call or a letter. Sometimes the existing data are not exactly what you want; for example, the data published by the Internal Revenue Service do not contain fine breakdowns that would often be valuable, and people and firms are classified in categories that may not be ideal for your purposes. Nevertheless, it may be possible to adjust and interpolate the data to make it yield most of the information you want.

Step 3. From here on, it is assumed that you have not been able to find the data you want in existing records but must collect the raw data yourself. Your next step is to define the universe that you want to sample or poll. The universe you work with might dictate whether you reach one conclusion or its opposite. For example, assume you want to know if there is any relationship between per capita income and radio ownership or between per capita income and the diets people choose. If you take a sample in a very homogeneous community like Park Forest, Illinois, you will find practically no differences. The range of income is very small, everyone owns a radio, and most people eat steak, drink milk, and eschew caviar. But, if the sample is taken in Chicago, which is more heterogeneous and has a wider income range than Park Forest has, there will be some people poor enough not to own radios, so that some relationship will appear between income and radio ownership.
There will also appear to be some relationship between income and menu, for the lower-class black eats collard greens and side meat, and the hillbilly immigrant eats what she ate in the Ozarks. If the researcher wants to obtain a really high correlation, all she has to do is take a worldwide sample. Practically no one of low income in India has a radio. A similar example exists for drinking behavior and religion. Just because the major religions in the United States do not differ significantly in their stands on drinking, one should not conclude that their point of view is common in all human groups. If the universe includes the Moslem countries, the results will be markedly different. Clearly the universe should be chosen for its relevance to the problem you are interested in and not for how strong a relationship it will yield.

Step 4. Next, decide on the sampling procedure you are going to use. This step includes deciding what list of people or other representation of the universe you will select from, the physical procedure you will use to select names (for instance, systematically by taking every nth name or randomly

Surveys: Pro, Con, and How to Do Them

by selecting a random page and then a random name or going to every nth house), and then actually making up a list of people (or objects) to be sampled.

Step 5. Think out and decide upon the procedure that you will actually use in observing or questioning your subjects. If you are gathering data with a questionnaire, preparation of the questionnaire is crucial. If you are using an observation survey, you must decide what pieces of behavior you will watch and count; this decision requires classification and definition. If you are studying the first stair-climbing behavior of babies, you must define what will count as climbing motions. If you have already prepared dummy tables, as we suggested on page 111, you will find them an enormous aid in the construction of the questionnaire or observation procedure.

Step 6. Test the questionnaire or the observation procedure in a preliminary run. Look for such things as confusion on the part of respondents about what questions mean and inability to answer within the categories you have provided. Then revise the questionnaire or observation procedure. Test again and so on until your procedure does the job it is supposed to do. It pays to iron out all possible difficulties at this stage rather than later.

Step 7. Collect your data.

Step 8. Analyze your data. This subject is covered in Chapter 25. Statistical analysis is usually a larger part of survey research (especially that seeking relationships) than of experimentation because the researcher must examine the effect of the variables analytically rather than by varying them physically as in an experiment.

Step 9. Interpret the data, and draw sound conclusions. Chapters 26-32 cover this work.
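The two physical selection procedures mentioned in Step 4 (taking every nth name, or drawing names at random) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the 20-name list standing in for the universe, and the chosen interval and seed are all hypothetical, not part of the procedure described in the text.

```python
import random


def systematic_sample(frame, n, start=0):
    """Take every n-th name from the list, beginning at index `start`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return frame[start::n]


def simple_random_sample(frame, size, seed=None):
    """Draw `size` distinct names at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(frame, size)


# Hypothetical sampling frame: 20 invented names standing in for the universe.
frame = [f"person_{i:02d}" for i in range(1, 21)]

# Systematic: every 5th name, starting from the 3rd entry in the list.
every_fifth = systematic_sample(frame, n=5, start=2)

# Random: four distinct names; the seed is fixed only so the draw can be repeated.
random_four = simple_random_sample(frame, size=4, seed=1)
```

In practice the starting point of a systematic sample is itself usually chosen at random, so that every name on the list has a chance of selection.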

6. Summary

A survey gathers data about a population and its characteristics. It observes the population as it exists, without altering it experimentally.

The main advantage of a survey over a laboratory experiment is its realism; and over a field experiment it may have a cost advantage. Its main disadvantages are: a) its difficulty in establishing that two groups being compared are really similar, and b) the difficulty of clarifying causation.

Surveys can gather data by observing or questioning; the data can be demographic characteristics or behavior or intentions or attitudes. Surveys can be done by mail, telephone, or in person; each method has advantages and drawbacks.

The steps in executing a survey are listed in the chapter.

EXERCISES

1. Give examples in which
a. the higher cost of personal interviewing is warranted
b. lower-cost mail or telephone interviews make more sense than do personal interviews
c. an experiment would be cheaper than a survey
d. a survey would be cheaper than an experiment
e. observation surveying is preferable to interview surveying
f. interview surveying is preferable to observation surveying

2. Give examples of surveys that study
a. behavior
b. intentions
c. people's information
d. attitudes, opinions, beliefs, or interests
e. reasons given for behavior

ADDITIONAL READING FOR CHAPTER 13

Babbie is a readable general book on survey research methods. Stephan and McCarthy (Chapters 12-22) contain a wealth of useful ideas on the execution of sample surveys. Hyman (1955) is an excellent general work on surveys. And Sudman (1967) helps you execute your survey relatively cheaply; or to put it another way, he helps you get more information for a given expenditure.

On interviewing, see Hyman (1954); Cantril (Chapters 1-4, especially 1-2); Parten; Payne, and almost every textbook in social-scientific and market research. Paul gives an excellent discussion of anthropological interviewing. An article that many have found useful in evaluating the strengths and limitations of field studies is that of Zelditch.

For discussion of "reasons why" analysis, see Zeisel (Chapters 6, 7, 8); Lorie and Roberts (Chapter 17); and P. F. Lazarsfeld (1935). Sudman (1967) provides a lot of useful information on how to reduce the cost of surveys.

14 Some Other Qualitative and Quantitative Techniques

1. Deductive Reasoning
2. The Case Study
3. Participant Observation
4. Expert Opinion
5. Content Analysis
6. Simulation
7. Summary

Research methods cannot be satisfactorily classified in a one-dimensional, mutually exclusive scheme. Rather, there are several ways that one may classify the technique actually used in any study. For example, the previous chapter examined the difference between the experiment and the survey, and Chapter 12 discusses the difference between the long view and the wide view; a particular survey may use the long-view or the wide-view technique, and an experiment can too. Later in this chapter we shall discuss content analysis. Content analysis can be used in either a survey or an experiment, though it is more likely to be used in a survey.

This chapter covers a few techniques for obtaining knowledge that are not described elsewhere in the book. The first group includes the qualitative methods of deduction and case study (including the psychological depth study), which are neither surveys nor experimental methods. Then we discuss use of expert opinion as a method of obtaining knowledge. Content analysis, which is a quantitative method of working with "qualitative" data, is another topic in the chapter.

1. Deductive Reasoning

The principle of the deductive method for obtaining knowledge is that, if A is true and if B is true, then under some specified conditions one can safely

say that C is true. This is the simplest type of deduction, of course. The chains of reasoning can be longer, more complex, and probabilistic, but the same principle holds.

Consider the example of wanting to know whether it is raining outside. Your previous experience tells you that, if people are holding open umbrellas over their heads, it is raining; this observation is premise A. Premise B is what you actually see from your window at a given moment: that almost everyone who has an umbrella with him has it open. You deduce your conclusion C, that it is indeed raining outside.

Notice that whether your conclusion C is correct depends upon the correctness (truth) of your premises A and B. If you are in a country where people hold umbrellas over them to keep the sun off, then premise A is wrong in that case, and your conclusion C will probably be in error. Similarly, if your eyes deceive you about whether people are holding open umbrellas, then premise B is incorrect, and your conclusion will probably be incorrect too.

In one sense, there is nothing "new" in the conclusion C; all the information contained in the conclusion is already contained in the premises. Nevertheless, deduction helps us to know and understand the world about us because it opens our eyes to information that we would otherwise not understand, just as a child learns when an adult points out to her how a bicycle works. Deduction is a device for discovery of the truths that lie concealed within a set of statements. (In this sense, it is like all mathematical analysis.)

In the case of the umbrellas and the rain, the deductive method might or might not be more effective than an empirical investigation.
The empirical method of going outside and extending an upturned palm has the advantage of rendering you safe from most false premises. (But notice that the premise that water coming down from above is rain might be incorrect; someone might be spraying a hose out a window. In some sense all knowledge is deduced, and all knowledge depends upon various premises.) The advantage of the deductive method in this case is that you do not have to go outside and get wet to obtain an answer.

We use deduction all the time as a helpful device. You might have followed this line of thought this morning: Classes meet Monday to Friday; today is Monday; therefore classes meet today. And off you go to class on the basis of your deduction, without telephoning to check that classes are indeed meeting.

Notice that I call the deductive method a "method of getting knowledge" in the same way that I have talked about various empirical methods of getting knowledge. In one sense empirically established facts have more claim to be called "knowledge" than does a conclusion arrived at deductively, because whenever a deduction and an empirically established fact collide, the deduction must give way: The empirical demonstration is the ultimate test. As someone has said, many a beautiful theory has been slain

by an ugly fact. There is a famous example of a World War II navy airplane that theoretically would not fly at all. When a model was made and flown despite the theory to the contrary, the theory had to be revised, for it was in error. When you know that the plane has flown millions of miles, you can afford to disregard the theory telling you that it cannot fly.

The contest between deduction and empirical knowledge is not always so easily settled, however. Often the empirical fact is not so clear-cut because the empirical measurement is uncertain, and in that case a strong deductive argument may be more persuasive. The issue of whether trading stamps raise supermarket food prices is an example. Several empirical studies have apparently shown that trading stamps do not raise prices (Beem; U.S. Department of Agriculture; Bunn). On the other hand, strong deductive economic theory argues that trading stamps must raise prices in the long run. (Of course trading stamps raise food prices. They make it more difficult for consumers to compare food prices of individual items and of stores as a whole, and this lack of knowledge by consumers dulls the incentive of merchants to set prices to a fine edge of competition. Furthermore, stamps raise the total out-of-pocket costs to the food store; Davis.) And no contrary deduction using other assumptions comes to mind, except possibly that stamp-giving stores reduce their advertising. Furthermore, a reanalysis of the same U.S. Department of Agriculture data that went into the empirical analyses contradicts the original findings (Strotz; but see Beem, 1959). I am therefore inclined to place most credence in the logical deduction and believe that, at least in the long run, trading stamps do raise prices.
This example demonstrates that empirical knowledge is not always more conclusive than deductive knowledge or even of a basically different nature.

The effect of population growth upon economic development is another matter in the study of which we put more stock in deduction than in empirical evidence. R. Easterlin sums up the latter: "On the whole, then, simple empirical comparisons between economic and population growth rates are inconclusive. Cases [nations] of high per capita income growth are associated with both high and low per capita income growth" (Easterlin, p. 107). But the empirical analyses are extremely weak, for many reasons. And the theoretical reasons to believe that increased population causes lower income are very strong. Therefore, most scientists advocate population control as a means of speeding economic development, and many nations follow this recommendation, on the basis of deduction and without empirical support.

Note that deduction is not the same thing as theory. Theory is an interlaced body of existing systematic knowledge, and it is important just because it is a source of premises for deduction (see the discussion in Chapter 3).

We may end this section on deduction with the usual admonition: Do not ignore deduction when it is useful, but do not limit yourself to its use and neglect other methods. On one hand, remember that deduction might give

you all the information you need about the effect of trading stamps on prices; on the other hand, do not refuse to look into the horse's mouth to find out how many teeth he has.

2. The Case Study

The case study is almost synonymous with the descriptive type of research discussed in Chapter 4. It is the method of choice when you want to obtain a wealth of detail about your subject. You are likely to want such detail when you do not know exactly what you are looking for. The case study is therefore appropriate when you are trying to find clues and ideas for further research; in this respect, it serves a purpose similar to the clue-providing function of expert opinion.¹

The specific method of the case study depends upon the mother wit, common sense, and imagination of the person doing the case study. The investigator makes up his procedure as he goes along, because he purposely refuses to work within any set categories or classifications; if he did so, he would not be obtaining the benefits of the case study.

These admonitions may be useful: First, work objectively. Describe what is really out in the world and what could be seen by another observer. Avoid filtering what you see through the subjective lenses of your own personality. Second, constantly reassess what is important and what is unimportant. Follow up and record what seems most important. Constantly exercise your judgment on this issue. Third, work long and hard. Saturate yourself in the situation, and keep at it. Some anthropologists believe that case studies of less than several years' duration are likely to be misleadingly superficial (Haring, p. 53). B.
Malinowski gives a vivid argument on this point:

Living in the village with no other business but to follow native life, one sees the customs, ceremonies and transactions over and over again, one has examples of their beliefs as they are actually lived through, and the full body and blood of actual native life fills out soon the skeleton of abstract constructions. That is the reason why, working under such conditions as previously described, the Ethnographer is enabled to add something essential to the bare outline of tribal constitution, and to supplement it by all the details of behaviour, setting and small incident. He is able in each case to state whether an act is public or private; how a public assembly behaves, and what it looks like; he can judge whether an event is ordinary or an exciting and singular one; whether natives bring to it a great deal of sincere and earnest spirit, or perform it in fun; whether they do it in a perfunctory manner, or with zeal and deliberation. In other words, there is a series of phenomena of great importance which cannot possibly be recorded by questioning or computing documents, but have to be observed in their full actuality. Let us call them the imponderabilia of actual life. (Malinowski, p. 18)

1. See R. Cyert, et al. for an interesting example of the case method used in a preliminary exploration of decision making in large organizations.

Sometimes the investigator must choose whether she should carry out one or a few case studies rather than adopting the alternative of doing a survey or an experiment on a larger sample of people. An interesting example is found in market research, in which "motivation research" has competed with survey methods of investigating the reasons people buy. (For the moment we shall apply the label "motivation research" to the psychological case study "in depth" in which a clinical psychologist spends many hours or days "probing" a single person. Sometimes, however, survey methods or experiments are also called "motivation research," when they seek the reasons people buy.)

Motivation-research case-study methods were able to produce such insights as that drivers judge auto acceleration by the stiffness of the spring of the accelerator pedal and that men did not want to fly on business trips because of their sense of guilt about dying and leaving their families husbandless and fatherless. On the other hand, an experimental method was able to establish that women did not buy instant coffee because they thought of instant-coffee users as lazy homemakers (Haire, discussed in Chapter 6).

The difference between the two methods is that the insights into feelings about auto acceleration and guilt about flying were ideas produced by the case study but not tested and proved by it. On the other hand, the idea about the cause of women's not buying instant coffee was not produced by the experiment; the experimenter had to have the idea to start with; he probably got it by introspection or by crude informal case study. But the experiment did test the idea and prove it. Actually, the case study and the survey or experiment are not alternatives but are complementary; they should be used together.

3. Participant Observation

If you wish to understand the full complexity of a case situation in social science, you may have no alternative but to get yourself involved as a person. An example is that of a psychologist such as Freud who wishes to plumb the depths of another person's mind and can only do so by interacting with the person in a human relationship. Another example is that of a white man who wishes to understand black life; Griffith dyed his skin black and traveled as a black in the South.

After the participant-observation has been done and key elements of the situation have been identified, research methods that are less dependent on the researcher's personality often can be employed, just as systematic tests of Freud's ideas were made to check on Freud's observations.

When studying a complex group situation, such as Malinowski described on page 295, the researcher is especially likely to conclude that personal involvement is the appropriate strategy. In the words of one experienced participant-observer, in order to understand the richness of individual and group relationships, one aims to "record the ongoing experiences of those

observed." And to do so, one must "adopt the perspective of those studied by sharing in their day-to-day experiences" (Denzin, p. 185).

More generally, the participant-observer's strategy is to immerse oneself in all aspects of the situation by using all available sources of information: informal talks with members of the group one is studying, reading letters and other documents, passively observing and listening while simply "hanging around" with the group, and so on. The key difference from other research methods is that the participant-observer participates with and has some sort of role in the group, rather than maintaining a distance between the observer and the observed.

The great nineteenth-century economist Alfred Marshall said this as well as it has been said:

[T]he method of le Play's monumental Les Ouvriers Européens is the intensive study of all the details of the domestic life of a few carefully chosen families. To work it well requires a rare combination of judgment in selecting cases, and of insight and sympathy in interpreting them. At its best, it is the best of all: but in ordinary hands it is likely to suggest more untrustworthy general conclusions, than those obtained by the extensive method of collecting more rapidly very numerous observations, reducing them as far as possible to statistical form, and obtaining broad averages in which inaccuracies and idiosyncrasies may be trusted to counteract one another to some extent. (Marshall, p. 116)

It is difficult to observe well while participating in a manner that will aid observation. "Carving out a role" in the group is difficult and treacherous from a scientific standpoint (Denzin, p. 188). There are various possibilities: For example, one can hide one's purpose and pose as a regular member of the group; this raises ethical questions and creates technical difficulties for recording information.
Or one can explain one's purpose and try to enlist cooperation. Other difficult choices must also be made according to the exigencies of the situation, such as whether to try to use the same vocabulary as the people whom one is observing. And in one case, how to get oneself released from the insane asylum to which one has gotten oneself committed (Rosenham).

Great wisdom and skill are necessary for success as a participant-observer. But the rewards in ideas and understanding of people and groups can be very great if one succeeds.

4. Expert Opinion

By "expert opinion" I mean the judgments and estimates made by people who have spent much of their time working with a particular subject and who have gathered much general information that has been filtered through their minds and stored in their memories.

At the start, let us distinguish between the use of expert opinion as a source of general guidance and clues for getting started in the right direction on a particular research topic and its use as the final data on which you base your conclusion. To mention a bad example of the first use, I once

FIGURE 14.1 [Cartoon: "Our Boarding House with Major Hoople"]

Source: From the Champaign-Urbana Courier, November 5, 1975, p. 9. Reprinted by permission of Newspaper Enterprise Association.

began a study of race horses without soliciting the "expert" opinion of habitual bettors on which variables to study, and the study suffered; on the other hand, it would be total folly to accept expert opinion as your final scientific conclusion on how to predict which horse will win a race.

As a positive example of expert opinion as the basis for a conclusion rather than as a source of values, a court often relies upon the judgment of psychiatrists on whether a person is insane (although in perhaps the most important cases the jury has the final say). And, for some purposes of social science, one might accept a psychiatrist's judgment that more patients recover from schizophrenia than from depression.

Expert opinion can often be useful as a source of objective information that might be more difficult to collect by other techniques; asking psychiatrists about the recovery rates from schizophrenia and depression is an example. A more crucial use of expert opinion, however, is for judgments that require examination of an entire context, that is, taking into account an ill-defined total picture rather than a limited number of well-defined factors.

An example is asking a psychologist to judge whether a patient is neurotic on the basis of his entire profile on some psychological test battery rather than on the basis of a single score of some type. But L. Goldberg, among others, has shown that simple statistical rules of thumb can make such predictions about as well as the experts.

Expert opinion is indispensable when the judgment involves human values. For one example, in the library study our ultimate interest was in creating a scheme that would keep the most valuable books in the central library (Fussler and Simon). We developed statistical rules of thumb to make the judgments, but the ultimate judgment of value had to be made by noted scholars in various fields; what they judge as valuable is valuable, by definition. Therefore, the best rule of thumb was one that agreed most closely with the expert judges. Another example is scholars' ratings of university graduate departments (Cartter). The scholars' opinions do not simply stand for some more objective measure of the quality of a graduate school; their judgments are the measure of values. If one believes that more objective data, like the amount of scholarly writing produced by a department, provide the final test of a school's value, one should then collect such information rather than asking for experts' opinions.

Expert opinion can be rendered by a single expert (as is often the case in the courtroom or when a single adviser gives his opinions to policy makers), or many experts can be surveyed. The survey of graduate-school quality is an example of the latter; more than 100 scholars in each field were asked their opinions, for a total of more than 400 scholars.
In most cases, a few opinions will suffice, because, if the experts are really experts, there will be relatively little variation among their opinions; two well-known economists will surely be in closer agreement on which are the best departments of economics in the United States than will two laymen or two undergraduates or even two well-known psychologists.

The phrase "human yardstick" has been applied to studies in which judges' ratings of psychological phenomena are the measurements with which the researcher works. The agreement among judges is often much higher than the judges themselves think it is; they are often surprised at how little each one's judgment differs from those of the others (Underwood, p. 24).

It is not always easy to draw a sharp dividing line between what is expert opinion and what is first-hand data. In the example of the psychiatrists asked about recovery rates from schizophrenia and depression, is a survey of psychoanalysts a survey of expert opinion or a survey of first-hand data? The analysts keep records, and a survey of such records (even informally through psychiatrists' judgments) is very much like gathering first-hand data in many other surveys.

Here are some points of difference between expert opinion and more formal scientific investigation: First, scientific method generally tries to minimize the human judgment involved in the data-gathering process; a

meter reading is the ideal of scientific technique. But when you gather expert opinion, there is much room for human error to creep in, in the form of any of the obstacles to knowledge described in Chapter 19. In expert-opinion studies there is more opportunity for other people's minds to intervene between the researcher and her subject matter than with other techniques.

Second, ordinary scientific studies can be closely replicated, because the researcher specifies exactly how he obtained his data. In expert-opinion studies the researcher can tell you how and what he asked the expert or experts, but he cannot tell you exactly how the experts gathered the information that went into their judgments and estimates.

There is not much to say about how to gather expert opinion. This is the skill that journalists possess, and it is difficult to isolate the elements of the art. All of us spend much of our lives soliciting expert opinion and then deciding what we shall and shall not regard as trustworthy. I think, therefore, that it is not simply ducking the issue to tell you to use common sense in an expert-opinion study and let it go at that.

A sample of experts is probably better than a single expert, and a random sample of experts may be useful. But mere quantity is worthless unless the experts are expert. The statements of a million people about the size of your feet will be less accurate than the statement of the person who just sold you a pair of shoes.

Scientists are wary of conclusions based on expert opinion and perhaps for good reason. A relevant saying of Samuel Johnson's was quoted apropos the university ratings: "A compendium of gossip is still gossip." Nevertheless, expert opinion will always be an important source of knowledge in science as well as in everyday living.
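The idea mentioned earlier in this section, that the best statistical rule of thumb is the one agreeing most closely with the expert judges, can be illustrated with a small sketch. Everything in it is assumed for illustration: the six "books," the expert verdicts, and the two candidate rules are invented and are not taken from the library study itself.

```python
def agreement_rate(rule, items, expert_labels):
    """Fraction of items on which the rule's verdict matches the experts'."""
    matches = sum(
        1 for item, label in zip(items, expert_labels) if rule(item) == label
    )
    return matches / len(items)


# Hypothetical books: (years since last loan, loans in the past decade).
books = [(1, 12), (15, 0), (3, 4), (20, 1), (8, 2), (2, 0)]

# Hypothetical expert verdicts: True means "keep in the central library".
expert_keep = [True, False, True, False, False, True]

# Two hypothetical rules of thumb to compare against the experts.
rules = {
    "borrowed_within_5_years": lambda book: book[0] <= 5,
    "at_least_2_loans": lambda book: book[1] >= 2,
}

scores = {
    name: agreement_rate(rule, books, expert_keep)
    for name, rule in rules.items()
}
best_rule = max(scores, key=scores.get)  # rule agreeing most often with the experts
```

The experts' verdicts serve here as the standard against which each rule is scored, which mirrors the point that, where values are concerned, what the judges call valuable is valuable by definition.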

5. Content Analysis

Content analysis is a technique that stands somewhere between the case study and the "open-ended question" in a questionnaire survey.

From one point of view it is reasonable to call content analysis a "qualitative" technique, for the researcher does not make quantitative comparisons between two or more cases. A psychoanalyst may say that one of her patients is "more psychotic" than another patient, but she does not support that statement with a number that counts something. If you ask the psychoanalyst to prove her statement quantitatively, she may reply that you are asking her to measure the unmeasurable. But content analysis is actually a method of measuring the unmeasurable (at least to some extent), and from this point of view it is sensible to call it a "quantitative" technique.

At the other pole, consider the open-ended question. A questionnaire may ask "Why did you leave your last job?" and various answers will be given. The researcher then constructs a classification that may consist of such categories as "Pay too low," "Bad working conditions," "Didn't like the


Research Decisions and Procedures

boss," and so on. The researcher reads through each person's answer and "codes" it by deciding that it should go into one or another of the categories. (The original question could have been in the form of a multiple-choice offering these various categories, but there are often sound reasons for giving the respondent freedom to say whatever he wishes, especially at the exploratory stage.) In a sense, such coding is indeed actually measuring the unmeasurable and counting the uncountable, for it converts qualitative answers to quantitative measurements.

The content analyst sets up various classification schemes, which he then applies to speeches or writings. These classifications either count particular kinds of words or ideas, or they measure the amount of words or time that is devoted to particular ideas.

Early examples of formal content analysis were provided by military-intelligence agencies in wartime. Enemy newspapers (and radio stations) were monitored exhaustively, and counts were made of various kinds of references to transportation, obituaries, and so forth. Variations in the numbers of such references from week to week often signify troop movements or other changes that are clues to the intentions and actions of the enemy.

Content analysis has been used extensively in studies of the mass media to determine changes in either the media themselves or in society and culture as time passes. It is a formalization of techniques that have long been used informally. For example, a researcher may count the number of favorable or unfavorable editorials in a country's most important newspaper to see how the political climate has changed over time, rather than merely obtaining an informal impression of the political climate. Or he might count the number of overt references to sex in popular magazines in the Victorian era, compared to the 1960s, to find out whether public attitudes were indeed very hush-hush toward sex in the Victorian era. Or (a study I wish someone would do) he might study the popular press with content analysis to see whether "conformity" has increased as the twentieth century has proceeded. (As I suggested earlier, it would be so hard to define "conformity" meaningfully that, not only would such a study probably be impossible, but also scientific statements about conformity would thereby be revealed as vacuous.)

One of the more adventurous uses of content analysis is D. McClelland's study of the historical relationship between the motivation to achieve among the members of a society and the economic development of the society. He and his associates have measured the frequency of "achievement imagery" in the popular literature of the society at various periods and have related these frequencies to economic indicators. For example, Figure 14.2 shows the close correspondence (with a time lag of about fifty years) between the content-analysis data and coal imports into London from 1550 to 1850. When one considers the multitude of obstacles present in such an investigation, the closeness of the correspondence is startling, and it reassures us that empirical research in social science need not be hamstrung by the less-than-perfect conditions under which we must work.
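The idea of relating a content-analysis series to an economic series "with a time lag" can be made concrete with a small calculation. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the two series are invented toy numbers (they are not the data behind Figure 14.2), and each position in a series is assumed to stand for one fifty-year period. It correlates each value of the leading series with the value of the following series a fixed number of periods later.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(lead, follow, lag):
    """Correlate lead[t] with follow[t + lag]; lag is counted in periods."""
    if lag < 0 or lag >= len(lead):
        raise ValueError("lag must be between 0 and len(lead) - 1")
    if lag == 0:
        return pearson(lead, follow)
    return pearson(lead[:-lag], follow[lag:])


# Invented toy series, one value per (assumed) fifty-year period.
# The second series is the first shifted one period later, by construction.
imagery = [3.0, 4.5, 2.5, 2.0, 5.0, 6.0]
growth = [1.0, 3.0, 4.5, 2.5, 2.0, 5.0]

for lag in (0, 1):
    print(lag, round(lagged_correlation(imagery, growth, lag), 3))
```

Because the toy series were built so that the second simply trails the first by one period, the correlation is much higher at a lag of one than at a lag of zero. With real data one would also have to ask whether a lag chosen after looking at the data is itself an artifact.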


Some Other Qualitative and Quantitative Techniques

P. Sorokin used content analysis to analyze the grand cultural changes over millennia. Figure 14.3 shows how the proportion of philosophers of different outlooks has changed from century to century, as a proxy for the sway held by the various systems of truth.

FIGURE 14.2 Average n Achievement Levels in English Literature (1550-1800) Compared with Rates of Gain in Coal Imports at London 50 Years Later
[Figure: achievement scale and coal scale plotted against time periods, 1550 to 1850.]
n Ach. = Mean number of achievement images per 100 lines. Coal = Coal imports at London as deviations from expected in standard-deviation units. Source: "Need for Achievement and English Industrial Growth," p. 19, by Norman M. Bradburn and David E. Berlew, in Economic Development and Cultural Change (October 1961); reprinted by permission of The University of Chicago Press.

The content of art may also be analyzed systematically, and this technique is the source of much of our understanding of the contacts among cultures and the transmission of knowledge among them. A. Kroeber traces the travels of the flying gallop (an invention of artists, because horses do not run that way) as a way of representing a running horse in art:

From the Ukraine, this Scythian style with the flying gallop spread to Hungary; to the Goths who at various times ranged between the Baltic and the Crimea; to the Caucasus and the Caspian Sea; and to southwestern Siberia where a related art maintained itself long after the Scythians were extinct, in fact until around A.D. 500. From this general region our device was communicated to Sassanian Persia (226-641); all earlier Persian art lacks the device, as did the Assyrian and Greek arts by which Persian art was influenced. A farther spread was to


China, where depiction of the flying gallop had become installed by terminal Han times, in the second post-Christian century. The Han dynasty repeatedly sought Western connections, especially in order to obtain heavy cavalry horses from Ferghana in modern Soviet Uzbekistan, so that an avenue was open for import of the stylistic influence.

FIGURE 14.3 Fluctuation of the Influence in Systems of Truth by Century Periods
[Figure: per cent influence of empiricism, rationalism, mysticism, criticism, skepticism, and fideism, by century, 600 B.C. to A.D. 1900.]
Source: Social and Cultural Dynamics, Vol. II, p. 32, by Pitirim A. Sorokin; copyright, 1937, by The Bedminster Press; reprinted by permission of The Bedminster Press.

The Chinese, and following them the Japanese, adopted the flying gallop in their art and have kept it to the present time. . . . In 1794 it suddenly appeared in an English engraved print of a race horse by Stubbs, followed three years later by a woodcut in the Sporting Magazine. It was about twenty-five years more before the new posture made its way into British high-art oil painting. (Kroeber, pp. 500-501)

The most important decision in content analysis involves the choices of categories, which must accurately represent the ideas or concepts that you want to measure. Chapter 15 discusses the issue under the general rubric of "classification."
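To make the counting step concrete, here is a minimal sketch of coding texts against a category scheme and tallying references week by week, in the spirit of the wartime newspaper monitoring described earlier. Everything in it is an assumption made for illustration: the category names, the key words, the sample sentences, and the function names are all hypothetical, and simple exact word matching stands in for the much harder judgment a human coder exercises.

```python
import re
from collections import Counter

# Hypothetical coding scheme: category name -> key words that count toward it.
CATEGORIES = {
    "transportation": {"train", "trains", "railway", "convoy", "truck", "trucks"},
    "obituary": {"died", "funeral", "obituary", "mourned"},
}


def code_text(text, categories=CATEGORIES):
    """Count how many words in `text` fall into each category."""
    counts = Counter({name: 0 for name in categories})
    for word in re.findall(r"[a-z']+", text.lower()):
        for name, keywords in categories.items():
            if word in keywords:
                counts[name] += 1
    return counts


def weekly_counts(issues, categories=CATEGORIES):
    """Code a sequence of texts (one per week) and return one tally per week."""
    return [code_text(text, categories) for text in issues]


# Invented sample texts, one per week.
issues = [
    "The railway was quiet. A farmer died on Tuesday.",
    "Trains and trucks moved all night; another convoy left by railway.",
]

for week, tally in enumerate(weekly_counts(issues), start=1):
    print(week, dict(tally))
```

The tallies are only as meaningful as the scheme: if the key words do not accurately represent the concept you want to measure, the counts will be precise and useless.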


Computer systems have been developed to speed and refine content analysis. The researcher punches the material onto cards and instructs the computer how to categorize key words. The computer, which is programmed to handle syntactical problems such as tense and number, then scans the material, and lists and counts the words in each category (Stone et al.).

6. Simulation²

"Simulation" is an ambiguous label that refers to several research techniques. By one meaning any laboratory experiment is a simulation. For example, H. Ebbinghaus' experiment with the learning and retention of nonsense syllables can be called a simulation study of human learning processes.

Laboratory experiments with physical models are also called simulation studies by some writers. An example is a laboratory wind-tunnel test of how model airplanes with various characteristics behave under various conditions. Or an engineer might want to know whether there is a chance that a new type of roof for an industrial building might fall in. Because it is impractical to build a full-sized roof and subject it to stress tests and because, for reasons that we shall discuss in the next section, she may not be able to solve the problem with mathematics alone, the engineer will often build a model building with a model roof, table-top size, and then study how much punishment the roof can absorb before it falls in. The Army Corps of Engineers built a scale model to simulate the behavior of tides and river currents in San Francisco Bay, all in a warehouse. These physical experiments are often run when the engineers know how to figure out the answer mathematically but the calculation is too complicated to do easily.

A second type of simulation study is the "game" study, in which people compete in laboratory games to simulate the way they compete in various aspects of real-life competition. Such games have been developed to simulate political relations (Guetzkow); each player might be told that he is a country and that his goal is to make deals with other countries that will get him into the best power position. The researcher then studies what kinds of deals the various countries make under various circumstances.

The business games played by students at various business schools are another example. The students are given initial data about their products and the demand for them, their financial resources, and so on. The students then try out various price and marketing strategies to increase their shares of the market and their profits, usually at the expense of their competitors. Such business games might be considered a very complicated form of Monopoly in which the players have much more control of what happens and much less is left to chance and a roll of the dice. In the case of Monopoly, the purpose of the simulation might be to determine whether at game's end it is usual for one player to have all the wealth or whether the result is a standoff. The hallmark of this type of simulation experiment is that people interact in the experiment, usually in competition with one another.

2. H. Guetzkow contains a variety of articles on aspects of simulation, a general bibliography (pp. 191-193), and bibliographies at the ends of individual articles. The article by R. Dawson offers a somewhat different scheme of classifying simulation studies than follows here.

Recently economists have also begun to use some mock-experimental evidence. To gain insight into the way that competing businessmen behave, games have been set up with the players in simulated business situations, and then their behavior has been observed (V. Smith). Games are also used by sociologists to find out how people in small groups interact with one another under various types of circumstances, under stress for example.

A third type of simulation experiment is like the second, except that it is not played with real people but with a computer. The computer is given data about how each unit in the game is likely to act under various conditions, and then the whole thing is set into motion to see where it will come out. The most ambitious simulation of this type is that of G. Orcutt, et al., in which data are put in about how the various units of the economy and society (for example, households and banks) are likely to buy, sell, and perform financial operations. Also put in are the relationships of these groups with one another. Then some starting "shock" is given to the system; for example, the banks are suddenly forced to raise interest rates. The computer then runs through the situations and interactions over and over again very rapidly, so that the researchers can see how things come out at the end of a long series of such situations.
An example of this third type of simulation would occur if the computer were programmed for the probabilities of how people would act in various Monopoly situations and then the game were played many times automatically by the computer to determine the outcomes. This third type of simulation holds great promise for the future. But the value of the results of a simulation depends entirely upon the quality of the input data and relationships.

A simulation's main advantage is the obvious one: the simulation can be done when the real-life conditions cannot be experimented with. A simulation's grave danger also is obvious: the simulation may not resemble the real-life situation in one or more key elements. In the example of the economist's market experiment, one must ask: does the game resemble real life? Do the players act the way people in real-life situations act? Each person who examines the evidence must decide for himself how realistic that particular situation is.

One difficulty in making the simulation realistic is giving the players motivations similar to those experienced in the real-life situations. For example, can a player who is told to play the role of the president of General Motors or the Chief of Staff of the Armed Forces, with billions of dollars or millions of lives at stake, reproduce the same motivations and feelings when playing for pennies or tin soldiers in a lab? Can a person who is given a blindfold and told to play the role of a blind person simulate the feelings of a person who is really blind and who cannot take off a blindfold at the end of the day? Another difficulty in making a simulation realistic arises from the complexity of certain social situations. For example, how can the experiments simulate the conditions of bureaucratic complexity, or the complications of a real man-woman sexual relationship?

One may try to validate the simulation by comparing the results against other sorts of research. For example, if animal experiments with cigarette smoking jibe with non-experimental survey data on humans' smoking, then both results are strengthened. In the case of computer experiments with households and banks, it is difficult to find other data to compare with. In such a case, it is helpful to check the accuracy of the input data about the behavior of the various units and about their relationships. "GIGO: Garbage In, Garbage Out." The input data must first be developed by the classical methods of empirical science (survey and experiment) before a simulation has anything to work with.

In some cases, such as a simulation of competition among two competing organizations (Simon, Puig, and Aschoff), information neither on other studies nor the input variable may be available. In that case, the only validation method may be to check that the results make sense internally, and that they jibe with theory and general experience.
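The third type of simulation, in which a computer plays a game many times to see how things come out, can be sketched in a few lines. The game below is a deliberately crude stand-in and not Monopoly: the rules, the fair-coin assumption, the starting stakes, and all function names are hypothetical choices made for illustration. It asks the question raised earlier: at game's end, is it usual for one player to hold all the wealth, or is the result a standoff?

```python
import random


def play_game(n_players=4, stake=5, max_rounds=2000, rng=None):
    """Play one game; return each player's final wealth.

    Assumed rules: every player starts with `stake` units. In each round two
    solvent players are drawn at random and a fair coin decides which of them
    pays one unit to the other. The game stops when only one player is
    solvent or after `max_rounds` rounds.
    """
    if rng is None:
        rng = random.Random()
    wealth = [stake] * n_players
    for _ in range(max_rounds):
        solvent = [i for i, w in enumerate(wealth) if w > 0]
        if len(solvent) == 1:
            break
        winner, loser = rng.sample(solvent, 2)
        if rng.random() < 0.5:
            winner, loser = loser, winner
        wealth[winner] += 1
        wealth[loser] -= 1
    return wealth


def share_of_monopolies(n_games=200, seed=0, **game_options):
    """Fraction of simulated games that end with one player holding everything."""
    rng = random.Random(seed)
    monopolies = 0
    for _ in range(n_games):
        final = play_game(rng=rng, **game_options)
        if sum(1 for w in final if w > 0) == 1:
            monopolies += 1
    return monopolies / n_games


print(share_of_monopolies())                # games of up to 2000 rounds
print(share_of_monopolies(max_rounds=20))   # much shorter games
```

The GIGO warning applies in full. The fair coin and the one-unit payment are assumptions, not findings; a simulation of a real game or market would need transition probabilities developed from survey or experimental data before its output deserved any trust.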

7. Summary

This chapter describes some methods of getting knowledge other than by traditional experiments and surveys.

If the premises are sound, deductive reasoning can be more accurate than empirical research. But often a scintillating chain of reasoning can be founded on incorrect premises.

Expert opinion can be useful if the experts really know a lot about the topic. But scientific research tackles subjects about which expert knowledge is insufficient.

The case study provides an indispensable overview of a subject when little is known about it. It is generally not a substitute for more formal kinds of research, or vice versa.

Participant-observation can provide rich data on social interrelations. But it cannot be replicated well, and its success depends upon the personal skill of the participant-observer.

Content analysis is a systematic method for measuring attitudes and opinions from written and spoken language. The words and phrases are coded by category of interest.

A simulation can be a laboratory experiment, a gaming study, an experiment with a computer, or any other sort of manipulation of a model. The value of the simulation depends upon the realism of the inputs.


EXERCISES

1. Give examples from your field in which these methods might be useful:
   a. deductive reasoning
   b. expert opinion
   c. case study
   d. content analysis
   e. simulation
2. Give an example in which expert opinion can be gathered in such a way that the study is replicable and therefore public and checkable by others.
3. Give examples from your field in which methods a to e in Exercise 1 were used but were bad choices.
4. Design a simulation within which you could observe the increasing division of labor and the emergence of "specialists" as the game progresses.
5. Design a simulation that could throw light on how the severity and surety of punishment influences "white-collar" business crimes.

ADDITIONAL READING FOR CHAPTER 14

Gee (Chapter 7) covers the case study thoroughly and provides many useful references.

On content analysis, Berelson is an excellent general source and contains a comprehensive bibliography of earlier studies. Also see I. De Sola Pool.

The volume of literature on participant observation is enormous. Participant observation and related methods are treated very well in Denzin, especially in Chapters 9 and 10. Other useful works on participant observation are McCall and Simmons, and Bogdan and Taylor. A readable description of his own adventures and those of other investigators, both in the U.S. and in Latin American contexts, is that of Glazer. Anthropologists give you the lowdown on participant observation, together with plenty of examples, in Frelich's book. Note especially his introduction. A no-nonsense description of Field Methods in the Study of Culture is by Williams.

15 Classifying, Measuring, and Scaling

1. Classifying
2. Measuring and Scaling
3. Strengths of Scales
4. Summary

Classification and measurement as types of research problems were discussed in Chapter 4; that is, classification and measurement were discussed as goals in themselves in some research situations. In research that seeks causal or noncausal relationships and in purely descriptive research, however, classification and measurement are the means by which the research proceeds.

Once the variables are chosen for any piece of research, the researcher must decide how to separate the variables into different values; that is, she must "scale" the variables. If she scales a variable qualitatively, that is, without numerical relationships, then she is classifying the categories of the variable. If she scales a variable quantitatively, that is, if there are numerical relationships among the categories, then the variable will be a measurement scale. Examples to illustrate this distinction will be found later in this chapter.

Three important issues are common to both classifying and measuring: first, choosing the dimensions to scale or measure (practically synonymous with choosing variables); second, deciding which categories to use in the scales or which type of scale; and third, defining the categories and drawing their boundary lines. Each of these issues is discussed in the chapter. The topic of scaling human responses is discussed in the next chapter.


1. Classifying

This chapter is about how to classify and measure and not about the nature of classification. Nevertheless, it is worth devoting a very few words to the nature of the process.

When you classify a group of people or a set of objects, you are putting people into a set of pigeon-hole categories that have different names.¹ The names of the categories have no special scientific significance, though the names may be chosen for convenience; for example, C. Linnaeus' two-name system for plants, replacing long and unwieldy names, was convenient and perhaps aided biologists' imaginations in understanding new connections among categories, though the names have no intrinsic meaning.

When you place a group of people in a single category, you are asserting that everyone in that category will be treated as similar and that people in different categories will be treated differently in the course of your study. When you classify people by nationality, you will, in your subsequent use of the data, treat all Americans as the same but as different from Chileans, Taiwanese, and other groups. These statements of similarity and difference are only a device to help you to handle the data scientifically, of course, and they say nothing about whether people are "really" equal or unequal, morally or otherwise.

Coding different things or people into the same category inevitably loses some of the information you have about them: Categorization is like converting a color photograph into black and white. For example, you may have collected some open-ended (unstructured) interviews about people's feelings toward racial integration. In order to handle the data more easily, you might classify (code) each interviewee as either "for" or "against" integration. You thereby lose all the shadings of opinion voiced by the subjects and all the richness and variety of their comments. But, unless you classify the subjects in some manner, you will not be able to summarize the data or manipulate them in other objective ways. (Of course, it is possible to classify on several dimensions at the same time, so that some more information can be saved and used simultaneously.)

Loss of individuality in observations grouped together in categories is the basis for a persistent attack on science. People say "How can you treat a moderate and a radical as if they were exactly the same in your survey?" or "Why do you lump together wholesalers in Vermont and wholesalers in Louisiana when they serve very different markets?" The real question, however, is not whether the items that you lump together are different in some ways, but rather whether they are similar for your purposes. If they are, your classification is perfectly satisfactory. As noted earlier, the same antidote is appropriate for a child who swallows cleaning fluid in Vermont or in Louisiana.

1. The pigeonholes can also be given identification numbers like the numbers on football players or railroad cars. But those numbers usually have little real significance, a point I shall touch on again in the section on measuring.

Sound classification (typology) is a difficult art, as A. Kroeber tells us about the couvade phenomenon:

The couvade is a custom formerly attributed to the Basques of the Pyrenees, under which on the birth of a child the mother got up and resumed her household duties, while the father went to bed in state and lay in. This piquant habit attracted even more interest when it was found that a good many primitive and backward peoples did more or less the same thing in South America, in Africa, in India and China. These all believed the child would suffer if they did not observe the custom. However, most of them did not go quite as far as the Basques are said to have done: the father lay in, but so did the mother. Cessation from labor by both parents was demanded for the child's health. In still other tribes, both parents refrained from work and certain foods, but the mother refrained more strictly and longer than the father. Among others, the mother alone was under restrictions. And finally, among the southern Ute, where the mother lies still on a bed of hot ashes for thirty days, the father lies with her only for four, and then, after a good meal, must run and hunt as actively as possible for one or several days.

With all this gradation, what constitutes the couvade typologically? The most that it would be possible to give as a definition is: the participation of the father in the period of rest and recuperation that is physiologically natural for the mother after childbirth. In other words, the idea is expressed that it is his child too. Superimposed on this is an endless variety of things prohibited and things required, for the good of the child or for the good of the parents, for a few days or for a full month. And above all, there is every intergradation from the father's sole role, through a joint one, to the mother's alone. No wonder ethnographers have come to talk about "classical couvade," "semicouvade," "pseudo-couvade," and the like, without being able to define the couvade forms so that all specific tribal customs fall unqualifiedly into one or the other class.

In short, we have no satisfactory typology for the couvade. Hence in a comparative study we would sometimes be comparing part-comparables, perhaps even noncomparables. The common factor is the name, plus a vague and extremely plastic concept. An exhaustive monograph on the couvade would be almost as near to a train of related but free associations as to a scientific treatise. The question of whether the couvade has been diffused from a single origin or has had several independent origins can therefore not be answered at present. It is not yet a scientific problem, because the couvade is not a definable recurrent phenomenon. (Kroeber, pp. 542-543)

a. CHOICE OF DIMENSIONS FOR A CLASSIFICATION SCHEME

The dimensions for the classification scheme should be chosen to fit one's purpose. You should choose a dimension (variable) for your study that you believe is either of interest in itself or is likely to be helpful in understanding another variable. For example, the U.S. Bureau of Labor Statistics wants to know the situation of labor in the economy. It therefore designates employment-unemployment as a dimension on which it is useful to classify people. When B.L.S. finds out how many people are in each of the categories "employed" and "unemployed," it has developed valuable information for its purposes.

A social psychologist might choose the dimension of employment for entirely different reasons: to help explain crimes of violence, perhaps. Counting how many people convicted of crimes of violence were in the categories "employed" and "unemployed" might help to explain why crimes of violence occur or help to predict when and where crimes of violence will take place.

If you are studying only an Amish community in Pennsylvania, the religion dimension will not be useful as a classification. (But Amishness will be a parameter of your study.) In most sociological studies in the United States, you would use a Protestant-Catholic-Jew dimension. On the other hand, in most countries of the world, you would need additional categories for a religion dimension. And, of course, in many types of studies (of nutrition, say) religion may not be a useful dimension at all. (It might be, though, that an investigator would notice that religion does, indeed, make a difference in the food intake of people. He might then decide to include it in his classification.)

There is no metaphysical all-purpose classification for everything in the world, though some philosophers have spent their lives trying to create one. An illustration of the variety of classification schemes for different purposes is the fact that different types of libraries find different classification schemes best for their various purposes.

F. Machlup provides an example and a discussion of the rationale of economic classification when discussing oligopoly (competition among a small number of firms):

Hardly any generalizations could be made about the "economic consequences of oligopoly" or about the "causes of oligopoly" unless we first separate different types or kinds of oligopoly. In other words, a classification of oligopoly is needed. Of course, an indefinite number of features or conditions could be named as conceivable bases for classification. The problem arises what to select in order to obtain a classification useful for purposes of economic analysis. Distinctions should be made wherever one finds differences suspected of "making a difference" sufficiently important within the chosen frame of reference. It is probably generally agreed that selling prices, output volumes, selling efforts, product qualities are among the major variables considered relevant for our purposes. Hence, the distinctions made in a classification of oligopoly may relate to differences likely to affect these major variables. It will also be agreed that the question of the sources of monopoly power and of the conditions responsible for oligopolistic situations is significant. Hence, a classification may be based on distinctions of the causes of oligopoly. . . .


[N]o single classification [of oligopoly] could possibly serve all purposes. Crossclassification may sometimes be helpful, although the number of possible com­ binations might become overwhelming. On the other hand, a problem under investigation may be aided by one classification while others are irrelevant. For example, those interested in public policy formation may find it important to know whether an existing big-firm oligopoly is based on definite real-cost advantages or rather on the exploitation of institutional privileges or on the use of coercive or oppressive practices. But they may not in the least be interested in the symmetry or asymmetry of the leadership aspirations of the firms in ques­ tion. A cross-classification according to leadership and collusion, however, may be significant. For there may be important differences in the effects of collusion without leadership, leadership without (a high degree of) collusion, and collu­ sion enforced by leadership. (Machlup, 1951-1952, p. 160) M o r e than one dimension is necessary i f more than one dimension helps to distinguish various groups or to p r e d i c t their behavior. B u t there is no use in h a v i n g t w o dimensions i f b o t h classify a l l occurrences i n the same way. For example, i f all candidates for the baseball team w h o t h r o w h a r d also t h r o w far and vice versa, either t h r o w i n g hard or t h r o w i n g far w i l l be sufficient. Constructing an appropriate classification scheme is not as simple as i t seems, and i t requires imagination. A textbook, for example, is largely a classification of knowledge i n a particular field. For example, a feature of this book is the classification of obstacles to research i n Chapters 18-24. Dimensions (variables) may be previously chosen b y theory or b y the direction of your interest, as we have discussed i n previous sections on the choices of variables and proxies. 
But dimensions may also be selected by trial and error, especially in classification research in which the classification scheme is itself the goal of the research. One usually constructs an empirical classification with several sequences of trial and error, followed by examination of how well the classification works. For example, think how you have arranged the books on your bookshelf. On an ad hoc basis, you have placed those books together that seem to belong together. You may have begun with some a priori elements in the scheme like separation of natural sciences, social sciences, and humanities. But beyond that you probably had no special scheme in mind, and you shuffled and rearranged until the arrangement seemed to make sense. Later on, as you lived with the classification, you may have changed the arrangement as you noticed that some books were used at the same time or that you could find some books more easily if they were nearer psychology than sociology, for example.²

2. It is an interesting reflection of U.S. national character that, unlike most other library classification schemes, the Library of Congress system is entirely without theory and is based solely on the pragmatic judgment of which books will be likely to be used together. This totally empirical approach may well be the most successful way to deal with the insoluble problem in library classification: that books can be arranged physically on only one dimension, even though a logical classification would require many dimensions.

Much the same process seems to work for the organization of a college course, book, or paper. My own practice is to write down the various ideas and facts on index cards, along with titles for major sections, and then to spread them on a big table. Then I pick them up and place them in the order that seems to make most sense. The classification seems to evolve satisfactorily in this manner, even though I cannot formulate satisfactory criteria at first.

Some researchers try to play it safe by collecting information on every conceivable dimension. By being super cautious, however, one may spend far more time and money than the information one really wants should cost. The wisest researcher will collect data on those dimensions that she thinks are quite likely to be useful, plus all those dimensions that might conceivably be of use and that she can collect with very little extra cost. This selection requires delicate judgment.
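The earlier point that a second dimension adds nothing when it sorts every case exactly as the first does can be checked mechanically. The following is only a minimal sketch: the function name `same_partition` and the tryout data are hypothetical, invented here for illustration.

```python
def same_partition(labels_a, labels_b):
    """Return True if two dimensions sort the cases into identical groups."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both dimensions must cover the same cases")
    a_to_b = {}
    b_to_a = {}
    for a, b in zip(labels_a, labels_b):
        # Each category on one dimension must pair with exactly one
        # category on the other; otherwise the groupings differ.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


# Hypothetical tryout notes for five baseball candidates.
throws_hard = ["yes", "yes", "no", "no", "yes"]
throws_far = ["far", "far", "near", "near", "far"]
runs_fast = ["yes", "no", "no", "yes", "yes"]

print(same_partition(throws_hard, throws_far))  # True: one dimension suffices
print(same_partition(throws_hard, runs_fast))   # False: both carry information
```

When the check returns True, recording either dimension alone loses nothing for classification purposes; when it returns False, the second dimension distinguishes cases that the first lumps together.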

b. CHOICE OF CATEGORIES FOR CLASSIFICATION SCALES

Once you decide that you want to employ people's use of umbrellas as your proxy for whether it is raining, your decision about which categories to use for this dimension would seem to be an open-and-shut case. But what about "no umbrella"? The answer must depend upon your purpose.

Some of the choosing of categories can be done after the data are collected; you can still decide to group several categories together in fewer categories. But the crucial classification work takes place at the design stage. For example, you must decide whether to include the category of "partially employed," in addition to "employed" and "unemployed," and whether to code all subjects into one or the other major category. You must decide whether to classify countries as "highly developed" and "underdeveloped" on the basis of per capita income, to add intermediate categories, or simply to use the continuous variable of per capita income. How sensibly you choose the categories and the dividing lines will determine how useful the variable will be to you. (The next section discusses how to draw the dividing lines and to define the categories.) When constructing a classification scheme, you must be very flexible in accepting new categories (and dimensions) and throwing away existing categories (and dimensions).

The color categories used in various cultures illustrate the effect that one's purposes have upon one's choices of categories.

Manus children . . . say "yellow, olive-green, blue-green, gray, and lavender as variants of one color"; and while the Ashanti have distinct names for black, red and white, black is applied to any dark color—brown, blue, purple, etc.—while red includes what we differentiate as pink, orange and yellow. (Hallowell, p. 349)

And the Eskimos have many different categories within which to classify ice, whereas to us in the United States all ice is just ice.

If the categories are expressed in words, the scaled variable is often called "qualitative" or "discontinuous." But it is possible to go from words to numbers or vice versa, though the process is not symmetrical. For example, if you have the heights of the people in your sample in feet and inches, you can arbitrarily choose to call everyone over 5 feet 10 inches "tall" and everyone under that height "short." In this way you have converted "quantitative" data into "qualitative" verbal categories. The numerical cut-off points for the categories should be those that are most useful for you. The New York police force establishes a cut-off height of 5 feet 8 inches for policemen because it believes that 5 feet 8 inches is a better cut-off point for its purposes than 6 feet or 5 feet 3 inches, taking into account its need for tall policemen as well as the potential supply of candidates for the job.

Sometimes it is helpful to think of each possible numerical measurement as a category in itself. At first, it seems that there is an infinite number of numerical categories, because, for example, the heights of any two people will differ by at least some infinitesimal amount and no two are exactly the same. But, when you record the heights, you must record in feet and inches, which means that anyone between 5 feet and 6 feet tall can fall into only one of twelve categories at most. Of course, if you used a finer scale (calibrated in sixteenths of an inch, perhaps), there would then be more and smaller categories. But there will always be some limit on how small the categories can get, for your measuring instrument will cease to distinguish at some degree of fineness; there must always be some point at which the categories are discrete and not continuous.³ (Notice that it is the categories that are discrete, not the people's heights. It does not make much sense to discuss whether people's heights are continuous or discontinuous, for the argument could be decided only by measurement, which must be discontinuous.)

Conversely, it is possible to go from verbal categories to numerical categories. If you have data on people's reactions to a political event, you can designate "like" as "3," "neutral" as "2," and "dislike" as "1." This technique is handy for many purposes, and it must be used whenever you transfer your data to cards or magnetic tape for automatic data processing. But you must remember that, just because you have designated "like" as "3" and "dislike" as "1," it does not mean that "like" is three times as great as "dislike," that three "dislikes" equal one "like," or that "like" is as far above "neutral" as "neutral" is above "dislike." You could have picked any numbers ("01," "07," and "44," for example) instead of "1," "2," and "3." Unless you have decided beforehand, on the basis of some other evidence, that the categories bear some special numerical relationship to one another, you cannot assume any numerical relationship among them. But, if you do decide that two "likes" shall equal one "neutral," then you can add them and subtract them at will. Remember, though, that the final answer is dependent upon the numbers that you assigned. (Inexperienced or cynical researchers sometimes fool themselves or others by rigging the numbers that they assign so that the outcome is what they want it to be. The Olympic Games provide a lovely example. The United States [unofficially] wants to count 5-3-1 for first, second, and third places; by that scorekeeping it beats the Soviet Union. The Soviet Union counts points 3-2-1 instead, and with that system it wins. Who is right? Nobody can be called "right" until we have other reasons to help us judge which system of counting makes more sense.)

3. As O. Morgenstern forcefully points out, measurement that is too fine runs the great danger of deluding people into thinking that the data are much more accurate than they really are. Morgenstern quotes N. Wiener that "economics is a one or two digit science," which implies that any numbers that do not round off to one or two digits are misleading (Morgenstern, p. 116). An example is the history of the 1929 potato crop statistic: "First it was raised 2 million bushels, then lowered 30 million, then lowered another million, then boosted 5 million, and recently raised another million. It now stands at 333,392,000. Disregarding the 0's, the only digit remaining of the six given in the original estimate is the first, the figure 3!"
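The dependence of the final answer on the assigned numbers can be made concrete with a short sketch. The medal counts below are hypothetical, chosen only so that the two point systems mentioned above (5-3-1 and 3-2-1) disagree; they are not actual Olympic results, and the names `team_score` and `winner` are invented for this illustration.

```python
def team_score(medals, weights):
    """Weighted point total for (first, second, third) place counts."""
    return sum(count * weight for count, weight in zip(medals, weights))


# Hypothetical (first, second, third) place counts; not real results.
medals = {"Team A": (10, 5, 0), "Team B": (8, 5, 7)}


def winner(weights):
    """Name of the team with the highest total under the given weights."""
    return max(medals, key=lambda team: team_score(medals[team], weights))


print(winner((5, 3, 1)))  # Team A (65 points against 62)
print(winner((3, 2, 1)))  # Team B (41 points against 40)
```

The same underlying counts produce opposite verdicts, so the choice of weights has to be justified on grounds other than the result it yields.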

c. DEFINING THE CATEGORIES IN A CLASSIFICATION SCALE

After you choose dimensions and categories, you must operationally define the categories. For example, once you have decided that the proportion of umbrellas in use will be your measure of the extent to which it is raining, you must decide whether parasols count as umbrellas. And will a newspaper over the head be counted as an umbrella? Is an open umbrella carried over the shoulder to be counted as an open umbrella?

You must define your categories so that most of your observations will fit into one or another category without too much doubt or arbitrariness in the process. The classification of humans by sex has the useful property that it is easy to designate most human beings as man or woman, though there will be some exceptions who are not easily classifiable.

Different definitions of "employed" and "unemployed" cause argument and confusion. For example, the National Industrial Conference Board estimated unemployment for November 1935 at 9,177,000, whereas the Labor Research Association estimated it at 17,029,000. This enormous difference came from the definition of "unemployed"; the Labor Research Association included farm unemployment and unemployment among professionals, whereas the N.I.C.B. did not. Also for 1935, the U.S. Chamber of Commerce estimated unemployment at a snappy 4 million, but sampling techniques may help to explain this low estimate (J. Cohen, p. 664). In 1949, Soviet official Georgi Malenkov could still estimate United States unemployment at 14 million, using good American data, whereas the U.S. Bureau of the Census was estimating only 4 million unemployed. Malenkov simply classified as "unemployed" anyone who worked "less than full time" (Wallis & Roberts, 1962, pp. 90-91).
Crime statistics sometimes take frightening leaps because the classification scheme has been changed (or because the recording of crimes has been improved). This has also been true of medical statistics; for example, better medical knowledge and practice led to more accurate diagnosis of lung cancer as a cause of death and decreased the number of mistaken diagnoses of tuberculosis and other diseases (U.S. Public Health Service, pp. 127-141).

Here is the sort of definitional decision that must be made in categorizing employment status: If a woman worked twenty-two hours each week during the last month, should you mark her down as "employed" or "unemployed"? Your first reaction might be that the categories were not well selected and that there should be a category of "partly employed," perhaps. Or you might suggest that people should be categorized separately for 0-10 hours, 11-20, 21-30, and so forth. But for some purposes, at least, the simple dichotomous classification "employed-unemployed" is useful, especially as a simple index for laymen to watch in the newspapers. So you must classify the woman who worked twenty-two hours one way or the other. Definitional decisions take up much of an empirical researcher's attention. Only the theorist can blithely ignore their existence.

Another example: Should the work that people do at home be counted as employment? Our usual practice is not to consider housewives as part of the labor force. This decision causes certain problems, as is illustrated by the story of the two Englishwomen who hired each other as house workers, then fired each other at the end of six months, in order to collect unemployment benefits.

An example of how a given type of obesity is defined for an obesity classification is given on page 47. Each of the characteristics contained in the definition is itself somewhat ambiguous, of course. But the definition is a success if a consensus of doctors would agree upon a decision to classify a given person in or out of that category.

Defining categories is a progressive process. As tough instances come along, you sit down and relate them to your final purpose and then amend your definitions to cover them. In establishing these criteria the important things to remember are as follows: First, have a reason for your decision; second, relate the decision to your ultimate purpose; and, third, refer back to your theory for guidance, if you are working with the aid of a strong theory.

You must sometimes also throw in a touch of arbitrary reasoning. There will be some observations for which you are at a total loss how to decide, and you feel as if you could do as well by flipping a coin. Don't. Make yourself select one category or the other; that selection will probably be better than a random decision. Hopefully, however, such near toss-ups form only a small part of your sample.
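How much an operational definition can move a headline figure, as in the unemployment estimates above, is easy to see in a small sketch. Everything here is an assumption made for illustration: the weekly-hours data are invented, the function name `unemployed_count` is hypothetical, and "full time" is taken to mean 40 hours.

```python
def unemployed_count(weekly_hours, min_hours_to_count_as_employed):
    """Count people classed as unemployed under a given hours cut-off."""
    return sum(1 for hours in weekly_hours
               if hours < min_hours_to_count_as_employed)


# Hypothetical weekly hours worked by ten respondents.
weekly_hours = [0, 0, 5, 12, 22, 22, 30, 38, 40, 45]

# Definition 1: any paid work at all counts as employed.
print(unemployed_count(weekly_hours, 1))   # 2

# Definition 2: anyone working less than full time (assumed 40 hours)
# counts as unemployed.
print(unemployed_count(weekly_hours, 40))  # 8
```

The respondents and their hours are identical in both runs; only the dividing line changes, and the count of "unemployed" quadruples.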

2. Measuring and Scaling

Most of what is true of classifying is true of measuring, for measurement is a type of classification. (Or, if you prefer, classification may be considered a type of measurement, as we shall see.)

The main difference between classification and measurement is the use of numbers for computational purposes. (Numbers can also be used simply as labels for categories, as we shall see in nominal scales. But this is still classification by number; labeling does not take advantage of the important properties of the number system.)

a. THE NATURE OF MEASUREMENT

S. Stevens (p. 25) gives a succinct definition of measurement: "Measurement is the assignment of numerals to events or objects according to rule." This is an important definition and worth pondering. It emphasizes the centrality of numbers in measurement. And it directs our attention toward the various rules that we may use in measuring.

A scale is the operational rule that one uses in a measurement. It is no coincidence that in English the measurement instrument for length is called a rule-r. Our discussion of measurement is largely a discussion of which sets of rules (scales) are best for which kinds of situations.

There are many difficult philosophical arguments about measuring and scaling. The philosophical position that I shall take is the pragmatic one that whatever helps a researcher get on with the business of producing useful knowledge is sound measurement, and whatever in measurement hinders a researcher or causes a researcher to produce trivial though impressive-looking work is unsound.

In the physical and biological sciences, measurement is largely a matter of finding satisfactory physical instruments to measure the phenomenon one is interested in: the amount of oxygen on Mars, the speed of transmission of nerve impulses, the number of defects in an airplane wing. The electron microscope and the electron telescope are polar examples of such technological advances. But instruments also can help in the social sciences. For example, because psychologists wanted to measure how active animals are under various conditions, they developed running wheels and treadmills that would automatically measure activity. Similarly, timed quick-exposure projectors make possible well-controlled experiments on learning speeds. And anthropologists measure the age of artifacts with the Carbon-14 technique.
In problems that involve counting, simple arithmetic provides all the necessary scales and categories. But it is sobering to remember that this apparently simple mental machinery was not always with us. The number system probably originated with the desire of cattle owners to know how many cattle they owned and to keep track of their herds or flocks (Dantzig, pp. 20-21). Counting the soldiers in an army was another early problem in cardinal measurement. The early rulers of Madagascar counted their army by marching one man at a time through a narrow mountain pass and dropping one pebble in a pile for each man who passed (Dantzig, p. 28). The inventions of numerals and early mathematics were extraordinary advances. But even after they existed, the crudeness of scales kept measurement inaccurate, as R. Heilbroner points out:

. . . [T]he early merchant had to settle his accounts by weight of metal, and when a shekel was equal to so many grains from the middle of an ear of wheat and the carat of gold equal to the weight of a carob bean, there were problems enough to tax the skill of a modern arbitrager. (Heilbroner, p. 42)

Social scientists, also, seek better devices to learn about actual events and behavior. For example, estimating the number of houses in a country can be done by census enumerators, airborne photography, infra-red photography, and so on. Or the student of communications who wants to know which pages of a magazine are read can use fingerprint powder. These "nonreactive" measures are appropriate when we are interested in what is outside of people's minds. But when we are interested in the contents or processes of people's minds, we must ask people to respond (react) to stimuli presented to them. It is the peculiar problems involved in constructing scales to measure the contents of people's minds that make scaling an important special topic in social science.

FIGURE 15.1 (Source: © 1976/1969 by the New York Times Company. Reprinted by permission.)

Please notice that the mere presence of a human being in a scientific measurement situation is not the source of our interest in scaling. If we want to know how hot a room is, and if a person reads a thermometer and records the result, human response is not central; we could replace the person with an automatically read thermometer. But if we ask the person, "How hot do you feel?" under a variety of conditions of temperature and stress, the person's responses are central in our interest. In the latter case we are interested in getting variations in reactions within or among persons as we change conditions, whereas in the former case of thermometer reading we are interested in avoiding variations in human responses.

3. Strengths of Scales

This section takes up the various scales according to their "strength" or "power," both vague words referring to the amount of formal information contained in the scale. Following Stevens, we distinguish four types of scale: nominal, ordinal, interval, and ratio.

a. NOMINAL SCALES

Stevens considers classification to be a type of measurement (a nominal scale) because a number can be assigned to any given classification just as numbers are given to football players or chicken houses. But the numbers then mean no more than do any other names, for instance, "schizophrenia" or "prosperity" or "traditionalism." The number may be useful, but it is unnecessary, and I therefore do not consider classification a type of measurement.

b. ORDINAL SCALES

An ordinal scale is exemplified by the street numbers on houses. If I live at 1105 South Busey, you live at 1111, and Bill lives at 1115, we know that you live between Bill and me. Notice that this ordinal scale contains more information than does classification. If Bill wears 11 on his football jersey, Jim wears 13, and Jack wears 15, you know nothing more about Bill or Jim or Jack than if they wore any other numbers (although their numbers may tell you that all are quarterbacks).

Another ordinal example is the Mohs scale of hardness of minerals. F. Mohs worked out a system by which a given mineral would be rated for hardness by comparison with other minerals. Harder stones scratch softer stones, and any given stone is given a rating between a stone it can scratch and one that can scratch it. Note that one stone cannot be said to be twice as hard as some other stone. A stone can be said only to be harder than, softer than, or equal in hardness to another stone. At best, then, stones can only be put in an order of hardness, and this sort of scale is therefore called an ordinal scale.

Attitude and opinion scales are classic examples of ordinal scales. Here, for example, is an illustration of a brand-attitude scale:

Listed below are several brands of each of two household products. For EACH brand place an "X" in the one box which best indicates how much you dislike or like that brand. The more you dislike it, the smaller the number you should give it. The more you like it, the bigger the number you should give it. There are no right or wrong answers. Only your opinion counts.

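TABLE 15.1

Scale: 1 = Dislike Completely; 2 = Dislike Somewhat; 3 = Dislike a Little; 4 = Neither Like Nor Dislike; 5 = Like a Little; 6 = Like Somewhat; 7 = Like Completely

Products              1    2    3    4    5    6    7
Toothpaste
  Brand A            [ ]  [ ]  [ ]  [ ]  [ ]  [ ]  [ ]
  Brand B            [ ]  [ ]  [ ]  [ ]  [ ]  [ ]  [ ]
  Brand C            [ ]  [ ]  [ ]  [ ]  [ ]  [ ]  [ ]
Scouring cleanser
  Brand D            [ ]  [ ]  [ ]  [ ]  [ ]  [ ]  [ ]
  Brand E            [ ]  [ ]  [ ]  [ ]  [ ]  [ ]  [ ]
  Brand F            [ ]  [ ]  [ ]  [ ]  [ ]  [ ]  [ ]

(Abrams, p. 193)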

A virtue of ordinal scales is that people can often make an accurate judgment about one thing compared to another, even when they cannot make an accurate absolute judgment. One can often say whether he likes this painting better than that one, even though he cannot say how much he likes either one. To illustrate the accuracy of comparative judgments, you can often tell whether a child has a fever (that is, when the child's temperature is two or three degrees above normal) by touching the child's face to yours and comparing whether her skin is warmer than yours. But you would be hard-put to say whether the temperature outside is 40° or 55° F, or whether a piece of metal is 130° or 150° F.

c. INTERVAL SCALES

An interval scale contains even more information than does an ordinal scale. Unless all the lots on the block are the same size, I cannot tell how far apart our three houses are by just knowing the house numbers. But if the number on each house represented its distance in yards or rods from the bottom of the street, the house numbers would indeed tell us exactly how far apart any two houses are. This type of system is called an interval scale because the interval between 1 and 2 is equal to the interval between 2 and 3, 3 and 4, or 1105 and 1106. The intervals between numbers on an ordinal scale do not have this meaning at all.

d. RATIO SCALES

The example of the house numbers measured in yards contains an additional kind of information that makes it a ratio scale. Ten yards contain twice as many one-yard segments as do five yards, and we can say that the ratio of ten yards to five yards is two to one. We cannot say, however, that 100° F. is twice as hot as 50° F., and therefore the Fahrenheit scale is only an interval scale and not a ratio scale. If this statement is not clear to you, reflect on the fact that 50° F. = 10° C. and that 100° F. is approximately 38° C. Very clearly 38° C. is not twice as hot as 10° C.

Interval and ratio scales are often known together as cardinal scales. The most common uses of cardinal scales in the social sciences are in counting: counting people of various kinds, counting sums of money, counting the number of units of behavior, and so on. For example, H. Ebbinghaus grew dissatisfied with simply classifying people according to whether they could remember a given series of nonsense syllables, for he could not then distinguish among degrees of memory. Therefore, he tested each subject several times and counted the proportion of times each person correctly remembered the series. Such an approach gave him a cardinal scale to work with (Ebbinghaus, p. 9).

We can illustrate the relationships among these sorts of scales with a single example: You are shown three sculptures. You label them "1," "2," and "3" respectively, thereby creating a nominal scale. You decide that you like #2 better than #3 and #3 better than #1, a nice consistency; your preferences now form an ordinal scale. You decide that you would pay $200 more for #2 than for #3, $100 more for #3 than for #1, and $300 more for #2 than for #1; those willing-to-pay numbers form an interval scale. If you now go one step further and say you're willing to pay $1300 for #2, $1100 for #3, and $1000 for #1, those numbers form a ratio scale. Of course, once you have the ratio-scale information in hand you could also go the other way; you could then form an ordinal scale by dropping the dollar figures.

There has been much talk about how one or another sort of scale "requires" or "permits" one or another kind of statistical operation. Such rules do not seem helpful (Guttman, 1971); rather, common sense and basic understanding of what you are trying to do will serve you better than any such rules.
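The Fahrenheit illustration above can be checked numerically. The sketch below uses the standard Fahrenheit-to-Celsius conversion and the fact that a yard is three feet; the function names are hypothetical, chosen only for this example.

```python
def fahrenheit_to_celsius(f):
    """Standard conversion: subtract 32, then scale by 5/9."""
    return (f - 32) * 5 / 9


def yards_to_feet(y):
    """A yard is three feet; zero yards is zero feet."""
    return y * 3


# Ratio scale: changing the unit leaves ratios untouched.
print(10 / 5)                                # 2.0
print(yards_to_feet(10) / yards_to_feet(5))  # 2.0

# Interval scale: the ratio of two readings depends on the unit chosen.
print(100 / 50)                              # 2.0 in Fahrenheit
ratio_c = fahrenheit_to_celsius(100) / fahrenheit_to_celsius(50)
print(round(ratio_c, 2))                     # about 3.78 in Celsius

# But ratios of differences (intervals) do survive the change of unit.
diff_f = (100 - 50) / (75 - 50)
diff_c = ((fahrenheit_to_celsius(100) - fahrenheit_to_celsius(50))
          / (fahrenheit_to_celsius(75) - fahrenheit_to_celsius(50)))
print(diff_f, round(diff_c, 6))              # both equal 2.0
```

The difference lies in the zero point. Yards and feet share a true zero, so only the unit changes; Fahrenheit and Celsius place zero at different temperatures, so "twice as hot" has no stable meaning, while "twice as large a rise" does.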

This might be a good place to mention again that measurement is operational definition. To say "stick is long" does not suggest any operation, explicitly or implicitly, and therefore the length of the stick is not well defined and has not been measured. But to say "stick same length my arm" or "stick longer my arm" or "stick two times long my arm" is to state implicitly that the relevant operation is a comparison of the stick with my arm; and, when someone has performed the operation of that comparison, he can know exactly what length of stick you are talking about. "A man's arm" may be a somewhat ambiguous measure, and that is why we have the platinum-iridium meter stick in the U.S. Bureau of Standards, where it is kept at a carefully regulated temperature, so that we have an almost-perfect ultimate standard against which we can measure things.

4. Summary

Classification and measurement are crucial technical operations in research procedure. The theorist can and does ignore these operations, but the excellence of empirical work depends upon one's skill in carrying them out.

With classifications and measurements we make distinctions and we order our knowledge systematically. These operations are fundamental in science.

First one must decide which dimension of the phenomena one wishes to classify or measure. Then one must decide which categories or measurement system to use. Finally one must decide how to classify the phenomena into categories or how to make the measurement. The basic rule for all these research decisions is that they should be made with regard to one's underlying purpose in conducting the research.

EXERCISES

1. Show how information is lost when classifying observations in some research study.
2. Give two research examples in which the subject's sex is an important dimension for classification and two examples in which it is not.
3. For what purpose might it be enough to have two categories of political party in the United States, and for what purposes might more categories be necessary?
4. Give an example from research in your field in which definition of categories presents a major difficulty.
5. Show an example of use in your field of each of the following:
   a. an ordinal scale
   b. an interval scale
   c. a ratio scale

ADDITIONAL READING FOR CHAPTER 15

For excellent discussions of the philosophy of measurement see Stevens, from whom I draw some examples; Kaplan, pp. 171-198; and Coombs. On the problems and methods of measurement, see Selltiz et al. (Chapter 6). Kerlinger (Chapter 25), 2d ed., treats measurement from the psychologist's point of view.

15 scaling human

1. Types of Mental Activity to Be Measured
2. Stimulus and Response Scales
3. Simple Composite Scales
4. Choice of Scales
5. Summary

The term scale is one of the most confusing in social science, because of the many ways it is used. Some discussion of "scale" and related terms and concepts is therefore in order.

First let us distinguish dimension and scale. The concept of dimension belongs to the realm of theory. It refers to an aspect or characteristic of the phenomenon that you are interested in; for example, happiness or intelligence or productivity. The terms dimension and theoretical component are often used synonymously.

An indicator, or empirical variable, is an empirical tool (a proxy for the theoretical dimension) used to capture and represent a theoretical dimension. For example, one question, in a series of questions, about a person's own judgment of his or her happiness might constitute an operational proxy for happiness. Or one might use the suicide rate as a proxy for the relative happiness of given groups of people. The term "scale" is frequently used as a synonym for indicator or proxy.

The term "scale" is most frequently used in social science to refer to measurements that involve judgment, or subjective ratings. The judgment may enter in the process of scaling itself, as when an expert is asked to rate (scale) a group of teachers. Or the judgment may arise when the researcher identifies the scale with the dimension of interest; for example, when the researcher employs a questionnaire to measure happiness. The same sort of judgment is used when the researcher uses the suicide rate as a proxy for happiness, but because suicide data involve less judgment (by the social scientist; the coroner may use a lot of judgment), it is less likely to be called a scale. Physical measuring instruments are seldom called scales in social science, because they involve little judgment.

If the indicator contains several elements (for example, a happiness scale consisting of several related questions), each question is called an item. Each item may be thought of as an ordinary variable, and a scale can be composed of two or more simple variables; for example, a baseball player's ability may be scaled based on one point for running ability, one point for throwing ability, and two points for hitting strength.

One of the important roles of scaling procedure is to determine whether two or more measurement scales are measuring the same dimension or are tapping different dimensions; one may use the Guttman-scale procedure (beyond our scope here) or factor analysis (also an advanced topic) for this purpose. This leads into multidimensional scaling, in which responses to several dimensions are measured at once; for example, where a set of questions measure how favorable or unfavorable, how strong or weak, and how active or passive people judge a given stimulus.

Scales to measure human responses come in a bewildering number of varieties. Sorting them according to some important characteristics helps bring some order to the chaos. I shall discuss scales according to these characteristics, and in this order: the type of mental activity the scale is intended to measure; whether you are interested in variations within individuals or differences among individuals; the mathematical strength of the scale (discussed in the previous chapter); and whether the scale is simple or composite.

1. Types of Mental Activity to Be Measured

Let us distinguish these types of mental activity that one may wish to measure: how a person perceives a stimulus; what a person knows or thinks about a factual state of affairs (cognition); how a person intends to behave; what a person's values, preferences, and attitudes are; and the extent of a person's mental capacities.

a. SCALES THAT MEASURE PERCEPTION

The psychophysicists took the lead in scaling human responses when they began work on how different stimuli are perceived by a person. For example, how is a person's capacity to discriminate between two tones affected by the loudness of the sound? The scales that were invented in the nineteenth century by Weber, Fechner, and others are still in use today for a variety of purposes.

One basic scale of perception, Weber's, repeatedly presents two stimuli to the observer and measures the proportion of the trials in which the person can correctly identify the stimuli. For example, we may present a sound of a given loudness to the subject and then present a series of sounds of other loudnesses (and also the reference sound at appropriate intervals). After each test sound we ask the subject whether it is louder or softer than the reference sound. Then for each test sound's loudness we compute the proportion of trials that the subject got right. The difference between reference and test sounds that is large enough to produce some given proportion of correct responses (say, 75 per cent, because 50 per cent correct is expected purely by chance) is called the "just noticeable difference." The proportions correct for the various loudness differences form a scale that measures accuracy of perception as a function of the differences.

In Fechner's procedure, people are asked to give a number that represents the relative strength of each test stimulus compared to the reference stimulus, e.g., three times as bright, a fourth as bright, and so on. That is, Fechner's procedure works with the relative sizes of individual stimuli rather than with the differences between test stimuli and the reference stimulus, as in Weber's procedure.
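Weber's procedure, as described above, comes down to a small tabulation. The following is a minimal sketch in Python, assuming each trial is recorded as a pair (loudness difference, whether the subject answered correctly). The function names, the trial data, and the rule of taking the smallest tested difference that reaches the 75 per cent criterion are illustrative assumptions, not part of the original procedure.

```python
from collections import defaultdict


def proportion_correct(trials):
    """Return {difference: proportion of correct trials} for each difference tested."""
    tallies = defaultdict(lambda: [0, 0])  # difference -> [number correct, number of trials]
    for difference, correct in trials:
        tallies[difference][0] += 1 if correct else 0
        tallies[difference][1] += 1
    return {d: right / total for d, (right, total) in sorted(tallies.items())}


def just_noticeable_difference(proportions, criterion=0.75):
    """Smallest tested difference whose proportion correct reaches the criterion.

    Returns None if no tested difference reaches it.
    """
    for difference in sorted(proportions):
        if proportions[difference] >= criterion:
            return difference
    return None


# Hypothetical data: (difference in loudness units, subject answered correctly?)
trials = [
    (1, True), (1, False), (1, True), (1, False),
    (2, True), (2, True), (2, False), (2, True),
    (4, True), (4, True), (4, True), (4, True),
]

proportions = proportion_correct(trials)
jnd = just_noticeable_difference(proportions)
print(proportions)
print(jnd)
```

The dictionary of proportions is the scale itself; the just noticeable difference is one summary read off it. A real study would use many more trials per difference and might interpolate between tested differences.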

b. SCALES THAT MEASURE KNOWLEDGE

Scales to measure cognition (knowledge) are straightforward: One asks the subject, "How many people would you guess live in China?" The resulting answers can be compared to the actual figure, and the distribution of answers among individuals can then be plotted. Tests that measure learning in school are knowledge scales. There are many difficult problems in constructing satisfactory tests of learning, but because they are so special to education we shall not pursue them here.

c. SCALES THAT MEASURE INTENTIONS

Scales that measure intentions are of great importance to all research that aims to change people's behavior. An example is applied research into the effects of marketing variables, such as advertising and price, upon people's purchases; intentions to purchase consumer durables are used as a reasonably close proxy for actual future purchases, both by commercial firms and by economic analysts when forecasting the near future of the national economy.

Sometimes simple questions about what people intend to do ("Will you buy a car with an airbag?") can obtain accurate answers directly. In other situations people cannot or will not tell accurately what they will do. If so, one may (a) use composite scales that contain several relevant questions which work from various points of view, to study both the pattern and the number of responses saying "yes" and "no" (more about composite questions later); or (b) substitute other sorts of scales for the intention scale; attitude and preference scales, which will be discussed later, are frequent substitutes.

d. SCALES THAT MEASURE INTEREST, ATTITUDES, AND VALUES

Many kinds of research seek to measure whether people respond to particular objects or people negatively or positively. Scaling people's tendencies is usually quite difficult because people have trouble expressing these tendencies, or are reluctant to do so. Prejudice research is an example; it is hard for any of us to really know our own attitudes toward other racial and religious groups, and we are often reluctant to tell other people about such attitudes.

Attitude research is often used as a lazy man's substitute for intentions research, as when attitudes toward political candidates are used as a proxy for voting intentions, and when attitudes toward or interest in a particular product or brand are used as a proxy for future purchasing behavior. Attitude research at first seems easier to do than intentions research, and hence it is vastly overused.

Evaluative attitude scales usually work on one of the following principles: After being presented with an object or person or concept for consideration, the subject may be asked: (1) to choose one or more among a set of words to describe the object; (2) to grade the object, numerically or graphically or within a set of categories, as to how well a given word (e.g., "like," "pretty," "nasty," "sweet," "charming," "offensive") represents the subject's attitudes; or (3) to select where, between two opposites (e.g., "approve" and "disapprove"), the subject rates the object. The latter two scales are called "unipolar" and "bipolar," respectively.

e. SCALES OF MENTAL CAPACITY

I.Q. and other tests of mental capacity involve challenging problems of measurement and scaling. Which test items indicate the ability you wish to measure? Which set of items taken together makes up a useful mental scale? In which order should the items be given? Which items stand for which abilities? These issues have led to a large and important body of scaling knowledge called psychometrics, which is too specialized to pursue here.

2. Stimulus and Response Scales

Research can focus on one or another of the elements of the stimulus-response relationship.

a. STIMULUS-ORIENTED SCALES

Psychophysical research focuses on how differences among graded stimuli affect the respondent. The experimenter may present a variety of sounds to a single subject to determine how differences in stimulus intensity are perceived. That is, in stimulus-oriented scaling the researcher is interested in differences among responses by the same subject to variation among stimuli.

Stimulus-oriented scaling is found in fields other than psychophysics, too. Ebbinghaus was interested in the effects of variations in nonsense-syllable presentations upon the learning of particular subjects (and he used only one subject, himself). Or, a sociologist might be interested in which aspects of blacks trigger prejudice in representative whites.

Because the stimulus is under the researcher's control, it can be presented in quantitatively graded fashion, and quantitatively graded responses can be obtained, e.g., the proportion of trials in which a subject correctly distinguishes a weight of 1.0 pound from a weight of 1.05 pounds.

b. RESPONSE-ORIENTED SCALES

Some research aims to find how various responses to the same stimuli are distributed among a group of people. For example, what proportion of the public wants the town to float a bond issue to improve the schools? What proportions within the groups of Catholics, Protestants, and Jews believe that blacks are naturally superior to whites in athletics? Which segments of the public will be quickest to buy a new health food? Is there a difference among racial groups in ability to distinguish sound intensity? This is research that studies similarities and differences among people and their responses, rather than similarities and differences among stimulus objects and situations.

3. Simple Composite Scales

Scales can consist of a single simple item; for example, asking which tone is louder, which chocolate is preferred, in which order one ranks racial groups by liking, or which candidates one intends to vote for.

Simple single-item scales may not provide sufficient information, however. Therefore one may employ a multi-item composite scale. From such a composite scale one may obtain (a) more accuracy, (b) gradations of intensity, and (c) information on several dimensions of the stimulus or response. Much of our knowledge about scaling concerns how one may combine items into meaningful composite scales.

For example, a one-item I.Q. test cannot give gradations finer than pass-fail. But a ten-item test could permit categorization into ten I.Q. groups, if the test is constructed so that the items are perfectly ordered from easy to hard; that is, if no person would answer a "hard" question correctly after missing an "easier" question. No such perfect I.Q. test scale can be developed, of course, but with a hundred well-chosen questions one may obtain rather fine gradations.

Among the most important composite attitude scaling techniques are those of Guttman, Thurstone, Likert, and Osgood (the semantic differential). All seem complicated upon first acquaintance. But it is their basic principles and not their complex details that are important. You should fight against becoming bewitched by their formal machinery. A complex composite technique can produce useful knowledge only if its basic principle is sound for the particular situation.
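The idea of a perfectly ordered test, described above, can be checked mechanically. The sketch below is only an illustration in Python: it assumes each person's answers are coded 1 (correct) or 0 (missed) with the items already arranged from easy to hard, and the function names and respondent data are hypothetical.

```python
def is_perfectly_ordered(answers):
    """True if no correct answer (1) follows a miss (0).

    `answers` lists one person's results on items arranged from easy to hard.
    """
    missed_an_easier_item = False
    for answer in answers:
        if answer == 0:
            missed_an_easier_item = True
        elif missed_an_easier_item:
            return False
    return True


def scale_score(answers):
    """Number of items answered correctly."""
    return sum(answers)


# Hypothetical respondents on a ten-item test, items ordered easy -> hard.
respondents = {
    "A": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "B": [1, 1, 0, 1, 0, 0, 0, 0, 0, 0],  # passes item 4 after missing item 3
}

for name, answers in respondents.items():
    print(name, scale_score(answers), is_perfectly_ordered(answers))
```

When every respondent's pattern passes this check, the score alone tells you exactly which items were answered correctly, which is what lets the score serve as a gradation.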

a. SUMMED SCALES

The simplest type of composite scale presents several items to the respondent and considers the sum of the responses to the items to be the scale score. Likert scales, which present statements with five possible responses from "strongly approve" to "strongly disapprove," are commonly used in summed scales with numbers attached ranging from, say, +2 to -2. The score is the simple sum.

Choosing items sensibly to make up a summed composite scale is critical, of course; the choice can be made with unaided judgment, or a panel of expert judges can be used, or consistency checks can be made on pretest results of a variety of possible items.

The Guttman Scalogram is a method for investigating whether the items make up a cumulative scale; that is, whether the questions are arranged so that a person who, say, answers "no" to Question Four after answering "yes" to Question Three will also answer "no" to all questions after Question Four. The underlying idea in the scalogram is to arrange the items in order of their "strengths" so that the step at which the respondent changes from one answer to the other is an index of the degree of the person's attitude.

Sometimes the most effective (and always the simplest) way to handle a set of scale items is to examine which single item best predicts the behavior you are interested in, and then to work with this single item alone.

Factor analysis is a statistical device for finding several sets ("factors") of items for which there is high similarity in prediction among items within a composite factor, and low similarity in prediction among the factors. Some factor analysts have thought that factor analysis is a way of "uncovering fundamental entities" within the human personality, but this claim need not detain us now. For our purposes here, factor analysis is a "sophisticated" device for developing composite scales. Its main danger arises from its "sophistication"; because of the complexity of the method, simple procedural errors are not obvious even when gross, and may pass unnoticed. Another danger is that you may become so entranced with the mathematical possibilities of factor analysis that you forget your original purpose.
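The summed Likert score described at the start of this subsection is simple enough to show in a few lines. This is a sketch in Python under stated assumptions: only the two end labels and the +2 to -2 range come from the text; the three middle labels, the names, and the sample answers are hypothetical.

```python
# Assumed scoring key: only the end labels and the +2..-2 range come from the text;
# the three middle labels are hypothetical.
LIKERT_KEY = {
    "strongly approve": 2,
    "approve": 1,
    "undecided": 0,
    "disapprove": -1,
    "strongly disapprove": -2,
}


def summed_score(responses, key=LIKERT_KEY):
    """Sum the numeric values of one respondent's answers to all items."""
    return sum(key[response] for response in responses)


# One hypothetical respondent's answers to a five-item scale.
answers = ["strongly approve", "approve", "undecided", "disapprove", "approve"]
print(summed_score(answers))
```

A simple sum treats every item as equally important and as pointing in the same direction; items worded in the opposite direction would need their values reversed before summing.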

4. Choice of Scales

There are numerous types of scales for the study of attitudes, opinions, beliefs, and other mental states. They range from ad hoc scales made up specially for single purposes to generalized instruments like the semantic differential, which is a system of many interrelated scales of this sort:

    Good ____________________ Bad

These interrelated scales have been tested in many types of work, and many of their general properties are known.

Which scale, containing which categories, is best depends wholly on the purpose of the study and the general situation. Consider, for example, the following three scales, all of which were compared (along with the scale shown on page 231) to see which would best relate to women's actual purchasing behavior. Typically, J. Abrams found that each of the four scales did best in some respect.

Listed below are several brands of each of two household products. For EACH brand place an "X" in the one box which best indicates how much you dislike or like that brand. The more you dislike it, the bigger the minus number you should give it. The more you like it, the bigger the plus number you should give it. If you neither dislike nor like it, place your "X" in the 0 (zero) box. There are no right or wrong answers. Only your opinion counts.

                     Definitely                              Definitely
                     Dislike           Neutral                     Like
    Products         -5  -4  -3  -2  -1   0  +1  +2  +3  +4  +5
    Toothpaste
      Brand A        __  __  __  __  __  __  __  __  __  __  __
      Brand B        __  __  __  __  __  __  __  __  __  __  __
      Brand C        __  __  __  __  __  __  __  __  __  __  __
    Scouring cleanser
      Brand D        __  __  __  __  __  __  __  __  __  __  __
      Brand E        __  __  __  __  __  __  __  __  __  __  __
      Brand F        __  __  __  __  __  __  __  __  __  __  __

Listed below are several brands of each of two household products. For EACH brand place an "X" in the one box which best indicates how much you dislike or like that brand. The more you dislike it, the smaller the number you should give it. There are no right or wrong answers. Only your opinion counts.

    Definitely                                    Definitely
    Dislike                                             Like
        1    2    3    4    5    6    7    8    9   10

Listed below are several brands of each of two household products. For EACH brand place an "X" in the box that best describes your opinion of that brand. As you'll note, the box on the left represents an unfavorable opinion. The boxes toward the right represent the more favorable opinions. Each box is described for you to help you in expressing your opinion. There are no right or wrong answers. Only your opinion counts.

    Below      About      A Little   A Lot      One of     None
    Average    Average    Better     Better     the Best   Better

(Abrams, pp. 192-193)

To bewilder you even more, Figure 16.1 provides three other examples of scales that one might use for the scouring cleanser (Green and Tull, p. 180):

FIGURE 16.1

    (a)  __  I definitely agree with the statement
         __  I generally agree with the statement
         __  I moderately agree with the statement
         __  I moderately disagree with the statement
         __  I generally disagree with the statement
         __  I definitely disagree with the statement

    (b)  7  Very Gentle
         6  Somewhat Gentle
         5  Slightly Gentle
         4  Neither Gentle nor Harsh
         3  Slightly Harsh
         2  Somewhat Harsh
         1  Very Harsh

    (c)  [Graphic scale numbered from 13 down to 1; its labels are not recoverable]

Source: Green, Tull, RESEARCH FOR MARKETING DECISIONS, 3rd ed., © 1975, p. 180. Adapted by permission of Prentice-Hall, Inc., Englewood Cliffs, New Jersey.

a. SCALE INTERVALS

The intervals on a scale may simply be numbers from 0 to 1, or -5 to +5, or 1 to 10, or any other pair of numbers that serve as poles. The respondent then indicates how far he is from each pole. Poles may be "approve" and "disapprove," "happy" and "unhappy," or whatever. Within the poles numerical intervals can be marked, or there may simply be a line on which the respondent makes a mark.

The intervals on the scale also may be specific categories that seem to form a continuum. An example of a "specific category scale" used to rate white residents of a housing project on their attitudes toward blacks was:

Respect Felt for Blacks in the Project
(Place check at appropriate position on line, or circle X or Y.)

    Categories marked along the line, in order:
      Thinks highly of blacks in project without qualification
      Generally respects blacks living in project
      Is ambivalent; partly respects, partly feels they are inferior
      Generally feels they are inferior
      Strongly feels they are inferior

    1    2    3    4    5    6    7    8    9

    X: Is indifferent to blacks as a group; doesn't think about them.
    Y: Doesn't think of blacks as group; considers them as individuals.

(Selltiz et al., p. 404.)

It is desirable that the items in such a specific-category scale be arranged in a reasonable order, and that they measure the same dimension. Various technical methods can help attain those goals; for more information consult a technical work on scaling such as Torgerson, Edwards, or Summers.

One may also try to select categories between which there are equal-appearing intervals. This may be approached with the aid of judges and a procedure invented by Thurstone (Thurstone and Chave; Selltiz et al., p. 414).

Ideally one chooses scales for attitude and opinion surveys by testing alternative scales against actual behavior, as Abrams did with rating scales for household products. In many cases, however, independent testing is impossible, and the researcher must fall back on wisdom and artistry and the advice he can get from other experienced researchers.

But relax. Don't let the huge number of possible scaling methods paralyze you with fear. The results you get are not likely to be very sensitive to which scale you use. And if you try two rather different scales and the results agree, you can feel reassured that you are not likely to be badly in error.

Sometimes the full power of sophisticated complex scaling techniques is necessary and helpful. But such situations are not likely to confront you in your early research work.

5. Summary

Scaling is the term applied to the measurement of human responses to stimuli. Many ingenious scaling methods have been invented to evaluate the individual's response to a variety of related stimuli (psychophysical scaling) and to evaluate the range of responses among a group of people to particular stimuli (attitude measurement). The choice of the appropriate scaling devices requires artistry, good sense, and knowledge of the underlying reality.

Composite scaling devices can help you squeeze information out of data. But the more clear-cut the problem and the better your data, the simpler the techniques you require. And if the data collection was poor in meaning and execution, no amount of fancy technique will avail.

ADDITIONAL READING FOR CHAPTER 16

An excellent general discussion of scaling is in Selltiz et al. (Chapter 12). Edwards' short book is an excellent text on the special topic of attitude scaling. Summers also provides much useful information about attitude scales. Torgerson is the classic treatment of scaling from a psychometric point of view, though the book also includes some information on response scales. On scaling procedures in market research, with special attention given to psychometric techniques, see Green and Tull, 3rd ed. (Chapter 6).

17 Data Handling, Adjusting, and Summarizing

1. Data Collection and How to Avoid Disaster
2. Errors in Automatic Data Processing
3. Adjusting the Data
4. Estimating Missing Data
5. Imputation
6. Imputing Value in the Absence of Standards
7. Standardizing the Data
8. Index Numbers
9. Avoiding the Hazards of Hired Help
10. Summary

Let us pick up the task at the completion of pretesting. Your study design is all set, at least until further modification. Now you must actually collect the data, work them into shape for interpretation, and analyze them. This chapter discusses the process from data collection until you begin the analysis.

1. Data Collection and How to Avoid Disaster

No part of the process of data collection and data handling is safe from human error. Many of the systematic sources of error from observers will be discussed in Chapter 19. But just plain mistakes can creep in anywhere, like the night air. For example, mistakes are made in writing down the data, in transferring them from data sheets to punched cards or tape (human errors, though; the machines are nearly fool-proof), in labeling the results, and so on.

Unsound decisions about classification and measurement are another major source of error in the data-collection process. Deciding on the exceptional cases and other trivia of the data-collection process is irritating and demanding work. These decisions about exceptions, many of which occur in the coding of questionnaires, for example, are the soft underbelly of any research project.

One of the weakest links in social science is that, when someone reads a published report of a piece of research, he cannot know how wisely or unwisely these decisions have been made. Yet a study can arrive at quite incorrect results if they are made poorly.

The only remedies for the human errors in the data-collection process are eternal vigilance, spot rechecking, checking the data against your intuition, and more checking.

2. Errors in Automatic Data Processing

You should have at least an onlooker's understanding of what computers can do. You should understand how a card or tape serves as a repository of information and how to get information onto and off it. Whether you should program your own work is another question. It may be most efficient to use the services of people who make their livings working with computers. As computers have become more complex, however, they have also become easier to use; you can now learn all the necessary skills of basic programming for social-science research in just a few days.

Whether you do all the work yourself or someone else does it for you, here are some specific pieces of advice for the data-handling process (skip the first four until you have some acquaintance with data-processing procedures and equipment).

First, "clean" (check) the data thoroughly. You should clean to see whether any data appear in columns that should be blank on all cards, in which case you have probably spotted a card on which all the columns have been shifted to the right or left. Clean to see that there are no punches other than the ones that your code allows for, column by column. And so on.

Second, spot-check your data both at the raw stage and at the card stage to see if there are glaring errors. A good way to check is to "print out" all the data and run your eye down the various columns.

Third, check a random sample of the data for minor errors. The number of errors you find will enable you to estimate how many errors there are in the data as a whole. This is a type of quality-control check.

Fourth, at each stage at which you multiply, divide, or otherwise transform your data, print out the results of the transformations and check each calculation. Only experienced researchers can appreciate how many ways there are for transformations to go wrong, mostly through wrong instructions to the programmer or wrong programming. The former director of one of the important computer laboratories swears that the amount of error that takes place in data processing is enormous, dangerous, and mostly unknown to anyone even after the results appear in print (A. Hoggatt, in conversation).

Fifth, never throw away any original data until five years after you are absolutely sure they have already been useless for ten years; that is, do not ever throw away the data. Especially do not throw away original data if you have transformed them into a more refined form. Transformations (for example, averaging) are often not reversible; sometimes you cannot get back to your original data from the transformed data.

Sixth, if data on other dimensions that just might be relevant sometime are available in the data-collection process at only slight extra cost, collect them (but don't go hog wild and swamp yourself with useless data).

Seventh, if your data have idiosyncrasies because of the particular nature of your study, try to collect them in such a way that the idiosyncrasies will not hamstring their use in some future study.

After your first close-up experience with the actual workings of a research project you may be horrified at the inaccuracies and possible errors in the process. This is especially likely to happen after you conduct your first personal interview in a lower-class home, where you quickly find out how much judgment is required of the interviewer, especially when the interviewee does not seem to understand what you are talking about. The coding process can also frighten one into thinking that all research must be worthless. This reaction is similar to the one you experience when you first go into the kitchen of a first-class restaurant: You swear never to eat out again. Still, the number of food poisonings is smaller than the appearances lead you to expect. (But there is no more excuse for a sloppy research project than for a restaurant's filthy kitchen, and both can be dangerous.)
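The first and third pieces of advice in the list above (cleaning against the code, and checking a random sample to estimate the overall error rate) translate directly into present-day terms. The following is a minimal sketch in Python, not a prescribed procedure: it assumes each card has become a record (a dictionary of column name to punched value), and the codebook, column names, records, and function names are all hypothetical.

```python
import random


def find_illegal_codes(records, codebook):
    """List (record index, column, value) for every value the codebook does not allow.

    A column absent from the codebook is expected to be blank ("") in every record.
    """
    problems = []
    for index, record in enumerate(records):
        for column, value in record.items():
            allowed = codebook.get(column, {""})
            if value not in allowed:
                problems.append((index, column, value))
    return problems


def estimate_error_rate(records, has_error, sample_size, seed=0):
    """Check a simple random sample of records and return the share found in error."""
    rng = random.Random(seed)
    sample = rng.sample(records, sample_size)
    return sum(1 for record in sample if has_error(record)) / sample_size


# Hypothetical codebook and records.
codebook = {"sex": {"1", "2"}, "age_group": {"1", "2", "3", "4"}}
records = [
    {"sex": "1", "age_group": "3", "spare": ""},
    {"sex": "7", "age_group": "2", "spare": ""},   # "7" is not an allowed code
    {"sex": "2", "age_group": "4", "spare": "1"},  # data in a column that should be blank
]

problems = find_illegal_codes(records, codebook)
print(problems)

# Here the "check" simply reuses the code test; in practice has_error would
# compare each sampled record against the original data sheet.
rate = estimate_error_rate(
    records, lambda r: bool(find_illegal_codes([r], codebook)), sample_size=3
)
print(rate)
```

Multiplying the estimated rate by the total number of records gives a rough count of the errors to expect in the data as a whole, which is the quality-control use the text describes.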

3. Adjusting the Data

Experiments usually produce nice, clean, orderly data, because what comes out is a product of what is put in and the researcher has considerable control of what goes in. Sometimes the experimenter disregards some observations because they are "obviously" in error (perhaps because an earthquake hit at the moment the observation was made), but, aside from such matters, little adjustment is usually done with experimental data.

In non-experimental research the situation is different. The researcher must take what he can get and then patch it up and adjust it. This seems scholarly betrayal to researchers who do mostly experimentation. Yet the adjustment must be done and done well if non-experimental research is to get anywhere at all. Here is an example of adjustments made to the sample in the Survey of Consumer Finances in order to render it suitable for economic analysis:

The sample included for analysis. In keeping with our previous comments about the difficulty both with the data and with their interpretation for spending units involved in some entrepreneurial activity, we decided to exclude from the sample farmers and those who owned a business.

In any survey, there are some interviews where a relevant item of information is not obtained, e.g., income, assets, or some component of saving. When, as in the Surveys of Consumer Finances, it is intended to publish data and relations pertaining to the nation as a whole, it is best to assign values to these cases by matching these spending units with others in the sample who are like them in as many respects as possible. The alternative would be to eliminate these interviews entirely, throwing away all the other information, and implicitly assigning them the mean sample values of all the variables (by reweighting the other interviews). However, for our purposes, where we are interested in detailed patterns of relationships and levels of significance, it seemed better to eliminate all these cases and any possibility of spurious relations resulting from the assignment procedure. Hence, we removed from the samples all cases where the amount of income, assets, or saving had been assigned. For the relevant tables we also eliminate the cases where such things as income change, occupation, etc., were not ascertained.

Finally, we made two other types of exclusion: First, we excluded a few cases where there seemed good reason to suspect the reliability of the information. For instance, we took out cases where saving was so large as to indicate a level of expenditure out of proportion with the general economic position of the spending unit. The latter cases usually involved some large transactions where all the money did not seem to be accounted for. Second, there were some cases where extreme saving behavior was clearly present, but it was so extreme and so unusual as to require a separate treatment. (Morgan, pp. 98-99)

M. Friedman emphasizes the importance of wise adjustment of the data:

In seeking to make a science as "objective" as possible, our aim should be to formulate the rules explicitly in so far as possible and continually to widen the range of phenomena for which it is possible to do so. But, no matter how successful we may be in the attempt, there inevitably will remain room for judgment in applying the rules. Each occurrence has some features peculiarly its own, not covered by the explicit rules. The capacity to judge that these are or are not to be disregarded, that they should or should not affect what observable phenomena are to be identified with what entities in the model, is something that cannot be taught; it can be learned but only by experience and exposure in the "right" scientific atmosphere, not by rote. It is at this point that the "amateur" is separated from the "professional" in all sciences and that the thin line is drawn which distinguishes the "crackpot" from the scientist. (Friedman, p. 25)

And one more quotation (this time by a chemist) about this very sensitive issue, which illustrates once again that much of science is art and judgment:

THE REJECTION OF OBSERVATIONS

One of the most difficult decisions which an experimenter has to make is whether or not to reject a result which seems unreasonably discordant. Such results occur, because of accidents or mistakes (Sec. 9.1), with greater or less frequency depending on the skill and care of the investigator. If a mistake of addition is made, it is certainly not reasonable to expect the number obtained to tell very much about the quantity being observed. In particular the use of statistical techniques based on the normal law of error is hardly justified.

Sometimes the reason for a "wild" result is obvious. The operator may know what he did that caused it. These clear-cut cases are not important—if the operator makes a mistake which he knows will spoil the results, he should stop the measurement at that point, or at the least mark it irrevocably for discarding, even if it eventually should come out close to expectations.

On the other hand, the search for specific reasons for rejecting a result after the fact is highly dangerous. It is too easy to find an excuse and too likely that prejudice and emotion will enter. However, if the observer is willing—and strong-minded enough—rejections may be made by the rule that, if a given circumstance is once used to justify discarding a discordant result, the occurrence of the same circumstance must cause rejection whenever it happens and whatever the result.

It is important to look for causes of unusual results; many great discoveries have been made in this way. . . .

There is clearly no magic formula for the rejection of observations. Consideration of the difficulties involved is worthwhile if it leads to the attitude that it is important to be very careful to prevent mistakes and accidents.

With experiments requiring extremely good conditions of observation or extremely high sensitivity on the part of the operator, there is often a desire to disregard negative results on the grounds that conditions were not right or that the operator was not in the right mood. This is undoubtedly responsible for much pseudo science, psychic phenomena, and similar material. If negative results are excluded from a chance sequence, a positive average will naturally result. It is easy to dismiss these cases as mere charlatanism or self-deception, but many educated people do accept such nonsense. (Wilson, pp. 256-257)

One reason that adjustment of the data is so important in nonexperimental work is that the data may be limited.
In physiology, chemistry, or experimental psychology, the researcher can repeat the experiment and obtain new data to check whether an observation really is a "sport" (an error of some sort) and should be thrown out of the study. But one cannot simply throw out one year's observation of Gross National Product. (Yet, that is just what is often done with war years!)

Definitional difficulties often lie at the root of discrepancies in the raw data. For example, look at the entries for the population of China in the years 1578, 1661, and 1749 in the "Returned in Census" column in Table 17.1. A. Usher explains these peculiarities to us:

. . . The returns were used as a basis for taxation, and the unit of enumeration has commonly been presumed to be the family. But some enumeration of individuals (mouths) was made as part of the return, and at some periods the return of "mouths" is held to be more trustworthy than that of families. The bias likely to develop from the system of taxation is minimized by the return of both taxable and nontaxable persons. The periods of greatest difficulty are the Sung and early Tsing dynasties. In the Sung period, the basis of enumeration of "mouths" was changed, and males alone were included. In the early reigns of the Tsing Dynasty only males of military age (16-60) were recorded. When adjustments have been made for these divergences of practice, the series of returns becomes essentially self-consistent and conforms to the expectations created by the history of the country. (See "Probable Total" column in Table 17.1; Usher, p. 15)

TABLE 17.1 Population of China (in millions)

Year   Dynasty   Returned in Census   Probable Total
   2   Han              59.5               71.0
 156   Han              50.0               60.0
 606   Tang             46.0               55.0
 733   Tang             45.4               54.0
 754   Tang             52.9               63.5
1080   Sung             33.3               79.0
1260   Yuen             53.6               65.0
1290   Yuen             58.8               70.5
1381   Ming             60.5               72.5
1393   Ming             58.6               70.1
1491   Ming             56.0               66.0
1578   Ming             63.5               75.0
1661   Tsing            21.0              105.0
1690   Tsing            20.3              101.5
1710   Tsing            23.3              116.5
1749   Tsing           177.4              177.4
1780   Tsing           276.6              276.6
1812   Tsing           360.4              360.4
1842   Tsing           413.0              413.0
1860   Tsing           260.0                ?
1885   Tsing           377.6              377.6
1923   Tsing           414.0              414.0

SOURCE: "The History of Population and Settlement in Eurasia," p. 123, by A. P. Usher, in The Geographical Review, Vol. 20 (January 1930).

Sometimes, however, one cannot discover any reasonable explanation for discrepancies in the raw data. This may be because the data sources are no more than someone's guesses, which are reprinted and passed along as gospel truth. What would you make of the population history of Japan from A.D. 589 to A.D. 1702, from the data in Table 17.2?

Sometimes you may adjust a set of data by weighting some data more heavily than others. For example, if you have yearly data on the incidence of mental disease and you want to investigate the relationship between mental disease and religion, you might well pay more attention to the data of the last thirty years than to those from more than thirty years ago, because recent statistics on the incidence of mental illness are more accurate than those from further back. Many scientists are suspicious of such weighting, on the grounds that it is subjective, and properly so, because the method and results of the study cannot be so easily checked by someone else. But some such subjective judgments are inevitably part of every study; the researcher has to designate some year as the first year she will include in this study, and usually she does so according to the adequacy of the data. The danger, of course, is that the researcher will choose a cut-off date (or weighting scheme) that will prove the hypothesis she wants to prove. There really is not much defense against this fudging danger except making an explicit statement of why you did what you did.
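One way to make such an explicit statement is to write the weighting rule down next to the data, so that a reader can rerun the calculation with and without it. The following is only a sketch in Python: the yearly figures, the weights, and the function name `weighted_mean` are all invented for illustration and come from no study.

```python
# Hypothetical yearly incidence figures, oldest first (not real data).
yearly_rates = [4.0, 4.4, 5.1, 5.0, 5.3, 5.2]

# The weighting rule is stated where a reader can check it:
# the three most recent years count fully, older years count half.
declared_weights = [0.5, 0.5, 0.5, 1.0, 1.0, 1.0]


def weighted_mean(values, weights):
    """Return the weighted arithmetic mean of values."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Report both figures so the effect of the weighting is visible.
unweighted = weighted_mean(yearly_rates, [1.0] * len(yearly_rates))
weighted = weighted_mean(yearly_rates, declared_weights)

print(f"unweighted mean: {unweighted:.3f}")
print(f"weighted mean:   {weighted:.3f}")
```

Reporting the unweighted figure alongside the weighted one lets someone else judge how much the conclusion depends on the researcher's choice of weights.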

TABLE 17.2 Estimated Populations from the Early Times to the Middle of the Tokugawa Era Date A.D. 589

610

"

650-11502 710-483 721 724-48

736 823 859-9224 901-225 923 986-10106 990-10807 c. 1155 1185-13338 1278-879 1528 1553 1562 1573-9110

1673-8311 1688-170312 1702

Population 3,931,151 4,031,050 4,988,842 4,990,000 4,969.699* 8,833,290* 5-6 millions* 4,584,893 2 millions 4,508,551 4,899,620 8 millions 6,631,074 8,631,770* 3,694,331* 3,762,000* 2 millions* 1,128,167* 22,083,325* 4,416,650* 24-25 millions* 9,750,000* 4,984,828 4,916,652 2,330,996* 4,994,808 18 millions* 24 millions 26 millions* 24,994,600*

Authorities Shotoku Taishi Denki Taishi Denki Tals hi den Jugan-iko; Nishikawa-Korinsai, Nihon Jural Suzuki, Kofutaii M. Kimura G. Sawada Gyoki Bosatsu Gyoioki Gyoki Shikimoku Ibid., Differently quoted Ibid., Differently quoted Jugen-Iko and Nishikawa, op. cit. Arai-Hakuseki, Oritakushiba no Kl Ishihara Y. Yokoyama, Nihon Denseishi Ibid. Hidenori Ino Chirikyoku Zasshi Hidenori Ino Yokoyama, op. cit. Hidenori Ino Kongyoku Satsuyoshu and Ruiju Bishop Shuntei, Zakkishu Chirikyoku Zasshi Katori Bunsho T. Yoshida, Ishinshi Nachiko Nishikawa, op. cit. Yoshida, op. cit. Chirikyoku Zasshi

Suidoko

Meibutsuko

* Indicates estimates by present-day writers. SOURCE: Ishii, p. 4

Another fix-it operation is collecting more data to beef up an area where your data are too few. For example, in a study like A. Kinsey's you might find upon first inspection of the data in hand that too few men aged fifty-five to sixty-five, Roman Catholic, and members of the middle class have fallen into your sample, perhaps because there are very few people of this sort. If your sample is randomly selected, you can go out and obtain data on more people that meet these specifications; this move is particularly feasible if you are taking your sample in several sequential steps. (But you must remember to take account of these resamples when you make estimates for the universe as a whole; see Chapter 9 on stratified sampling for an explanation.)

A last step in the adjustment process is to examine the data to see if the results are reasonable. If the results are not reasonable, there may be an error. Novices laugh when they hear economists or psychologists say, "If the data don't agree with the theory, check the data." But, indeed, the data are often found to be in error when they disagree with the hypothesis. The stronger the theory, of course, the more likely you are to put your money on it when it quarrels with the data. But watch out—danger ahead! If you recheck your data when they do not agree with your hypothesis and do not recheck the data when they do agree with the hypothesis, you are loading the dice in favor of getting data that agree with your hypothesis. The only solution is always to recheck your data.

4. Estimating Missing Data

Sometimes you need a datum that is not available. You then have two choices: Abandon the study, or estimate the datum. And if you cannot estimate the datum or if the datum represents too large a part of all the data in the study, then you must abandon the study.

There are many devices for estimation, all of them involving mostly common sense. In Chapter 21 we will find the example of estimating unavailable advertising expenditures for a given year by assuming that the ratio of 1976 expenditures to 1975 expenditures would be the same as the ratio of 1976 Gross National Product to 1975 Gross National Product. Another example of the same strategy is William Petty's 1687 estimate of the population of London, which took advantage of the fact that the authorities had burned the house of each person who died of the plague of 1666:

The number of houses which were burnt Anno 1666, which by authentic report was 13,200; next, what proportion the people who died out of those houses bore to the whole, which I find . . . Anno 1666 to be almost 1/5, from whence I infer the whole housing of London Anno 1666 to have been 66 thousand; then finding the burials Anno 1666 to be to those of 1686 as 3 to 4, I pitch upon 88 thousand to be the number of housing Anno 1686. (Letwin, p. 143)

Interpolation and extrapolation are similar devices. If you have data for 1973 and 1975 but not for 1974, it may be reasonable to split the difference and interpolate for 1974. (But, if you have the average temperature for Chicago in May and October, do not estimate the average temperature for August by interpolation.) If you have the data for 1974, 1975, and 1976, it may be reasonable to extrapolate the trend into 1977. (But do not extrapolate the temperature trend for June, July, and August into September.)

Another trick is to disaggregate. It is often easier to estimate for individual parts of a whole and then to combine them for a better overall estimate than to estimate straight-off. For example, perhaps I want to estimate the gross revenue of a particular lunch counter for a year. I have no basis for a direct estimate. But if I sit at the lunch counter for an hour I can count the number of customers in that period of time, estimate the average bill per customer, multiply by the number of hours a day the counter is open (adjusting for busy and slow hours), multiply by the number of stools in the lunch counter, adjust for the time the stool is not in use, and arrive at an estimate. As a matter of fact, I could even estimate the revenue from a single stool without ever going into the restaurant. This kind of disaggregated estimate is a mental sampling process, and it is likely to be closer to the actual figure than if I made a direct estimate for the entire lunch counter. Another method of estimation would be to watch the cash register for ten minutes, total the receipts, then expand for the whole year.

Another illustration: How tall is the Empire State Building or the tallest building in your town? You know that each floor is eight to eleven feet high, and you can estimate the number of floors. Presto—an estimate.

Valuation estimates can often be improved by disaggregating. For example, when planning a week's riverboat-cruise vacation, my wife and I discussed whether to pay $40 extra for a private shower. The valuation seemed to make sense only after we divided by the number of showers we expected to take and estimated that each private shower would cost between $3 and $4. We could then say to ourselves, "Would I pay $3 to $4 extra to take a shower in the room rather than walk a few doors down the corridor?"
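The disaggregated estimates above are just chains of multiplication and division, and writing them out makes each assumed component visible. Here is a rough Python sketch. The $40 shower charge is from the text; the lunch-counter inputs, the shower counts, and the function names are hypothetical, and the per-stool step is left out to keep the chain short.

```python
def lunch_counter_revenue(customers_per_hour, average_bill,
                          hours_open_per_day, busy_slow_adjustment,
                          days_open_per_year):
    """Build a yearly revenue estimate from separately estimated parts."""
    return (customers_per_hour * average_bill * hours_open_per_day
            * busy_slow_adjustment * days_open_per_year)


def cost_per_use(total_cost, expected_uses):
    """Spread a lump-sum price over the number of times it will be used."""
    if expected_uses <= 0:
        raise ValueError("expected_uses must be positive")
    return total_cost / expected_uses


# Hypothetical observations from one hour at the counter.
yearly_revenue = lunch_counter_revenue(
    customers_per_hour=30,      # counted during the hour observed
    average_bill=2.50,          # guessed from the menu
    hours_open_per_day=10,
    busy_slow_adjustment=0.8,   # the observed hour was busier than average
    days_open_per_year=300,
)
print(f"estimated yearly revenue: ${yearly_revenue:,.0f}")

# The $40 shower charge from the text, divided by assumed shower counts.
for showers in (10, 13):
    print(f"{showers} showers: ${cost_per_use(40.0, showers):.2f} each")
```

Each input can be challenged and revised on its own, which is the advantage of disaggregating over a single direct guess.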
Sometimes it is difficult to get several people to agree about what is the most sensible estimate. A. Enthoven finds that, in government policy research, offering three estimates—high, low, and middle—can often bring agreement: "It is surprising how often reasonable men studying the same evidence can agree on three numbers where they cannot agree on one. In fact, one of the great benefits of this approach has been to eliminate much senseless quibbling over minor variations in numerical estimates of very uncertain magnitudes" (Enthoven, p. 421).
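The ratio and interpolation devices described earlier in this section can also be reduced to a few lines of arithmetic. The sketch below, in Python, reproduces the chain of proportions in Petty's passage using his own figures; the helper names `ratio_estimate` and `interpolate` are my own, and the inputs in the usage lines at the bottom are invented for illustration.

```python
from fractions import Fraction

# Petty's chain of proportions, using the figures quoted from Letwin.
houses_burnt_1666 = 13_200
share_of_all_housing = Fraction(1, 5)       # "almost 1/5", taken as exact
housing_1666 = houses_burnt_1666 / share_of_all_housing
burials_1686_to_1666 = Fraction(4, 3)       # burials "as 3 to 4"
housing_1686 = housing_1666 * burials_1686_to_1666

print(f"housing in 1666: {int(housing_1666):,}")
print(f"housing in 1686: {int(housing_1686):,}")


def ratio_estimate(known_earlier, indicator_earlier, indicator_later):
    """Assume the unknown series moved in proportion to an indicator series."""
    return known_earlier * indicator_later / indicator_earlier


def interpolate(x0, y0, x1, y1, x):
    """Linear interpolation between two known points (x0, y0) and (x1, y1)."""
    if not x0 < x1:
        raise ValueError("x0 must be smaller than x1")
    if not x0 <= x <= x1:
        raise ValueError("x lies outside the known interval")
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical inputs: expenditures of 50 in the earlier year, with the
# indicator (say, GNP) rising from 1000 to 1100.
print(ratio_estimate(50.0, 1000.0, 1100.0))

# Hypothetical values for 1973 and 1975, splitting the difference for 1974.
print(interpolate(1973, 100.0, 1975, 120.0, 1974))
```

The `interpolate` helper refuses to run outside the known interval, but it cannot know whether a straight line is sensible between the two points; the Chicago temperature warning above still requires judgment.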

5. Imputation

Chapter 15 referred to the problems in national-income accounting of reckoning the labor value of housewives. This and related problems must be solved by imputation, the process of estimating a quantity when there is little or no scientific rationale for the estimate. O. Morgenstern calls imputation (of value) "perhaps the classical problem of economic theory" (p. 245). Imputation is also an important problem in other social sciences. I shall discuss three difficult kinds of imputation problems here: dividing combined data into parts, allocating parts of a whole, and evaluating worth in the absence of a standard.


a. SEPARATING COMBINED DATA

I wanted to determine the differences in state tax revenues from distilled spirits in the thirty-four states with private-enterprise liquor stores, compared to the sixteen states in which liquor stores are run by the states and "socialized" (J. Simon, 1966b). But in some states, like Hawaii, the data for beer and wine are lumped together with the data for distilled spirits, and I wanted the revenues for liquor only. Table 17.3 illustrates how the data are lumped together (as well as other complexities that an investigator must unravel if she wants to use such accounting data for scientific purposes).

TABLE 17.3 Liquor-Tax Revenues in Hawaii, 1961

Permit fees
  38 liquor-tax permits at $2.50 (rounded)                        $       95.00
Taxes
  Wholesale                                     $3,508,524.00
  Retail                                             6,234.00
  Taxable use                                       14,742.00
Total liquor-tax collections                                      $3,529,518.00
Miscellaneous income                                                      39.00
Total state collections from alcoholic beverages                  $3,529,652.00
Costs of collections (estimated)                                       7,259.00
Net state revenues                                                $3,522,393.00

Local collections (license fees)                Number            Amount
  City and County of Honolulu                      649            $  339,619.00
  County of Maui                                   134                51,058.00
  County of Hawaii                                 176                64,379.00
  County of Kauai                                   83                30,174.00
Total local collections                                           $  485,230.00
Costs of local collections (estimated)                                24,900.00
Net local collections                                             $  460,330.00
Total net state and local revenues                                $3,982,723.00

In this case, as in many similar cases, the solution was to refer to states in which the revenues are separate and to assume that the proportions would be the same in the states that do not reckon the taxes separately.
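The proportional assumption just described can be sketched in a few lines of Python. Every figure here is invented: the two reference states, their revenues, and the lumped total are hypothetical stand-ins and are not taken from the liquor study or from Table 17.3.

```python
def pooled_spirits_share(reference_states):
    """Share of spirits in total alcohol revenue, pooled over the states
    that report the two kinds of revenue separately."""
    spirits = sum(state["spirits"] for state in reference_states)
    total = sum(state["spirits"] + state["beer_wine"]
                for state in reference_states)
    if total <= 0:
        raise ValueError("reference revenues must be positive")
    return spirits / total


# Hypothetical states that keep the two revenue streams apart
# (millions of dollars).
reference_states = [
    {"spirits": 6.0, "beer_wine": 2.0},
    {"spirits": 9.0, "beer_wine": 3.0},
]

share = pooled_spirits_share(reference_states)

# Hypothetical state that reports only a lumped total (dollars).
lumped_revenue = 4_000_000.0
imputed_spirits_revenue = lumped_revenue * share

print(f"assumed spirits share: {share:.2f}")
print(f"imputed spirits revenue: ${imputed_spirits_revenue:,.0f}")
```

The whole result rests on the assumption that the lumped state resembles the reference states, so it is worth checking how far the share varies among the reference states before trusting the imputed figure.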

b. ALLOCATING PARTS OF A WHOLE

This type of imputation is truly separating the inseparable. For example, how would you decide how much the husband is responsible for the popularity of a given couple and how much the wife is? One tactic would be to investigate the popularity of the husband and the wife in the social relations that they conduct separately. But they may go everywhere together. Furthermore, their popularity or lack of it may stem from how they behave as a couple, which may be quite different from how they behave separately; that is, there may be an interaction effect.

In the case of this married couple, it may be meaningless to impute responsibility for popularity to the husband or the wife. In such cases, you should refuse to try to separate the inseparable. Indeed, if you are studying married couples qua couples, there is probably no reason even to try to establish separate responsibility for the popularity of the couple.

In some other cases, however, the problem simply cannot be ducked. The allocation of overhead costs in the U.S. Post Office offers a classic case. Congress sets postal rates for different classes of mail and takes into account the costs of handling the various classes of mail. But how much of the cost of constructing and maintaining the post-office building should be charged to first-class mail, how much to second-class mail, and so on?

One answer is obviously wrong. It would make no sense to charge each class of mail as much as it would cost if there were no other classes of mail. If so, the sum of the allocations would be much larger than the actual total cost. Nor is there any particular justification for allocating the costs in proportion to the number of pieces of each class of mail or to the other costs for each class of mail—although these are the answers that accountants would probably offer.

Because there is no agreed-upon or scientific rationale for apportioning these costs, the arguments rage, and the pressure groups scream. The answer is at best a matter of philosophy, judgment, and determination of ultimate purpose. To indicate just one possible line of reasoning, if we assume that the United States must have first-class mail and if the other classes of mail are merely riders on the coattails of first-class mail, then it might be sensible to charge to first-class mail all the overhead costs for facilities that are used jointly by all the classes of mail. On the other hand, if the object were for the government to make as much profit as possible, the prices would be set purely in terms of what the customers would pay, and no allocation of joint overhead costs would be necessary. If the object is to allocate the costs "most fairly," then the government would probably have to have a national referendum.
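A small numerical sketch may make the competing allocation rules concrete. The Python below uses entirely hypothetical figures for the joint overhead, the stand-alone costs, and the piece counts, and the function names are mine. It shows only the arithmetic of each rule, not which rule is right.

```python
# Hypothetical figures (millions of dollars and millions of pieces).
joint_overhead = 100.0
stand_alone_cost = {"first": 80.0, "second": 50.0, "third": 40.0}
pieces_handled = {"first": 600, "second": 250, "third": 150}


def allocate_in_proportion(total, basis):
    """Split total across classes in proportion to some basis."""
    basis_total = sum(basis.values())
    return {name: total * amount / basis_total
            for name, amount in basis.items()}


def allocate_all_to(total, classes, primary):
    """Charge the whole joint cost to one 'primary' class."""
    return {name: (total if name == primary else 0.0) for name in classes}


# Rule 1: charge each class what it would cost alone. The charges
# add up to more than the cost actually incurred.
overcharge = sum(stand_alone_cost.values()) - joint_overhead

# Rule 2: the accountant's answer, in proportion to pieces handled.
by_pieces = allocate_in_proportion(joint_overhead, pieces_handled)

# Rule 3: the other classes ride on the coattails of first class.
first_class_bears_all = allocate_all_to(
    joint_overhead, pieces_handled, "first")

print(f"stand-alone charging overshoots by {overcharge:.1f}")
print("by pieces:", by_pieces)
print("first class bears all:", first_class_bears_all)
```

Rules 2 and 3 both add up to the true total, yet they give very different answers, which is the point: the arithmetic cannot settle which purpose the allocation is meant to serve.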

6. Imputing Value in the Absence of Standards

The effectiveness of alternative university curricula is notoriously hard to evaluate because there are no agreed-upon standards against which one can rate the graduates of the alternative curricula. You might propose such a measure as "starting salary at graduation" for comparison of graduates of a great-books curriculum and graduates of a free-electives curriculum. But neither this nor any other quick and simple test seems to have any general validity.

A similar problem arises when a university tries to put a value on a piece of its own land—something it must do if it is to make wise decisions about where to place which of its buildings. Land for a proposed library site in the middle of the campus obviously has great value, and that value is an important component of the total cost of the building. But what value should be imputed? Only one thing is sure: To impute zero value for the land because you have no firm basis for estimation is folly, though this exact course might be suggested by an old-fashioned accountant. (Yet some big companies like Union Carbide still do carry "good will"—the value of their organization and trade name—on their accounting books at $1!)

When there is no empirical source for data but the data are crucially needed, you must somehow make the most sensible estimate. Base your estimate on any evidence you can adduce—for example, the cost of similar land as estimated by real estate brokers, if the university had to go out and buy. The most sensible estimate is a delicate piece of judgment in which the skills of the scientist and the knowledge of experienced people (in this case, the economist and the real estate broker) must be combined. Lately there has arisen a body of logical theory—called "Bayesian statistical decision theory"—that provides guidelines for systematically combining such estimates with your other data.

Imputed values can only be tested by judgment. There is no scientific test of the fairness and soundness of an imputed value, and indeed a jury is often required to render a judgment:

LAND PRICE SET BY JURY

A Circuit Court jury late Tuesday night returned a verdict of $64,500 for land required by the University of Illinois for the $14.3 million Center for the Performing Arts. The verdict is the second in the series of condemnation suits by the university to acquire the site bounded by Goodwin Avenue, Gregory Place, Illinois Street, and Oregon Street. The property involved in the latest suit is owned by Albert and Beatrice Gregerson. The University had appraised the property at between $52,000 and $55,000, and the defendants at between $74,000 and $78,000. (Champaign-Urbana Courier, March 25, 1965, p. 3)

The importance of sound imputation is seen in the magnitude of the decisions that are affected. Imputed value judgments of alternative school curricula, not any empirical scientific tests, are the basis of France's nationwide school curriculum decisions. And the imputed "shadow prices" that government agencies pay each other for goods influence the entire makeup of the Soviet economy. (How does the U.S.S.R. decide what price for truck engines the engine factory should charge the truck factory? General Motors has this problem too. One of the possible bases for the Soviet price is the amount the engines sell for in free-world markets. The existence of automatic self-regulating prices, rather than imputed prices, is a key difference between socialism and capitalism.)

7. Standardizing the Data

Before you can compare two or more events and before you can make estimates about totals, central values, and other measurements—in general, before you can use various pieces of data in some study—you must make them comparable. You must standardize the data.

Lack of comparability in data is a fertile source of fallacies and lies based on statistical presentation. Often the fallacious implication is not intended to deceive people, but people are deceived nevertheless. For example, the National Safety Council reports that two-thirds of the people killed in automobiles are traveling forty miles an hour or under. But does that figure tell you anything about whether it is safer to travel under or over forty miles per hour? It does not. It would be much more illuminating to know the chances of being killed per mile driven at speeds under and over forty miles an hour.

D. Huff gives us the story of the roadside merchant who, when asked how he could sell his rabbit-meat sandwiches so cheaply, replied: "Well, I have to put in some horse meat too. But I mix 'em fifty-fifty: one horse, one rabbit" (Huff, p. 114).

Public-relations men are fond of the noncomparable comparison as a device to make a phony point. A liquor-trade association, for example, wanted to make an argument for "Why Georgia Cannot Afford to Socialize Its Liquor Industry." The brochure purported to show that, if Georgia were to adopt a system like that of Virginia or North Carolina, it would then lose tax revenue. But the flack failed to point out that liquor prices charged by the States of Virginia and North Carolina were at that time far below the average retail prices in Georgia. If the State of Georgia shifted to a system like that of Virginia or North Carolina and merely charged the same prices that private retailers had been charging in Georgia up to that time, the State of Georgia could increase its revenues enormously.
Such a result would be shown if we calculated the amount of state revenue per dollar paid for liquor by consumers, a way of standardizing.

Two newspapers, one for and one against Roosevelt's New Deal, carried these two headlines and stories in the same week, differing in the mode of standardizing:

ELECTRIC OUTPUT IN SEASONAL RISE

Electric power production for the week ended May 28 rose substantially above the total of the previous week. . . .

POWER OUTPUT OFF 10.6% FROM 1937 FIGURES

The production of electricity in the United States in the week ended May 28 was 1,973,000,000 kilowatt hours, a drop of 10.6 per cent from the 1937 week according to the Edison Electric Institute. (Cohen, p. 659)

Another example is the comparison of countries for wealth. The yearly income of India is greater than that of Luxembourg or Holland. But on a per capita basis Luxembourg and Holland are wealthier than India.

But how a set of data should be standardized is never obvious. Like every other decision in empirical research, this decision must be made with reference to the purpose of the research. Assume, for example, that you want to compare the average male of two countries for physical strength, as measured by how much weight he can carry for a mile. Should you make a straightforward comparison, man for man, or should you calculate the amount of weight carried relative to the body weight of each man? Standardizing men by their own weight is common practice in many of the Olympic games and in collegiate wrestling, boxing, and judo (but not in running and swimming). Or perhaps you should standardize by the amount of food they eat or by any of a dozen other types of standardizing measurements. The decision must be made with reference to the universe of interest that you are trying to investigate. If you want to know how much human male lifting power is available in a country, for the purpose of evaluating its labor resources, you would not want to standardize at all. But, if you want to know about the physical condition of the men in the various countries and about their "prowess" as we usually think about it, you would standardize by weight, age, and other dimensions.

In experimentation the control group is a standardizing device. The control group constitutes a base-line standard against which to compare the experimentally treated group. The key idea is that the experimental group, which is randomly chosen from the same universe as is the control group, would show the same results as the control group if it had not been treated experimentally.

Almost every national economic statistic must be standardized. Here is a description of one:

The US Department of Commerce has developed a new analytical tool to measure economic growth in regions, states, and countries.
Its study shows what the "normal" change in the number of workers in an area would have been if its economic conditions had been parallel to the nation's between 1940 and 1960. The difference between actual employment change and "normal" employment change measures the amount by which an area rate exceeded or lagged behind the national rate. This measure is then cross-classified according to the type of industry in the area, and the rates of growth for its industries are compared with the national rates to obtain a measure of relative growth potential. As an example of employment shift between 1950 and 1960, jobs in Scranton, Pennsylvania, would have increased by over 9,700 if that city had grown at the "normal" rate. Actually Scranton had a decline of over 8,400 jobs. In effect, this was a net loss of over 18,300 jobs. The study found that the two fastest-growing regions were the Ft. Lauderdale, Florida area, and Anaheim-Santa Ana-Garden Grove, California. These had job gains between 1950 and 1960 of 249 percent and 201 percent, respectively, of their total 1950 employment. The three areas which lost the most jobs were those of Jersey City, New Jersey; Erie, Pennsylvania; and St. Joseph, Missouri, which experienced declines during the decade of over 24 percent each. (Illinois Business Review, April 1966, p. 9)
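The core of the measure described in this quotation appears to be a single subtraction: the actual change in jobs minus the change the area would have had at the national rate. A minimal Python sketch of that reading follows. The area, its employment figures, and the national growth rate are hypothetical, and the function is my own simplification rather than the Department of Commerce's procedure.

```python
def employment_shift(base_employment, actual_change, national_growth_rate):
    """Return (normal_change, shift), where normal_change is the change the
    area would have had at the national rate and shift is actual minus normal."""
    normal_change = base_employment * national_growth_rate
    return normal_change, actual_change - normal_change


# Hypothetical area: 100,000 jobs in the base year, a loss of 5,000 jobs
# over the decade, while national employment grew 10 percent.
normal_change, shift = employment_shift(
    base_employment=100_000,
    actual_change=-5_000,
    national_growth_rate=0.10,
)

print(f"normal change: {normal_change:+,.0f} jobs")
print(f"net shift:     {shift:+,.0f} jobs")
```

A negative shift means the area lagged behind the nation even if its job count rose; a raw gain or loss in jobs says nothing about that until it is set against the national standard.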


Another common standardizing device for demographic and economic national statistics is the base year, in which each year's (or month's) amount is expressed as a percentage of the first or other base year.

The deflated dollar is a very useful standardizing device in economic statistics; its purpose is to standardize the values of amounts of money earned or paid in years when the values of the dollar were different.

Here is an example of how lack of standardization can turn the world topsy-turvy:

Many an American youth believed that his life was to be cut off prematurely when the government called him to the colors in 1917 to fight for his country. The fact was that he was really in less danger while fighting for the land that bore him than he was while engaged in his peaceful vocations. Fifty-three thousand-three hundred American soldiers were killed or died of wounds during the nineteen months of our participation in the War, victims of every refinement of modern slaughter; yet during that same period 132,000 persons were killed at home in the performance of the tasks of peace. (New York Herald Tribune, January 2, 1927, in Cohen, p. 666)

The proportion or ratio is a useful device for standardizing data. Its sole purpose is to indicate the relative sizes of two quantities to be compared. The percentage is a special kind of proportion; it is a standardization device, for two percentages can immediately be compared to see which is bigger, whereas 22/73 and 18/67 cannot be compared until they themselves have been standardized. All of us have used percentages all our lives, and it is hard to imagine how we would get along without this device. But the percentage is deceptively simple, and it takes a lot of hard thinking to decide which percentage to compute.

Taking out trends is often a necessary standardizing device. Earlier we saw that, if the relationship between auto sales and auto prices is plotted for several decades, it appears as if the two variables go together, leading to the inference that higher prices cause higher sales. Actually, it is the trends of higher prices and higher auto sales over the years that confuse the picture. One way to clarify the matter is to consider the difference from year to year in sales and prices, which serves to remove some of the effects of trends (of some kinds).
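The base-year index, the deflated dollar, and the year-to-year difference described above can each be written in a line or two of Python. What follows is only a sketch under stated assumptions: the function names are invented for illustration, and the price, sales, wage, and price-level figures are hypothetical, not data from any study cited in this book.

```python
def base_year_index(amounts, base=0):
    """Express each amount as a percentage of the amount in the base period."""
    base_amount = amounts[base]
    return [100.0 * a / base_amount for a in amounts]


def deflate(amounts, price_index):
    """Convert current-dollar amounts to base-year dollars.

    price_index must be on the scale where the base year equals 100.
    """
    return [100.0 * a / p for a, p in zip(amounts, price_index)]


def year_to_year_changes(amounts):
    """Replace each amount by its change from the previous period.

    This removes a steady trend, though not every kind of trend.
    """
    return [later - earlier for earlier, later in zip(amounts, amounts[1:])]


# Hypothetical figures, invented only for illustration.
auto_prices = [2000, 2200, 2500, 2600, 3000]           # dollars per car
auto_sales = [5.0, 5.6, 5.9, 6.6, 6.8]                 # millions of cars
wages = [4000, 4400, 5000, 5200, 6000]                 # current dollars
consumer_prices = [100.0, 105.0, 110.0, 120.0, 125.0]  # base year = 100

price_index = base_year_index(auto_prices)     # first year becomes 100
real_wages = deflate(wages, consumer_prices)   # wages in base-year dollars
price_changes = year_to_year_changes(auto_prices)
sales_changes = year_to_year_changes(auto_sales)

print(price_index)
print(real_wages)
print(price_changes)
print(sales_changes)
```

Comparing `price_changes` with `sales_changes`, rather than the two raw series, is the year-to-year comparison suggested in the text.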

8. Index Numbers²

An index number is a device for adding apples and oranges. Everyone knows that you cannot add apples and oranges, but the scientist does it anyway. He simply adds the unaddable.³

2. There is confusion in the use of the terms "index" and "index number" in the social sciences. Often any proxy (surrogate) is said to be an "index" for the conceptual variable it stands for. But "index number" (often spoken of simply as "index," also) is a special kind of proxy that is a complex of many separate proxies. The distinction will become clear as the section proceeds.

The example is not hypothetical. A newspaper headline said "1964 Crops Near Record in Illinois." But how can one decide easily whether the crop in one year is better than the crop the year before? Perhaps the wheat crop was better but the corn crop worse, the sorghum crop a little worse, the apple crop better, and so on. On what basis do you decide whether the crop as a whole was better or worse? From the point of view of farmers as a whole, the simple cash value of the crops is a good index. But from a wider point of view this index is insufficient, for prices go up when crops are poor. Therefore an index is needed that takes into account the amount of each type of crop and weights each of them. And, indeed, the U.S. Department of Agriculture has just such an index, which allows it to say that the "all crops production index as of September 1, 1976 is at 118 percent of the 1967 index used as a standard, down from last year's 122 percent." (AP, September 11, 1976, from C-U News Gazette, p. 1)

How do the judges decide who wins a boxing match? Unless one fighter knocks out the other fighter, the judges must take into account several different factors: aggressiveness, number of blows struck, and strength of blows struck, among others. And there must be some system to weight the different factors; for instance, number of blows struck is more important than is aggressiveness. The system of weights, and the overall result, is an index number. The index number for boxing differs from country to country, and who wins may depend upon the particular system in use.

Wages are another example. How much did wages go up last year? Janitors' wages went up 9 percent, but professors' wages went up 7 percent. Even if the whole world were composed of only janitors and professors, it would not make sense simply to split the difference, because there are more janitors than there are professors.
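The janitor-and-professor example can be made concrete with a short Python sketch. The 9 and 7 percent figures come from the text; the head counts (300 janitors, 100 professors) and the function name are assumptions made up for illustration.

```python
def weighted_index(values, weights):
    """Weighted average of component values; weights express importance."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


wage_increases = [9.0, 7.0]   # percent: janitors, professors (from the text)
head_counts = [300, 100]      # hypothetical numbers of workers

split_the_difference = weighted_index(wage_increases, [1, 1])
weighted_by_workers = weighted_index(wage_increases, head_counts)

print(split_the_difference)   # equal weights ignore group sizes
print(weighted_by_workers)    # pulled toward the larger group
```

Weighting by head count answers "how much did the typical worker's wage rise?" Weighting by each group's total payroll instead would answer a different question, namely how much the total wage bill rose, which is one more illustration that the choice of weights depends on the purpose of the index.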
The stock-market indexes illustrate that conclusions may depend upon how the index is constructed. On a particular day the Dow-Jones Index may go down while the Standard & Poor Index is going up.

Dealing with the index-number problem reaches a peak of frustration when you must decide whether a group of people taken as a whole is better or worse off following some social change. A few people may suffer "greatly," and a lot of people may benefit "slightly." But we have no Benthamic "felicific calculus" with which to figure out whether the total of human happiness has increased or decreased.

3. By now I have said that scientists measure the unmeasurable, separate the inseparable, and add the unaddable. Scientists also do many other things that are, in someone's theory, impossible, just as we all do these things in our daily life. It is true that all these operations cannot be done scientifically. But they can and must be done by scientists in order to get on with the business of obtaining information and knowledge about the world in which we live.

The first principle in making index numbers is to choose items for the index that are representative of the universe whose movement you want to measure with the index. If you are constructing an index of consumer prices, you cannot measure the price changes in all items consumers buy; so you take a sample. You might include bread, washing machines, ballpoint pens, men's underwear, and so on.

The second principle is to apply weights to the various items according to the items' importance in the universe and also to how well each particular item reflects the movement of all items.

We take published indexes for granted. We forget not only how arbitrary and chancy they are but also how difficult and time-consuming they are to create. All of us think we know which supermarket is the most expensive in town and which is the least expensive. But the price-ranking of a supermarket depends upon thousands of different articles. It took a researcher 300 skilled man-hours to figure out the price ranking of just eight supermarkets in a shopping area (Holdren, p. 68).

Usually we overcome the difficulty of computing indexes by dodging the full complexity of the problem. In the study of liquor prices in various states, I ended up using the prices of just eight brands in the index, weighting all of them equally, mostly because data from only these eight brands were available for most states.

Indexes are made the way all of us implicitly make daily decisions. Consider, for example, the situation of a fellow like Arrowsmith, who has decided to marry one of two girls but has not yet decided which one. Sheila is more beautiful and cooks better, but Victoria is more intelligent and dances better. How is a man to decide? He might make up a scoreboard like this:

                Sheila    Victoria
Beauty            10          5
Cooking            2          0
Intelligence       3         10
Dancing            1          2
Wife index        16         17

Notice that the fellow is giving more weight to beauty and intelligence than to cooking and dancing. Implicitly he is saying that beauty and intelligence are more important than cooking and dancing. The sum of each girl's scores in the four "unaddable" dimensions is her wife index.

Another complication is added if the fellow does not know exactly how each girl stacks up on one or more dimensions. Let's say he knows that Sheila is well off and her money is important to him, so he weights it twice as heavily as beauty (and intelligence). He thinks Victoria may have money, and, if she does, she has twice as much as Sheila has;⁴ he figures the odds are 25 to 75 (that is, a probability of .25) that Victoria does have the money. The score card is then as follows:

                         Sheila                         Victoria
                Points x Odds = Weighted Sum    Points x Odds = Weighted Sum
Beauty            10   x   1  =     10             5   x   1  =      5
Cooking            2   x   1  =      2             0   x   1  =      0
Intelligence       3   x   1  =      3            10   x   1  =     10
Dancing            1   x   1  =      1             2   x   1  =      2
Money             20   x   1  =     20            40   x  .25 =     10
Wife index                          36                              27

4. I am assuming that twice as much money is twice as desirable, which it usually is not (Galanter, pp. 54-55).

This kind of scheme is a formal explicit representation of much of our everyday decision-making process; we would frequently make better decisions if we made explicit scorecards. This is the nature of indexes and the weighting of the components of indexes with "subjective probability" estimates of the odds.
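The two scorecards above can be reproduced with a few lines of Python, which may help show that such an index is nothing more than a sum of points multiplied by odds. The points and odds are those given in the text; the function and variable names are hypothetical.

```python
def index_score(scorecard):
    """Sum of points x odds over every dimension on one scorecard."""
    return sum(points * odds for points, odds in scorecard.values())


def without(scorecard, dimension):
    """Copy of a scorecard with one dimension left out."""
    return {k: v for k, v in scorecard.items() if k != dimension}


# Each entry is (points, odds that the points are really there).
sheila = {
    "beauty": (10, 1.0),
    "cooking": (2, 1.0),
    "intelligence": (3, 1.0),
    "dancing": (1, 1.0),
    "money": (20, 1.0),
}
victoria = {
    "beauty": (5, 1.0),
    "cooking": (0, 1.0),
    "intelligence": (10, 1.0),
    "dancing": (2, 1.0),
    "money": (40, 0.25),   # twice Sheila's money, but only a .25 chance
}

# First scorecard: four dimensions, no uncertainty.
print(index_score(without(sheila, "money")), index_score(without(victoria, "money")))

# Second scorecard: money added and weighted by the odds.
print(index_score(sheila), index_score(victoria))
```

Adding the uncertain money dimension reverses the ranking of the first scorecard, which is the point of weighting components by subjective probability.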

9. Avoiding the Hazards of Hired Help

Other people are not as smart as you are. Even worse, they have only the faintest glimmering of what you are trying to do in your research. And worst of all (though it is perfectly understandable, of course) is that other people do not care about your research nearly as much as you do: the performance you get from people depends on their motivations.

These considerations suggest that you should try as hard as possible to (a) involve your assistants in the work, and give them a personal stake in the output; and (b) watch over them with as much vigilance as possible, and tell them that you will check on their work, for example, by telling interviewers that there will be follow-ups on a randomly selected portion of the interviews. Furthermore, be supercareful about which tasks and responsibilities you delegate to assistants. Of course you are not ready to employ assistants yet, but soon you may be. Even more important, this section may lead you to see the potential hazards in research from a different and sobering point of view, both as a producer and consumer of research.

The dilemma is clear: The principal investigator cannot make all the hundreds, thousands, even tens of thousands of nonroutine decisions about data collection in a major study; but delegation is dangerous. And just because it is so irritating, time-consuming, and demanding, you are tempted to leave the decisions to your assistants. But you must not yield to this temptation. You must ride herd on the data-collection process at all points, and you must make all extraroutine decisions yourself until you have trained an assistant and checked his ability. Even thereafter you must check periodically to ensure that he has not strayed. In this periodic checking look for big blunders. Check the big items. See if everything looks approximately right. If it does not, find out why.

The only possible solution is a managerial solution: Find capable assistants, train them well, and check on them constantly. To be afraid to delegate is to doom yourself to research projects so small that you can do them with your bare hands and to a scientific output a fraction of what it might otherwise be. But to delegate unwisely is to waste all the work that you do.

As a first step in the training of any assistant, have him read, sign, notarize, and swear on his great-grandmother's grave that he will follow the instructions listed here. Even following instructions will not guarantee that errors will not occur. But the instructions will give you justification for great righteous anger and outrage when you bawl out your assistant for noncompliance. And that may have some effect.

INSTRUCTIONS TO RESEARCH ASSISTANTS (AND TO YOURSELF)

1. When doing a new task, carry out only a small part of the work and then check with your supervisor that the method is correct.
2. Keep a precise and detailed record of exactly how you handle each nonroutine decision.
3. Handle no exceptions without consulting your supervisor.
4. Figure the number of significant digits correctly; if in doubt, ask the supervisor.
5. Perform all computations on paper, not in your head.
6. Throw away nothing: not work sheets, not computations, not unusable data, nothing.
7. Label everything: all data sheets, all work sheets, everything.
8. Indicate the source of each piece of data and computation, whether from published sources or from your own original data.

J. Roth described several instances in which he and other graduate students cut corners when working on research projects. The lapses increased with time on the project, as the workers came to think that they could cheat without affecting the study's results.

When a researcher hires others to do the collecting and processing tasks of his research plan, we often assume that these assistants fit the "dedicated scientist" ideal and will lend their efforts to the successful conduct of the over-all study by carrying out their assigned tasks to the best of their ability. As suggested by my examples, I doubt that hired assistants usually behave this way when they are junior grade scholars themselves. It becomes more doubtful yet when they are even further removed from scholarly tradition and from the direct control of the research directors (e.g., part-time survey interviewers). It seems to me that we can develop a more accurate expectation of the contribution of the hired research worker who is required to work according to somebody else's plan by applying another model which has been worked out in some detail by sociologists, namely, the work behavior of the hired hand in a production organization. . . . [W]orkers made the job easier by loafing when the piece rate did not pay well. They were careful not to go over their informal "quotas" on piece rate jobs because the rate would be cut and their work would be harder. They faked time sheets so that their actual productive abilities would not be known to management. They cut corners on prescribed job procedures to make the work easier and/or more lucrative even though this sometimes meant that numerous products had to be scrapped. (Roth, pp. 191-192)

Between you and me (but don't tell your assistant or mine) the errors she does make will sometimes not be very dangerous if they are made randomly, in which case they tend to cancel out. This is likely to be true of purely arithmetical errors. For example, in a study of the effects of liquor prices on liquor sales in various states, a clerk made errors, many of them serious, in the calculation of every single one of the state price changes, but correct recalculation arrived at almost the same median estimate as did the calculation that contained the errors. On the other hand, even random measurement errors reduce the apparent association between two variables.

If there is any systematic bias, however, errors of any kind may damage your study. For example, if interviewers systematically overestimate the values of the homes of whites and underestimate those of blacks, the data on value of homes will be worse than useless for many purposes. Even purely arithmetical errors can be affected by bias; the Internal Revenue Service finds that a large proportion of mistakes in the arithmetic of tax returns are in favor of the taxpayer rather than of the government. And errors in birth records show girls as boys more often than the reverse, and girl babies are more often left out entirely (Wallis & Roberts, p. 95). Whether the error is damaging depends upon the purpose of your study.
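The difference between random and systematic errors can be illustrated with a small simulation in Python. Everything in it is an assumption chosen for illustration (the sample size, the size of the errors, the 15-unit bias, and all names); it does not reproduce the liquor-price study or any other data mentioned here, and for simplicity the bias is applied in one direction to every observation rather than in opposite directions to two groups.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient, written out step by step."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)   # fixed seed so the sketch is repeatable
n = 10_000               # hypothetical number of observations

true_x = [rng.uniform(0, 100) for _ in range(n)]    # the correct values
outcome = [x + rng.gauss(0, 20) for x in true_x]    # a variable related to x

with_random_error = [x + rng.gauss(0, 20) for x in true_x]  # careless slips
with_bias = [x + 15 for x in true_x]                        # steady overestimate

median_true = statistics.median(true_x)
median_random = statistics.median(with_random_error)
median_biased = statistics.median(with_bias)

r_true = pearson_r(true_x, outcome)
r_random = pearson_r(with_random_error, outcome)

print("medians:", median_true, median_random, median_biased)
print("correlations:", r_true, r_random)
```

In a run of this sketch the median of the randomly mismeasured values should stay close to the true median, the biased median should sit about 15 units higher, and the correlation computed from the mismeasured values should come out lower than the one computed from the true values.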

10. Summary

This chapter discusses the process of collecting and processing data. These are administrative tasks, calling on the skills of personnel supervision, attention to everyday detail, constant checking for mechanical and human error, and deciding what to do about the innumerable out-of-the-ordinary situations that arise.

Adjusting the data to allow for different origins and reliabilities requires not only wisdom but honesty with yourself to avoid the natural human tendency to jiggle the data so that they show what you expect or want them to show. Making allowances for missing data and allocating portions of aggregated observations require the same honest self-discipline.

Standardizing the data is crucial to a reasonable comparison of groups. The trick is to find a basis of comparison that controls for the main differences among the groups that are not themselves the subject of your interest. Indexes are often useful standardizing devices.

EXERCISES

1. Is there a single most appropriate index for comparing the relative safety of airplane and auto travel? Give one or more indexes along with the purpose(s) for which it is most appropriate.
2. Estimate the number of single men aged seventeen to twenty-five who had no dates in your town or city last Saturday night.
3. Estimate the total market value of all the clothes being worn at this moment by people in your town or city.
4. A professor uses one bedroom of her house for a study, in which she writes books. What cost should she impute for the space when she deducts her expenses for income tax?
5. What standardization devices are used in measuring I.Q.? What standardization devices are reasonable when measuring I.Q.s of boys and girls who grow up in "culturally disadvantaged" homes?
6. Evaluate the cost of eating at four campus eating places, including a college cafeteria. This evaluation requires creating a food index. Then ask a sample of students to rate the eating places in order of cost, and compare what the students think to what your index shows.
7. Estimate the number of students who take the basic courses in psychology and political science in the United States each year. Include night schools and irregular students. State how you built up your estimates.
8. Make some reasonable quantitative estimate of the comparative value to you of any two courses you might take next semester.

ADDITIONAL READING FOR CHAPTER 17

Mitchell (Chapter 3) presents an excellent discussion of data standardization and related topics in the context of economic data. See Tintner for a technical discussion of index numbers.

On data processing, see Selltiz et al. (Chapter 13).

Schlaifer (1959), pp. 2-13 and Chapter 2, provides a very good discussion of how to combine probabilities and values to arrive at rational decisions. Goodman (1955), pp. 277-281, demonstrates this approach in the context of social policy decisions.

The account by Coale and Stephen of the Census Bureau data error, described as the "Case of the Indians and the Teenage Widows," is worth reading.

On index numbers, I recommend Suits (Chapters).

Part Three: The Obstacles to Social-Science Knowledge and Ways to Overcome Them

18 The Concept of Obstacles in the Search for Empirical Knowledge

1. Summary

After the researcher has specified the question that she wants the research to answer, all she has to do is to go out and get the facts. But knowledge can be tricky to obtain, and common-sense knowledge-gathering methods may not be sufficiently powerful. There are often many obstacles that prevent you from getting accurate knowledge easily.

When I say there are obstacles, I mean that the world is big, complex, numerous, expensive, and tiring to try to understand, not least because human nature is so complex. The resulting complications, as well as the large numbers and vastness of nature that demand time and money, are indeed obstacles to your finding out what you want to find out. To put it another way, the flaws that occur in other people's research, and for which you criticize them, result from the obstacles that the researchers did not succeed in overcoming.

The concept of obstacles to knowledge is perhaps the most important single concept in this book.

In the several years that I have been teaching research design to first-year graduate students no single problem has caused me more persistent anguish than that of trying to organize research errors into a meaningful pattern. I would like (as I think anyone would like) to have a scheme of presentation which is logical and at the same time exhaustive of these research errors. It might be useful too to have a checklist of known errors so that in designing an experiment we could go through and see if we have successfully avoided such errors. (Underwood, p. 89)


The aim of Part Three of this book is to provide such a presentation and classification of obstacles to knowledge and of research errors, as well as to offer methods of overcoming these obstacles. The purpose of this chapter is to introduce the general concept of obstacles to knowledge and to lay the groundwork for the discussions of particular types of obstacles in the chapters that follow.

Examples of obstacles to knowledge are easy to come by. Major obstacles usually exist even for research problems that seem straightforward, as these examples show:

How many people in the United States are over six feet tall? You could measure every one of the more than 190 million Americans, but the cost of doing so is a major obstacle, an obstacle that you can surmount by sampling.

Next, how hot is the surface of the moon? Touch your hand to the moon, or hold a thermometer to it. Until you can do so it is necessary to find a method of measuring temperature at long distance.

Third, what does the Kremlin plan to do about the Middle East? You could ask the Russian Premier, of course. But there is at least a tiny chance that he will not answer your question or will not answer it truthfully. No one has yet developed a satisfactory scientific method to overcome this obstacle.

Fourth, how do Americans feel about Thanksgiving? You can ask one or a dozen Americans. But you must find a method to ensure that the people you ask are typical.

Fifth, and perhaps most important of all, who will win tomorrow's game between the White Sox and the Tigers? You can easily find out who won yesterday's game. But yesterday does not repeat itself perfectly; there is variability in day-to-day scoring. You might look to see who won most of the games between them last year. But team personnel changes from year to year.

The obstacles are indeed great.
Sometimes it seems as if nature (and particularly the human aspect of nature that is the realm of social science) diabolically throws sand in the researcher's eyes and leads him to wrong conclusions. Not so. "God is subtle but not malicious," Einstein is reported to have said. If nature did indeed try to fool and foil you, your task would be a lot more difficult even than it is.¹ Subjects who lie purposely to deceive you and communities of people who deceive you purposely are unusual exceptions to this generalization.

1. Those who still believe that genetic factors might be responsible for diseases caused by smoking and for smoking behavior implicitly believe in a nature that goes to remarkable ends to confuse scientists (K. Brownlee; J. Simon, 1966c).

My aim in emphasizing the obstacles to knowledge is not to discourage you or make you feel that the game of research is very hard to play well. Rather, I want to convey that overcoming the obstacles is the very essence of the game of research. If there were no difficult obstacles to knowledge, there would be no need for research or for skilled researchers. The obstacles are the challenge, the spice that makes the game fascinating.

H. Ebbinghaus began his study of the learning process by setting forth the two major obstacles that prevent easy application of natural-science method:

In the first place, how are we to keep even approximately constant the bewildering mass of causal conditions which, in so far as they are of mental nature, almost completely elude our control, and which, moreover, are subject to endless and incessant change? In the second place, by what possible means are we to measure numerically the mental processes which flit by so quickly and which on introspection are so hard to analyse? (Ebbinghaus, pp. 7-8)

The most infamous example of an unsurmounted obstacle was the Literary Digest presidential poll of 1936. The poll failed disastrously because it did not overcome the obstacle of bias in its sampling procedure; it simply did not succeed in getting a representative sample of the voters, as we shall see later.

Whenever two or more studies of the same question arrive at different answers, one or both of the researchers have failed to surmount some important obstacles. On page 205 there is an example of various researchers who set out to determine whether trading stamps raise or lower the consumer price of grocery products. They reach very different conclusions because important obstacles to knowledge are not overcome in their studies.

Unsurmounted obstacles sometimes go undetected for a long time in pure research. But in commercial research, competition among various firms can be highly effective in revealing errors caused by unsurmounted obstacles. Consider this story from the trade press:

STUDIES YIELD CONTRASTING DATA FOR MAGAZINES

Some SRDS Totals Are Markedly Higher Than Politz's or Simmons'

New York, Aug. 10: SRDS-Data Inc. has released new total audience figures for 27 magazines included in its Consumer/Audience Profile, thereby providing a comparison with figures previously reported by Alfred Politz and W. R. Simmons & Associates. In some cases the Data Inc. numbers jibe with those from the other two researchers. But for some publications, Data Inc. audience totals range up to 41% higher than Simmons' and 14% greater than Politz's. Simmons, for example, projected a total audience (of persons age 18 or older) of 10,957,000 for Woman's Day, whereas Data Inc. reported 15,543,000, or 41% more, and Politz found 13,310,000, or 22% more than did the Simmons organization. . . . Several factors may account for the different audience totals found by the three researchers: (1) different probability samples; (2) different methodologies in conducting interviews and projecting the figures nationally; (3) not all researchers were in the field at the same time, thereby possibly introducing an audience "seasonality" factor. (Advertising Age, August 16, 1965, p. 66)

One must learn how to recognize in advance the various obstacles and know how to surmount each of them efficiently. In this third part of the book, we shall discuss the various obstacles one by one, and in conjunction with each obstacle we shall discuss research tactics to overcome it.

Of course, you cannot spot all obstacles in advance. Unexpected obstacles always crop up as you get deeper into your research work, and then you must retreat a bit and modify your method. The entire research process is a cycle of obstacles, methods, new obstacles, modified methods, and so on.

Government intelligence work generally, and spying in particular, resembles research in being a matter of first identifying the obstacles and then finding the appropriate methods to overcome them. "Clandestine intelligence collection is chiefly a matter of circumventing obstacles in order to reach an objective" (Dulles, p. 58).²

Some types of knowledge are harder to obtain than are others. It is probably true that the wider the scope of the knowledge, the more difficult it is to obtain. It is easy for you to state what magazine you read yesterday, although your memory may be faulty. It is a bit harder for your spouse to determine exactly what magazine you read yesterday, and it is decidedly more difficult for a stranger to find out what you read. It is hardest of all for a stranger to find out what magazines each and every American or the "typical American" read yesterday.

A newspaper reporter usually has no trouble finding out such facts about a riot as what time it started, how much damage it caused, and how many policemen were on the scene. She has much more trouble in determining "public opinion" about the riot. The difficulties in finding out who started the riot are so great that a reporter who does so successfully may earn a Pulitzer Prize. And the underlying causes of the riot pose a question that bristles with obstacles.
After the exhaustive McCone report and innumerable other studies the causes of the 1965 Watts riots are still not clearly known.

2. G. Stigler tells us that "business is the collection of devices for circumventing barriers to profits" (Stigler, 1952, p. 435). The business of science is knowledge, and the business of business is profit.

This seems an appropriate moment to remind you that no empirical knowledge will ever be absolutely perfect and certain, not even your own name. Perhaps there was a case of mistaken identity at birth.

Some obstacles are more likely to appear in some disciplines than in others. But, as with the temperate-zone doctor who occasionally spots a case of tropical disease, a wide acquaintance even with the obstacles that are infrequently found in your area may make the difference between your being a routinely skilled or a really gifted diagnostician of research obstacles.

Skill in recognizing and overcoming research difficulties is a matter of special importance for the statistically trained researcher. (Nonstatisticians may skip the following sentences.) Each assumption that underlies a statistical test is really an assumption that certain obstacles are not present in sufficient degree to invalidate the test or estimating technique. For example, interrelationship among the independent variables is, as we shall see, a common and tricky obstacle, but the assumption that no interrelationship exists is necessary for the use of many statistical techniques. If the statistician rushes ahead and applies a statistical technique when the assumptions do not hold or when he has not found some way of rendering the obstacle harmless, his statistical conclusions may fall into serious error. And indeed, not only are statistical techniques often misapplied in this way, but the way the results are stated often obscures the fact that basic assumptions are not valid. Such reports can easily mislead the unwary or less sophisticated reader.

Remembering that there are no standard solutions for obstacles can reduce your frustration and anxiety in designing research. It just is not possible to compile a handbook of research solutions in the style of medical reference books, which indicate one or more specific treatments for each particular disease that you diagnose. Furthermore, as in medicine, there are in research some afflictions that are incurable. For example, the eminent reviewing committee noted that there simply do not seem to be satisfactory methods for overcoming some of the obstacles that A. Kinsey et al. faced, especially the obstacle of nonresponse (Cochran et al.).

One way that Kinsey and his colleagues tried to surmount the nonresponse obstacle (and others as well) was to compare the results of their study to those of other studies. This is the most general method for surmounting obstacles. But the other studies also did not surmount this obstacle and therefore citing them as supporting evidence could only reinforce any error in the findings from this source.

Some obstacles are particularly prevalent in certain social sciences. But, to a surprising degree, most obstacles are found in all the social sciences. Consider economics.
Most economists seldom come face to face w i t h the obstacle that people of w h o m they ask questions may lie to them or rational­ ize their answers. B u t that is because most economists w o r k w i t h data gathered b y other people—and at some point i n the data-gathering process someone faced these obstacles. For example, a labor economist m i g h t work w i t h U.S. Bureau of L a b o r Statistics unemployment data and take the data at face value, w i t h o u t t a k i n g into account that the raw data came from a questionnaire survey of a sample of families, some of w h o m m i g h t have lied, rationahzed, or done other h u m a n things that present obstacles to getting knowledge. Similarly, almost all the price data that economists w o r k w i t h are or were collected b y surveys and question asking and are subject to all the obstacles inherent i n that process. T h e best general approach to overcoming obstacles to knowledge is to employ t w o or more very different research methods. No single method can overcome a l l the important obstacles. B u t different sorts of methods over­ come different sorts of obstacles, and together their strengths can cover each other's weaknesses, p r o v i d e d that the results agree. Emphasis on the joint use of t w o or more empirical methods, together w i t h theoretical speculation, is a basic theme of this book, a strategy called "triangulation" b y D e n z i n .

Please do not allow your awareness of the obstacles to knowledge faced by empirical research to focus you only on the flaws of the research of others, or to lead you to conclude that all data and research results are worthless because none can be perfect. Minor flaws need not invalidate research. And if the researcher has a good basic idea and a basically sound research approach, even a good many minor flaws and a small scale of research will not invalidate the research results.

Although the term "obstacle" does not sound attractive, I think we should not complain that knowledge is not perfectly easy to get, just as we should not lament the existence of the phenomenon of friction on earth. To repeat, the obstacles to knowledge are what make the achievement of knowledge interesting. Also, they provide livelihoods for social scientists who sweat for knowledge in universities, government agencies, and business; it is because there are obstacles to knowledge that people will pay to obtain it. And now, on to the obstacles themselves.

1. Summary

Some factual knowledge about the world comes easily. But the task of empirical social science is to produce the knowledge that cannot be obtained simply and easily by casual observation.

There is a wide variety of obstacles that prevent easy acquisition of reliable knowledge. Sometimes these obstacles keep people from attempting to gather the information; sometimes the obstacles cause information-gathering to yield wrong conclusions. One of the main tasks of the skilled researcher is to construct the research design in such manner as to overcome these obstacles successfully and efficiently.

The battle against nature's complexities is hard and never ending. But this battle is the bread and butter of the empirical social scientist, and it may be won with diligence and ingenuity.

EXERCISES

1. Discuss how each of the three factors listed on page 268 might have accounted for the differences in readership figures for Woman's Day estimated by Politz, Simmons, and Data Inc.
2. What are the obstacles to a reporter's finding out who started a riot?

ADDITIONAL READING FOR CHAPTER 18

Obstacles to research are usually discussed in the context of errors and fallacies. See Mead, pp. 45-58, on anthropology; Saiger on medicine; Deming and Hansen et al. (Chapter 2) on surveys; Morgenstern (1953), pp. 13-70, on economics; Campbell and Stanley on psychological experiments; Cannon (Chapter 11) on biology. Wallis and Roberts (Chapter 4), Cohen, and Cohen and Nagel, pp. 316-323, are general references that discuss errors and fallacies in social-science research. Morgenstern's On the Accuracy of Economic Observation is a frightening compendium of information on the state of such affairs in economics.

19 Obstacles Created by the Humanness of the Observer: Appendix on Interviewing

1. Observer Variability
2. Observer Bias
3. Cheating by Interviewers
4. Variability Among Observers
5. Observer-Caused Effects
6. Summary
7. Appendix: Personal Interviewing and Interviews

This is the first of the group of chapters cataloguing the obstacles to knowledge and the tactics by which the obstacles may be surmounted. The obstacles described in this chapter all arise because the observer is a human being rather than a machine. Indeed, if a way could be found to replace the observer with a machine, these obstacles would often be surmounted; that is why data-gathering instruments are used. To put it another way, obstacles that must be overcome arise because research workers are as complicated as their subjects.

This chapter and following ones offer examples of each different obstacle and of methods for overcoming it. You will best understand the nature of the obstacles, however, if you create your own examples from your work and reading.

1. Observer Variability

This obstacle arises from the observer's most human quality: imperfect physical and mental faculties. I am not now talking of bias, which is a systematic tendency to deviate from the "true" value in one direction. Rather, I mean the inability of a given observer to repeat an observation again and again in exactly the same way with exactly the same result.

Observer variability occurs even when it would seem easy to be objective.
For example, in library-science research it is sometimes necessary to decide whether two similar works by an author are two different works or two editions of the same work. Even an experienced observer will decide differently about the same pair of books from one day to the next. In other words, there will be variability (dispersion) in the observer's judgment from time to time.

Variability of traffic policemen in giving tickets for speeding is part of our folklore. The state of the officer's digestion and the degree of amicability between him and his wife, as well as the sex of the driver, are believed to influence the outcome.

In the physical sciences too, especially in the early stages of a physical discipline, observer variability is an important obstacle. For example, a human laboratory assistant does not read a thermometer perfectly. One important reason is that on repeated trials the observer looks at the thermometer from slightly different angles, giving slightly different readings. The difficulty is compounded if the observer cannot get close to the thermometer, as, for example, when he reads it from inside a window. Many other minor influences also cause variability and prevent precision.

As the natural sciences progressed and as more accurate instrument readings were needed, ingenious researchers developed such ways of surmounting variability in instrument reading as the following:

1. Repeat the observation, and take an average of the observed values. If the observer-variability errors are unrelated to (independent of) one another, the average of the observations is likely to be closer to the actual value than is a single observation.
2. Reduce the variation in the viewing angle with a mechanical device that holds the chin of the observer in a fixed place and thus fixes the location of his eye. Placing a mirror behind the needle is another common device to increase precision in reading instruments.
3. Read and record electronically, and print the reading automatically. But electronics has not completely licked this obstacle. Variability in observing stars is a major difficulty in astronomy, for example.

The social scientist controls observer variability with similar tactics. Observations are repeated and averaged whenever possible, to cancel out random variation. And, analogous to the use of the chin bar in reading thermometers, the social scientist often reduces the scope of the observations. If a field worker is to gather data on family income, she can be instructed to determine only how many cars are owned, how many rooms the house contains, and the occupation of the breadwinner. This list leaves less scope for judgment than would a general instruction to estimate the income level. (Narrowing the scope of judgment may also be considered a step toward operationalizing the definitions the observer works with.)

Reducing scope is a device to restrict the observer to gathering knowledge that he can obtain reliably. Earlier I said that it is easier to state accurately what magazine you yourself read yesterday than to find out what a stranger
read or what the average person read. This decreasing accuracy parallels the continuum of confidence you can place in an observer. You rely heavily on her statement of how she herself will vote. You may place considerable credence in her statement of how her husband will vote. You do not have much confidence in her casual statement about how her neighbors will vote. And you are politely skeptical about her prediction of how her state will go in the next election.

Here are two general principles about observer accuracy and the scope of observation: First, the sharper and more measurable the categories, the more accurate the judgments. An observer who estimates the year of a respondent's car will be more accurate about its age than if he judges it to be simply "old" or "new." And a judgment of "old" or "new" will be more accurate if he specifies a cut-off point, say three years old. On the other hand, it is a waste to ask for more categories of accuracy than he needs. Second, the less the observer must summarize, the more accurate he will be. He will be more accurate in judging the intelligence of a single person than in judging the average intelligence of a whole group of people.

Learning from experience usually reduces observer variability. Professors often take advantage of this process when they grade exams by reading several exam books and giving only tentative grades before beginning to grade in earnest, and then regrading the exams that were read first. A. Kinsey required a full year of practice from his interviewers before he would accept their data. The Kinsey interviewers, however, needed such an extraordinarily long training period because they had to exercise a great deal of judgment.

Mechanical and electronic devices can occasionally be employed in the social sciences to reduce observer variability. For example, College Board Examinations are graded and totaled mostly electronically. Of course, instruments cannot make observations and measurements that require judgment. In educational examinations, this limitation sometimes means ignoring some kinds of complexities in responses and not measuring some kinds of abilities. (In English composition, however, College Board Examinations are graded by judges, and several independent judges provide a consensus.) Tape recorders and cameras are frequently used in anthropology to produce a permanent record that can be reviewed repeatedly, either by the observer himself or by other people, in order to check on and reduce the extent of variability (Mead). And the experimental psychologist uses many gadgets to reduce observer variability: running wheels to measure animal activity, water and food meters to measure intake, and timers of all kinds to ensure exact timing of stimuli.

2. Observer Bias

The tall fellow records a lower temperature than the short fellow because the tall fellow looks down at the thermometer. This innocent propensity to take a lower reading that inheres in the tall fellow we shall call by the ugly
name "bias." In scientific usage, bias is merely a tendency to observe the phenomenon in a manner that differs from the "true" observation in some consistent fashion. But usually there is no way to determine the "true" value, especially before the study is complete. Therefore, we assume that every observer is biased in one direction or the other. Our job is to determine each observer's bias and to allow for it.

Most biases in the social sciences are social. Social-scientific topics and situations stir up and involve the observer's beliefs, emotions, and other mental baggage. For example, sportswriters lament the fate of the noble boxer Battling Siki, who underestimated the problem of observer bias; he fought Mike McTigue for the light-heavyweight championship in Dublin on St. Patrick's Day 1923 and lost by a decision. (No kidding!) In a survey of anti-Catholicism and anti-Semitism, a Catholic interviewer perceives responses somewhat differently than does a Jewish interviewer. And whether an interviewer's own values are Puritan or libertine may affect a sex survey. In the "Priscilla's Pop" cartoon in Figure 19.1 the interviewer's feminist bias seems to be affecting her survey slightly.

Observer bias creeps in no matter how careful you are. Many scientists have had an experience like this one: I had an idea that professors who had taken their Ph.D. degrees from universities with lesser reputations would be more productive than would colleagues serving on the same faculty who had taken their Ph.D. degrees from schools with higher reputations. I therefore began to match up pairs of professors on given faculties who had taken their Ph.D. degrees from different schools in the same year, intending to compare productivity.
But I soon found that my desire that my idea be confirmed was causing me to match pairs that would show the results I wanted to see, thereby invalidating the work.

Anthropology has suffered worst and longest from observer bias. Modern anthropologists regard most early anthropological accounts by sea captains and missionaries as almost useless, simply because the authors' perceptions were so warped by their own cultural background. Their bias was often that all non-Europeans were heathen, savage, primitive, and without law or social organization. Even so astute a social scientist as T. R. Malthus fell afoul of this problem:

The prelude to love in this country [New South Wales] is violence, and of the most brutal nature. The savage selects his intended wife from the women of a different tribe, generally one at enmity with his own. He steals upon her in the absence of her protectors, and having first stupefied her with blows of a club, or wooden sword, on the head, back, and shoulders, every one of which is followed by a stream of blood, he drags her through the woods by one arm, regardless of the stones and broken pieces of trees that may lie in his route, and anxious only to convey his prize in safety to his own party. The woman thus treated becomes his wife, is incorporated into the tribe to which he belongs, and but seldom quits him for another. The outrage is not resented by the relations of the female, who only retaliate by a similar outrage when it is in their power. (Irwin, ed., pp. 15-16)

FIGURE 19.1 [Cartoon: "Priscilla's Pop."] Source: Priscilla's Pop, November 6, 1966, by Al Vermeer; © 1966 by NEA, Inc.; reprinted by permission of Newspaper Enterprise Association (or NEA).

Overcoming the obstacle of observer bias in physical problems like thermometer reading is reasonably easy. The simplest tactic is to obtain readings from several observers and to calculate their mean value, on the assumption that their biases will cancel out. (On the other hand, ten blind people see no more clearly than does one blind person.)

The tactics used to reduce variability within observers also reduce variability from bias among observers. In the thermometer case, such tactics might include careful instructions, devices that put the tall and short men's chins on the same bar, and automatic reading with instruments. Most observer bias in the social sciences is not dealt with so easily, however. (This is one reason why the design of research procedures is often more challenging in the social sciences than in the natural sciences.) Here are some tactics that prove helpful.

First, train observers very carefully. In a famous study of racial bias, G. Allport found that many whites who were shown a picture of a white man and a black man later said that the black was holding a razor, even though it was the white who really held the razor in the picture. The better trained the observer, the less likely he is to make such biased observations. Careful training is the keystone of modern anthropological method.

Second, specify the observer's task as closely as possible, to reduce the area of discretion within which bias may operate. As with observer variability, an observer's bias in estimating family income can be controlled by having him record the age and number of cars owned or the number of rooms in the house.

Third, require observers to refer frequently to detailed instructions. This requirement is a sort of training, for it forces the observer to retrain himself by constantly rereading his instructions. It also reduces his area of discretion by reducing the chance that he will drift away from his instructions.

Fourth, require immediate and detailed reporting whenever possible. Anthropologists try to record their field notes every day, to minimize the chance that their memories will play biasing tricks upon them. Police officers are also trained to take on-the-spot notes, to prevent bias and inaccuracy from creeping in, and courts give special attention to such notes.¹

¹ J. Hulett advises us that a stubby pencil and a small battered notebook make people less nervous than do more pretentious tools (p. 364).

Fifth, mechanical devices like the camera and the tape recorder can sometimes reduce the discretion of the field observer. The permanent record can be checked later against the observer's observations, to evaluate bias. This technique has been used extensively in anthropology, market research, and social psychology. The coming of recording devices drastically altered the working habits of anthropologists (Mead, pp. 55-56).

Sixth, have several observers observe the same phenomena, and compare their observations. Note that often all observers must observe at the same time, because, for example, an interviewee may answer differently or not at all in a second interview. Complete duplication is usually too expensive, especially in interview studies; in fact, the cost of interviewing is the basic limitation on the size of the study. Comparing the data from several observers to establish their individual biases is usually feasible only on a small part of the study; those biases can then be taken into account when the rest of their data are analyzed.

Such a comparison can also test whether the observers have been trained well enough to squeeze their biases out of them. An example is a study in which two well-trained psycholinguists counted the number of pauses of various kinds in a recorded speech. Both of them practiced and compared their counts until these practice counts agreed with each other (Maclay & Osgood). Then each could observe individually without fear of undue bias creeping in.

It is often wise to select observers who differ on relevant dimensions, for example, foreign-born and native-born, old and young, white and black, and so on.

This technique of comparing ratings among observers, and of averaging the ratings among observers if the ratings differ, can be especially powerful when the observers themselves are part of the situation and therefore have strong emotional reactions that affect their judgments. For example, the average judgment among a sample of several enlisted men is likely to provide a less biased rating of an officer than will that of just one enlisted man (Selvin, 1960, pp. 31-33); averaging samples of students and patients is likely to provide "better" judgments of teachers and doctors, respectively, than are judgments by just one student and one patient.

The previous examples show how observers can inject bias into what they hear and see and also into their judgments of what they hear and see. But bias can also come from the way that the researcher himself affects the human subjects, as we shall see shortly.
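The sixth tactic (comparing several observers on a small shared part of the study, then allowing for each observer's bias in the rest of the data) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a procedure taken from the studies cited: the function names, the observer labels, and the figures are hypothetical, and the observers' consensus (their mean) stands in for the "true" value, which usually cannot be known. If every observer leans the same way, a check of this kind cannot detect it.

```python
from statistics import mean

def estimate_biases(shared_ratings):
    """Estimate each observer's bias from cases that every observer rated.

    shared_ratings maps an observer name to a list of ratings; position i
    refers to the same case for every observer. Bias is measured against
    the consensus (mean across observers), which is an assumption made
    here because the "true" value is normally unknown.
    """
    observers = list(shared_ratings)
    n_cases = len(shared_ratings[observers[0]])
    consensus = [mean(shared_ratings[o][i] for o in observers)
                 for i in range(n_cases)]
    return {o: mean(shared_ratings[o][i] - consensus[i] for i in range(n_cases))
            for o in observers}

def adjust(rating, observer, biases):
    """Allow for an observer's estimated bias in a rating made alone."""
    return rating - biases[observer]

# Hypothetical ratings by three observers of the same three cases.
shared = {
    "A": [10.0, 12.0, 14.0],
    "B": [12.0, 14.0, 16.0],
    "C": [11.0, 13.0, 15.0],
}
biases = estimate_biases(shared)
print(biases)                    # estimated bias of each observer
print(adjust(20.0, "B", biases)) # a later solo rating by B, corrected
```

In this made-up data observer A reads consistently low and B consistently high, so their later solo ratings are shifted toward the consensus. Whether a constant additive correction is reasonable depends on the measurement; it is a simplifying assumption of the sketch.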

3. Cheating by Interviewers

Interviewers have found many ways to avoid their assigned duties (Roth). Some of these deviations are not larcenous; for example, the interviewer may merely skip an interview because the house does not look pleasant from outside. But some of the deviations are pure theft; the interviewer may fill out the questionnaire schedules in her armchair at home, and, worst of all, she may do it with such skill that the answers are hard to distinguish from the real thing without some independent check. (An experienced survey analyst claims, however, that, even though he cannot detect one phony interview, he can tell when a whole group has been falsified by examining the pattern of dispersion among the answers. I hope he is right.)

Prevention and detection of interviewer cheating require a detective's imagination and resourcefulness. Here are some frequently used techniques.

First, the study supervisor may check to see whether the interviewer really carried out the interview by contacting some or all of the subjects by telephone or postcard. This tactic is not feasible if the respondents are guaranteed anonymity. Social scientists do not, however, always act according to the letter of their guarantees and often use key numbers on questionnaires to check against master lists. One hopes that such ethical breaches do not have serious consequences.

Second, someone once told me of a mechanical device that rolls the questionnaire past a writing window, one question at a time and irreversibly. It prevents the interviewer from skipping some of the questions and later filling them out at home.

Third, examination of the answers and analysis of their internal consistency can reveal cheating; this is dramatized in Figure 19.2.

FIGURE 19.2 [Cartoon; text in image: "INTIMATE WEAR MARKETING RESEARCH DIVISION."] Source: Reprinted from Marketing News published by the American Marketing Association. June 4, 1976, p. 11. Used with permission.

Finally, some researchers claim to have developed special cheat-catcher questions, but they are kept as trade secrets.

If you detect cheating, find out how much cheating there is, and report it in your write-up, even though unsophisticated readers of your study or critics with vested interests may seize upon this report as a means of damning the study. In an example of this procedure, F. Evans (1959) found no important personality differences between Ford owners and Chevrolet owners. Commercial motivation researchers raised a howl because this finding threatened their livelihoods, and one of the motivation researchers seized upon Evans' report that one of his interviewers had cheated (which Evans had allowed for by discarding those data) to discredit the study. But sophisticated readers gained confidence from his report because they were reassured that there was no unreported but worrisome cheating.
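The survey analyst mentioned at the start of this section claims to spot falsified batches from "the pattern of dispersion among the answers," but the text does not say how. The Python sketch below is one guess at what such a check might look like, not his method: it flags any interviewer whose answers to a numeric question are markedly less spread out than those of the other interviewers. The function name, the 0.5 threshold, and the data are all hypothetical, and a flag is only a reason to make an independent check, never proof of cheating.

```python
from statistics import median, pstdev

def flag_low_dispersion(answers_by_interviewer, ratio=0.5):
    """Return interviewers whose answers show unusually little spread.

    answers_by_interviewer maps an interviewer name to the numeric
    answers recorded on one question. An interviewer is flagged when the
    standard deviation of her batch is below `ratio` times the median
    standard deviation of the other interviewers' batches.
    Needs at least two interviewers; `ratio` is an arbitrary choice.
    """
    spreads = {name: pstdev(vals)
               for name, vals in answers_by_interviewer.items()}
    flagged = []
    for name, spread in spreads.items():
        others = [s for other, s in spreads.items() if other != name]
        if spread < ratio * median(others):
            flagged.append(name)
    return flagged

# Hypothetical answers on a 1-to-5 scale from three interviewers' batches.
answers = {
    "interviewer_1": [1, 5, 2, 4, 3, 5, 1],
    "interviewer_2": [2, 5, 1, 3, 4, 1, 5],
    "interviewer_3": [3, 3, 3, 4, 3, 3, 3],
}
print(flag_low_dispersion(answers))
```

The sketch assumes that invented answers cluster more tightly than real ones. A fabricator might just as well produce too much spread, and an honest interviewer working a homogeneous neighborhood might produce very little, so the direction of the test is itself an assumption.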

4. Variability Among Observers

Variability among observers is not a special type of difficulty. Rather, it is a combination of variability within individual observers and systematic bias differences among observers. Avoiding these two sources of error separately automatically solves the problem of variability among observers.
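The statement that variability among observers combines the two sources can be checked with arithmetic. The Python sketch below is a minimal illustration with hypothetical thermometer readings and a hypothetical function name, assuming every observer takes the same number of readings of the same quantity. Under that assumption, the variance of all readings pooled together equals the average variance within observers plus the variance of the observers' own means.

```python
from statistics import mean, pvariance

def decompose(readings):
    """Split the total variance of readings into two parts.

    readings maps an observer name to repeated readings of one quantity.
    Returns (within, between, total), where `within` is the average
    variance inside each observer's readings and `between` is the
    variance of the observers' means (their systematic differences).
    Assumes an equal number of readings per observer.
    """
    within = mean(pvariance(r) for r in readings.values())
    between = pvariance([mean(r) for r in readings.values()])
    pooled = [x for r in readings.values() for x in r]
    total = pvariance(pooled)
    return within, between, total

# Hypothetical readings: the tall observer reads lower than the short one.
readings = {
    "tall":  [69.0, 70.0, 71.0],
    "short": [71.0, 72.0, 73.0],
}
within, between, total = decompose(readings)
print(within, between, total)
```

Repeating and averaging each observer's readings shrinks the first part; pooling observers whose biases differ, or correcting for each observer's bias, addresses the second.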

5. Observer-Caused Effects

The researcher's efforts to study a phenomenon always affect the phenomenon and change it. The observer is inevitably a part of the same environment as is the phenomenon she is studying. Therefore, the observer, like all other aspects of the environment, must influence the phenomenon.

Sometimes the effect is so slight that it may be ignored, as is usually the case in the physical sciences. For example, the act of measuring the time it takes a ball bearing to roll down an incline affects the ball in some infinitesimal way; there is gravitational attraction between the observer's eyeballs as they roll in her head and the ball bearing as it rolls down the incline, but the attraction is too small to measure. On the other hand, a chemist's breath may well affect a reaction he is running.

The observer effect in medical examinations straddles the physical and social sciences. When a doctor (or especially a young nurse, in the case of male patients) takes a patient's blood pressure, fear or excitement often forces the blood pressure far above the normal level.

More generally, there is the possibility that every sort of observation method, whether or not it involves a human observer, may importantly
influence the behavior of the subject matter. The most general remedy for this obstacle is to make the observation procedure as "unobtrusive" as possible (see Webb et al.). This topic will be discussed in more generality on page 334.

The danger of observer-caused effects is that the researcher may not realize that they are operating and affecting the results. In I. Pavlov's case:

It was thought at the beginning of our research that it would be sufficient simply to isolate the experimenter in the research chamber with the dog on its stand, and to refuse admission to anyone else during the course of an experiment. But this precaution was found to be wholly inadequate, since the experimenter, however still he might try to be, was himself a constant source of a large number of stimuli. His slightest movements (blinking of the eyelids or movement of the eyes, posture, respiration, and so on) all acted as stimuli which, falling upon the dog, were sufficient to vitiate the experiments by making exact interpretation of the results extremely difficult. In order to exclude this undue influence on the part of the experimenter as far as possible, he had to be stationed outside the room in which the dog was placed. . . . (Pavlov, pp. 108-109)

As H. Spencer pointed out long ago (pp. 66-67), observer-caused effects make trouble in the social sciences because human beings are both the subject of study and the observers. The consequent interaction between subject and observer must have consequences. We humans spend large parts of our lives learning to pick up and act upon subtle cues given us by people with whom we are in contact. Therefore, even very subtle behavior of the observer can affect the study results. For example, the inflection and tone of voice in which an interviewer asks a question can produce one response or another. A serious charge against the Kinsey study was that the subjects told the interviewers what they thought the interviewers wanted to hear or what they thought would shock the interviewers. But Kinsey's cross-checking techniques satisfied the official reviewers that error from this source was not great (Cochran et al.).

One might argue that in history, economics, or political science there are no observer-caused effects because the research generally takes place after the events are complete. Brutus killed Caesar no matter what the historian says about it now. But might not Brutus' act have been affected by how Brutus thought that future historians might interpret and judge what he did? Furthermore, Brutus' behavior was probably influenced by bystanders, who were really the observers of the event and the sources of the accounts that historians use. And in economics the course of economic events is clearly affected by the kind of records that are kept, which are the raw material for the economist later. Someone has argued that Germany was more injured in World War II by unsatisfactory national accounting than by its oil shortage.

We shall consider five kinds of observer-caused effects in social science: first, interviewer effects upon interviewees; second, effects of publicity about study findings; third, time-sequence and repetition effects; fourth, placebo effects; and fifth, experimenter effects.

a. EFFECTS OF INTERVIEWERS UPON INTERVIEWEES

Interviewers' biased perceptions of interviewees and their responses can cause trouble, as we saw earlier. But, even i f the interviewer is perfectly unbiased, the interviewee may be affected by her. I n both instances, the nature of the observer or her action affects the observation and creates an obstacle. W h e n the observer actually causes change i n the phenomenon she is studying, we call the change an "observer-caused effect." B u t i f only what the observer sees and hears is affected, w^e call i t "bias." Some people may not care to make the distinction between the t w o diflBculties. . . . [ A ] whole series of studies shows that survey results for specialized atti­ tudes are affected by the disparities or similarities in the group membership of interviewer and respondent. For example, in two NORC surveys samples of Christian respondents in New York City were asked whether Jews in America had too much influence in the business world. Among those who were inter­ viewed by Christian interviewers, 50 percent said the Jews had too much in­ fluence, but among those interviewed by Jewish interviewers, only 22 percent said so. I n another survey in which respondents were asked whether they agreed with the statement "Prison is too good for sex criminals; they should be pubHcly whipped or worse," among women respondents who were interviewed by men interviewers 61 percent agreed with this statement; whereas when women were interviewed by women interviewers, only 49 percent agreed. I t would seem either that women are less bloodthirsty when they are in the company of their own species or, put more precisely, that they feel more compelled to give the conventional and sanctioned attitude to a male interviewer. I n another survey, in which one group of Negroes was interviewed by white interviewers and an equivalent group by Negro interviewers, similar effects were observed. 
For example, when asked whether the Army is unfair to Negroes, 35 percent of those interviewed by Negroes said "Yes," but only 11 percent of those interviewed by whites were willing to express this critical attitude. It is well documented that responses vary with the disparity between interviewer's and respondent's sex, class, color, religion, and other group-membership factors. And the systematic direction of these effects is such that one would not attribute them to mere unreliability but to the way in which the respondent alters his behavior in accordance with the kind of person who speaks to him. (Hyman, 1954, pp. 517-518)

Techniques for overcoming this obstacle are much the same as the techniques for surmounting observer bias. For example, observer bias in questionnaire surveys is often reduced by handing the subject a written question list. Construction of unbiased questions is a similar problem that we shall discuss later.

The participant-observer method in anthropology is fraught with the difficulty of observer-caused effects, and it is not easy for the researcher to decide how to act. W. Whyte experienced these difficulties in an Italian ghetto in New England:

I had to face the question of how far I was to immerse myself in the life of the district. I bumped into that problem one evening as I was walking down the street with the Nortons. Trying to enter into the spirit of the small talk, I cut loose with a string of obscenities and profanity. The walk came to a momentary halt as they all stopped to look at me in surprise. Doc shook his head and said: "Bill, you're not supposed to talk like that. That doesn't sound like you." I tried to explain that I was only using terms that were common on the street corner. Doc insisted, however, that I was different and that they wanted me to be that way. . . .

While I sought to avoid influencing individuals or groups, I tried to be helpful in the way a friend is expected to help in Cornerville. When one of the boys had to go downtown on an errand and wanted company, I went along with him. When somebody was trying to get a job and had to write a letter about himself, I helped him to compose it, and so on. This sort of behavior presented no problem, but, when it came to the matter of handling money, it was not at all clear just how I should behave. Of course, I sought to spend money on my friends just as they did on me. But what about lending money? It is expected in such a district that a man will help out his friends whenever he can, and often the help needed is financial. I lent money on several occasions, but I always felt uneasy about it. Naturally, a man appreciates it at the time you lend him the money, but how does he feel later when the time has come to pay, and he is not able to do so? Perhaps he is embarrassed and tries to avoid your company. On such occasions I tried to reassure the individual and tell him that I knew he did not have it just then and that I was not worried about it. Or I even told him to forget about the debt altogether.
But that did not wipe it off the books; the uneasiness remained. I learned that it is possible to do a favor for a friend and cause a strain in the relationship in the process. (Whyte, pp. 304-305)

And B. Malinowski, an early developer of the participant-observer method, pointed out:

. . . [I]f, like a trader or a missionary or an official he [the anthropologist] enters into active relations with the native, if he has to transform or influence or make use of him, this makes a real, unbiased, impartial observation impossible, and precludes all-round sincerity, at least in the case of the missionaries and officials. (Malinowski, p. 18)

b. THE "PUBLICITY" OR "DISCLOSURE OF RESULTS" EFFECT

The public announcement of presidential preelection polls probably affects the election. On the one hand, a poll result might persuade some people that it is a waste of time to throw away their votes on a candidate who is going to lose, and such a reaction might aid a candidate whom the poll shows to be ahead. This is an example of a "self-fulfilling prophecy" (Merton, pp. 179-195). Politicians believe in this poll effect, judging by their screams at election time. P. Lazarsfeld et al. (1948, pp. 107-108) did demonstrate that there is some "bandwagon effect," that is, people voting for the candidate they expect to win; poll publicity could affect such expectations.

But a poll might help a candidate who is shown to be losing narrowly, by stirring his partisans to rise to the emergency. Perhaps such a self-defeating prophecy helped to elect Truman in the 1948 presidential election; the polls showed him slightly behind Dewey. The only evidence on actual election effects that I have seen is that voters who do not vote in the morning may be led not to vote at all by news reports of early election returns (Fuchs).

A forecast of economic inflation may magnify the inflation or even create an inflation that would not have occurred otherwise. People who hear the inflation forecast rush out to make purchases before prices rise. These purchases then actually cause prices to rise. For a contrary example, a forecast of low corn prices may discourage some farmers from planting corn, which may in turn raise the price of corn above the level it would have reached without the forecast.

Unlike most other research obstacles, publicity effects can easily be prevented by the researcher. All she has to do is refrain from disclosing the results until events have run their course.

Some studies are done expressly to influence the outcome of the event. The study results are then publicized, withheld, or distorted for tactical reasons. Such use of research is one government weapon against economic recession or inflation. If people are told that prices are coming down, they may refrain from buying and wait for the lower prices. The fall in purchasing may then drive down the prices and curb the inflation.

This phenomenon resembles "feedback," which is discussed on page 351. True feedback, however, does not involve the observer but only the subject; something that the subject herself does affects her own subsequent behavior.

c. SEQUENCE AND REPETITION EFFECTS

One observation sometimes influences the next observation. The farmer cannot afford to break open each egg he sells to see if it is fresh; the next person to examine the egg would then surely find it worthless. Observations that require breaking the egg or the light bulb or the clay pigeon are called "destructive testing," for obvious reasons, and they are one form of repetition effect in which what you do at one time affects the subsequent state of affairs. Here is an anonymous wag's illustration: You step on a man's toe and then apologize. He accepts your apology graciously. If you then repeat the experiment and step on his toe again, his second response may not be the same as his first.

Sequence effects (also called "position" effects) are a frequent obstacle in psychological studies of learning. We know that the first and most recent stimuli will often be remembered best. For example, if you ask a subject to memorize a list of nonsense syllables, the middle syllables will not get a fair shake. Therefore, if you want to compare the memorization speed of one type of nonsense syllable with the memorization speed of another type of nonsense syllable, you must somehow overcome this obstacle.

A standard technique for avoiding sequence effects is to vary the sequence in which the stimuli are presented to different subjects. If there are only three stimuli, one group of subjects may be given stimuli ABC in that order, whereas other groups receive the stimuli in orders BCA, CAB, CBA, ACB, and BAC. The design is "balanced"; that is, each stimulus has an equal chance to be first, second, or third, and the average scores for the stimuli can be compared directly. In other cases in which every stimulus does not have an equal chance, one can determine how much worse the middle (and perhaps last) stimuli did and then make appropriate adjustments. When there are many different stimuli, one can present them to subjects in random order.

Repetition effects are particularly important in panel studies² and in any research in which subjects are observed or questioned more than once. Sometimes the subjects are not affected by repetition; there is some evidence that on consumer buying panels the subjects do not "wear out" very rapidly (Sandage; Sobol). But several voting studies have shown differences in the subjects' responses depending upon whether they had previously been asked about their voting intentions. Let us consider these data on the rate of recall for advertisements of a group of women who were interviewed several times:

                      RECALL AVERAGE
First Interview            19.1
Second Interview           24.9

"Respondents were not told on the first interview that they would be contacted again. Yet their interest in advertising was heightened . . . and their retention of commercial messages in the magazine was vastly improved" (Politz, p. 7).

2. In panel studies people are observed or questioned at several different times. Panel studies are discussed in Chapter 20, which includes bibliographical references.

In studies in which there is danger of subjects' being sensitized by earlier observations, the first step is to establish whether there really is a sensitization effect. The simplest way to do so is to compare the second responses of one group against responses of a similar group that has not received a first treatment. In panels, compare a particular set of responses of prior members of the panel with parallel responses from newly recruited panel members. If there is no difference, then you can assume that sensitization is not taking place, and you can accept the panel data at face value (Sudman, 1966).


If significant sensitization does occur, you must forgo the repeat-interview or panel technique and employ some other method instead.
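Two of the remedies described in this section can be sketched in a few lines of Python. This is a minimal illustration, not a procedure taken from the sources cited: the function names, the permutation test used to compare the two groups, and the recall scores are all assumptions introduced here. The first part lists every presentation order for three stimuli and counts positions to confirm that the design is balanced; the second compares prior panel members with newly recruited members.

```python
import random
from collections import Counter
from itertools import permutations
from statistics import fmean


def all_orders(stimuli):
    """Every presentation order; three stimuli give six orders (ABC, ACB, ...)."""
    return list(permutations(stimuli))


def position_counts(orders):
    """Count how often each stimulus falls in each serial position."""
    counts = Counter()
    for order in orders:
        for position, stimulus in enumerate(order, start=1):
            counts[(stimulus, position)] += 1
    return counts


def random_order(stimuli, rng):
    """One random presentation order, for use when there are many stimuli."""
    order = list(stimuli)
    rng.shuffle(order)
    return order


def sensitization_check(prior_members, new_members, n_shuffles=5000, seed=0):
    """Compare prior panel members with newly recruited members.

    Returns the difference in mean response and an approximate two-sided
    p-value from a permutation test (an illustrative choice of test).
    """
    rng = random.Random(seed)
    observed = fmean(prior_members) - fmean(new_members)
    pooled = list(prior_members) + list(new_members)
    n_prior = len(prior_members)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = fmean(pooled[:n_prior]) - fmean(pooled[n_prior:])
        if abs(diff) >= abs(observed) - 1e-9:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_shuffles + 1)


# Balanced design for three stimuli.
orders = all_orders("ABC")
counts = position_counts(orders)

# Hypothetical recall scores, invented for illustration only.
prior = [25, 27, 22, 26, 24, 28]
new = [18, 20, 19, 21, 17, 20]
difference, p_value = sensitization_check(prior, new)
```

With three stimuli, each stimulus appears twice in each position across the six orders, which is what allows the average scores to be compared directly. A small p-value from the comparison would suggest that sensitization is taking place; how small is small enough remains a judgment for the researcher.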

d. PLACEBO EFFECTS

Long ago physicians observed that giving a patient a pill (any pill, even if it contained only sugar) would often alleviate the symptoms of many diseases. The effect is not limited to those ailments usually regarded as psychosomatic. It includes angina pectoris, which is a type of heart disease; any new remedy seems to have the power to alleviate angina symptoms for six or eight months. This is the "placebo effect"; the fake medication is called a "placebo."

Sometimes the fake medication really affects the patient physically. Other times the patient only thinks he is getting better and reports the alleviation of symptoms to the doctor even though his underlying condition does not change. This can confuse the investigation of worthless drugs. When medical researchers experiment with the chemical effects of a drug, they must allow for the possible placebo effect, in order not to confuse the two.

The placebo effect brings home to us that medical research is in many ways a social science. It shares many obstacles to knowledge with other social sciences, and it therefore uses many of the same research methods, which is why examples from medical research are included in this book.

One way to avoid the placebo effect is to give the medication without the patient's knowing that he is receiving it, dissolving it in food, perhaps, or mixing it with some other medication that the patient already takes routinely.

The "double blind" experimental design is another method. In the first period, group A receives a placebo that looks and tastes like the experimental medicine, and in the second period group A is switched to the real medication. Group B starts with the medication and then is switched to the sugar pill. Neither the doctor nor the patient knows who is getting the placebo; hence the name "double blind." Not only does this design prevent the patient from being affected psychologically, but it also prevents the doctor from reading the symptoms she expects to see. The amount of the placebo effect is then subtracted from the medication effect. One can then infer that any observed differences between the medication effect and the placebo effect may be accounted for by the medication.

e. EXPERIMENTER EFFECTS

The famous Hawthorne effect is another illustration of how the researcher can unwittingly obscure the effect of the variable in which she is interested. A group headed by E. Mayo (Roethlisberger & Dickson; Madge, Chap. 6) working at the Hawthorne plant of Western Electric set out to determine the effects of variations in the intensity of light and working hours on the productivity of a group of women factory workers. To their surprise, they found that everything they tried, even worse lighting, seemed to increase productivity. Then an explanation dawned upon the researchers: The increases in productivity were apparently (but perhaps not actually) caused by the attention paid to the workers as subjects of research.³

3. Here is a digression on the history of social science that carries an important warning. For almost five decades the Hawthorne findings have exerted a powerful effect both on the social sciences (actually creating a whole new research tradition) and on American industrial relations. In 1967 Carey made a persuasive attack upon the Hawthorne work as a whole, arguing that its conclusions in no way follow from the actual data and that the work falls into error because it fails to conform to elementary canons of scientific procedure. And he notes that similar questions were raised shortly after the first Hawthorne reports appeared but were ignored by most social scientists. Carey's charges have not been rebutted, which, together with the strong evidence he and others have offered, suggests that the Hawthorne study conclusions are unfounded in fact. Yet they continue to be taught and quoted. (We can usefully continue to refer to Hawthorne effects here, however, whether or not they really exist in industry.)

The Hawthorne experimenter-attention effect, in which the subjects perform better (or worse) so as to please (or displease) the experimenter, is only one of many influences that the experimenter can unwittingly have upon the subjects in the experiment. An equally prominent problem is that the subjects may find out what behavior the experimenter expects of them, and then perform as expected (see Rosenthal). This phenomenon is important both for its effects upon research and its effects in operating situations such as the classroom.

An example of expectancy effect in an operating situation is an experiment where teachers were told that certain children "would show unusual academic development" during the school year, though the children so designated were actually chosen at random. Then the children were tested at the end of the year. The effect of the teachers' expectancies can be seen in the results for grades 1 and 2 in Table 19.1.

TABLE 19.1 Teacher Expectancy Effects: Gain in IQ of Experimental over Control Groups (after eight months)

                 Initial Ability Level
Grades    Higher    Average         Lower     Weighted Means
1         +11.2     +9.6            +24.8     +15.4
2         +18.2     -2.9            +6.1      +9.5
3         -4.3      +9.1            -6.3      -0.0
4         0.0       +0.2            +9.0      +3.4
5         -0.5      Not obtained    +1.2      -0.0
6         -1.3      +1.2            -0.5      -0.7

(Rosenthal, p. 411)

The expectancies of an experimenter can be transmitted subtly, sometimes so subtly that they are difficult to detect. The story of Clever Hans, the horse who could do arithmetic, brings out the point:

Clever Hans . . . was the horse of Mr. von Osten, a German mathematics teacher. By means of tapping his foot, Hans was able to add, subtract, multiply, and divide. Hans could spell, read, and solve problems of musical harmony. Mr. von Osten . . . did not profit from his animal's talent, nor did it seem at all likely that he was attempting to perpetrate a fraud. He swore he did not cue the animal, and he permitted other people to question and test the horse even without his being present. Pfungst and his famous colleague, Stumpf, undertook a program of systematic research to discover the secret of Hans' talents. Among the first discoveries made was that if the horse could not see the questioner, Hans was not clever at all. Similarly, if the questioner did not himself know the answer to the question, Hans could not answer it either. Still, Hans was able to answer Pfungst's questions as long as the investigator was present and visible. Pfungst reasoned that the questioner might in some way be signaling to Hans when to begin and when to stop tapping his hoof. A forward inclination of the head of the questioner would start Hans tapping, Pfungst observed. He tried then to incline his head forward without asking a question and discovered that this was sufficient to start Hans' tapping. As the experimenter straightened up, Hans would stop tapping. Pfungst then tried to get Hans to stop tapping by using very slight upward motions of the head. He found that even the raising of his eyebrows was sufficient. Even the dilation of the questioner's nostrils was a cue for Hans to stop tapping.

When a questioner bent forward more, the horse would tap faster. This added to the reputation of Hans as brilliant. That is, when a large number of taps was the correct response, Hans would tap very, very rapidly until he approached the region of correctness, and then he began to slow down. It was found that questioners typically bent forward more when the answer was a long one, gradually straightening up as Hans got closer to the correct number.

For some experiments, Pfungst discovered that auditory cues functioned additively with visual cues. When the experimenter was silent, Hans was able to respond correctly 31 percent of the time in picking one of many placards with different words written on it, or cloths of different colors. When auditory cues were added, Hans responded correctly 56 percent of the time.

Pfungst himself then played the part of Hans, tapping out responses to questions with his hand. Of 25 questioners, 23 unwittingly cued Pfungst as to when to stop tapping in order to give a correct response. None of the questioners (males and females of all ages and occupations) knew the intent of the experiment. When errors occurred, they were usually only a single tap from being correct. The subjects of this study, including an experienced psychologist, were unable to discover that they were unintentionally emitting cues.

Hans' amazing talents, talents rapidly acquired too by Pfungst, serve to illustrate further the power of the self-fulfilling prophecy. Hans' questioners, even skeptical ones, expected Hans to give the correct answers to their queries.
Their expectation was reflected in their unwitting signal to Hans that the time had come for him to stop his tapping. The signal cued Hans to stop, and the questioner's expectation became the reason for Hans' being, once again, correct. . . . (Rosenthal, pp. 137-138, after Pfungst)

There are a variety of strategies for reducing the danger of experimenter effects, as summarized in Table 19.2. But sometimes it is difficult to overcome this obstacle. In the Hawthorne case, for example, researchers might have varied the intensity of the light without the workers' perceiving that they were the subjects of an experiment. But it would have been next to impossible to vary the hours in the work day without the workers' realizing that they were being singled out for special attention. Some other techniques are therefore required.

TABLE 19.2 Strategies for the Control of Experimenter Expectancy Effects

1. Increasing the number of experimenters:
   decreases learning of influence techniques
   helps to maintain "blindness"
   minimizes effects of early data returns
   increases generality of results
   randomizes expectancies . . .
   permits statistical correction of expectancy effects
2. Observing the behavior of experimenters:
   sometimes reduces expectancy effects . . .
   facilitates greater standardization of experimenter behavior
3. Analyzing experiments for order effects:
   permits inference about changes in experimenter behavior
4. Analyzing experiments for computational errors:
   permits inference about expectancy effects
5. Developing training procedures:
   permits prediction of expectancy effects
6. Maintaining "blind" contact:
   minimizes expectancy effects [by avoiding feedback from experimenters and subjects]
7. Minimizing experimenter-subject contact [by using screens and automated data-collection systems]
8. Giving different expectancies to various experimenters:
   permits assessment of expectancy effects

(Adapted from Rosenthal, pp. 402-404)

One way of allowing for observer interference when you cannot prevent it is to vary the amounts and kinds of experimenter behavior, holding all else constant, as suggested in strategy 8 in Table 19.2. At Hawthorne the observers might have varied the amount of time they spent with the workers and whether they acted friendly or unfriendly toward the workers under well-controlled conditions. If such variations on the part of the observer produced no differences, then it would be safe to say that the observer is not an important source of variation.

As a further control, it is possible to bring the observer into a situation in which nothing else is changed from normal and then measure whether his presence alone would cause any difference. This is like giving one group no pill at all and the other group a placebo, to estimate the effect of the placebo alone.

6. Summary

Observers of human behavior are themselves human, and hence are subject to human errors. These human errors are one of the main obstacles that the researcher must overcome.


Humans vary from moment to moment in how they observe even the same phenomenon. This variability must be controlled as well as possible with instrumentation, instructions, and training. Variability among observers may be tackled in similar manner.

Observers often bring their own biases to their work. These biases must be made to affect the observations as little as possible, and what effect does occur should be evaluated and allowed for. Cheating by observers is a problem to be tackled with checkups and detective work.

Now that you have read through this first of the chapters on the obstacles to knowledge and the devices for surmounting them, you have perhaps concluded that the tactics for overcoming obstacles in research are merely common sense. True. But the study of other people's common-sense learning, as embodied in this and other books on the subject, can save you time, money, and heartbreak.

Among the conditions that might be altered to influence experimenter expectancies are: descriptions of the subjects (e.g., as fast learners or slow learners); descriptions of the effectiveness of the experimental variable; and expectations of results predicted by the theory (Rosenthal, p. 404).

7. Appendix: Personal Interviewing and Interviews

Interviewing has always been the main device for obtaining social, psychological, and economic information. And it will certainly continue to be important in the future despite the growth of self-administered questionnaires, self-reports, and automated data collection. Therefore, a few words especially dedicated to the practice of interviewing seem in order, though interviewing is touched upon in many other sections of the book. (A review of the advantages and disadvantages of personal interviewing compared to telephone and mail interviews is given on page 318.)

Perhaps the most important element in good interviewing is that the interviewer not influence the response that the interviewee gives, by word or gesture or general demeanor, as discussed in this chapter. The interviewer must be friendly and pleasant so that the interviewee will be willing to cooperate. But the interviewer should not make the interviewee want to answer in whatever manner he or she thinks will please the interviewer. And the interviewer must not send out signals that will antagonize the interviewee into answering wrongly or perversely as an attempt to foul up the interviewer. Toward the same end, the interviewer should not indicate in any way what answers he or she expects or wants to receive.

In addition to being pleasant and interested in the interviewee, the interviewer can increase the likelihood of obtaining cooperation if she or he dresses more or less like the interviewee, though not in a spectacular or "far out" manner.


A good interviewer must know the material thoroughly and follow instructions to the letter. Questions must be read exactly as they are written, in the order in which they are given, and the answers must be rendered faithfully, with as little interpretation as possible. All this requires study and practice before going into the field.

Sometimes interviewers are instructed to go beyond the written questions in order to "probe" for answers and underlying reasons. Good probing requires experience and tact.

The work of hired interviewers must be spot-checked by supervisors to ensure that the interviews were really done, and that the responses are those that the interviewee gave.

Some interviewers are much better than others (Sheatsley; Sudman, 1967, pp. 108-153). Interviewers who have worked at the occupation for a long time are, on the whole, faster, more efficient, and more accurate than newer interviewers. This may be because the poorer interviewers leave the occupation, because people learn, or both. Education and intelligence also are positively associated with the quality and quantity of work that interviewers produce. The National Opinion Research Center's demographic profile of the best interviewer is a married, middle-aged woman, with some college education and previous interviewing experience.

Personality and motivation also are important factors in interviewer success. This includes self-confidence, a positive attitude toward interviewing, and the desire to do a good job, none of which is easy to measure, however, and most of which would make the person an effective employee in any position.

Even more important than screening potential interviewers before hiring them is to continually evaluate their performance after they have been hired. An attentive, close-watching supervisor is the basic check. Additionally, coders of the data can supply quantitative estimates of interviewer quality by recording interviewer errors. This is the error-weighting system used by the National Opinion Research Center:

Type of Error                              Error Weight
1. Answer missing                               3
2. Irrelevant or circular answer                3
3. Lack of sufficient detail                    2
4. "Don't know" with no probe                   2
5. Dangling probe                               1
6. Multiple codes in error                      1
7. Superfluous question asked                   1

(Sudman, 1967, p. 109)
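As a rough sketch of how such weights might be applied, the following Python fragment totals the weighted errors for one interviewer. The weights are those in the table above; the dictionary keys, the function names, the sample counts, and the idea of dividing by the number of interviews are assumptions made for illustration, not part of the NORC procedure as described.

```python
# Error weights from the table above; the key names are hypothetical labels.
ERROR_WEIGHTS = {
    "answer_missing": 3,
    "irrelevant_or_circular_answer": 3,
    "lack_of_sufficient_detail": 2,
    "dont_know_with_no_probe": 2,
    "dangling_probe": 1,
    "multiple_codes_in_error": 1,
    "superfluous_question_asked": 1,
}


def weighted_error_score(error_counts):
    """Sum each error count multiplied by its weight.

    An error type that is not in the table raises KeyError, so a
    miscoded label is not silently ignored.
    """
    return sum(ERROR_WEIGHTS[kind] * n for kind, n in error_counts.items())


def score_per_interview(error_counts, n_interviews):
    """Assumed normalization: weighted errors per completed interview."""
    return weighted_error_score(error_counts) / n_interviews


# Hypothetical tallies recorded by coders for one interviewer.
tallies = {"answer_missing": 2, "lack_of_sufficient_detail": 1, "dangling_probe": 3}
total = weighted_error_score(tallies)
rate = score_per_interview(tallies, n_interviews=10)
```

Dividing by the number of interviews is one plausible way to compare interviewers who completed different amounts of work; other normalizations, such as errors per question asked, would serve equally well.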


EXERCISES

1. Give examples, from actual research in your field, of these obstacles:
   a. observer variability
   b. observer bias
   c. variability among observers
   d. observer-caused effect on subjects
   e. publicity effects
   f. sequence or repetition effect
   g. Hawthorne (placebo) effect
2. Suggest methods to surmount each of the obstacles in the examples you have given in Exercise 1.

ADDITIONAL READING FOR CHAPTER 19

Observational methods and problems are discussed by Bickman (Chapter 8 in Selltiz et al.).

Denzin describes the sociological interview from the "interactionist" perspective (Chapter 6).

Webb et al. detail a great many strategies, with accompanying examples, in the use of devices to study people without their knowledge and without interfering with their behavior.

For a discussion of how medical researchers are sometimes fooled, see Loranger et al. (1961).

Kornhauser and Sheatsley (in Selltiz et al., pp. 563-574) discuss the art of interviewing concisely.

Additional useful works on interviewing: Gordon; Kahn and Cannell; Richardson; Selltiz et al. (Chapter 9).

The basic work on interviewer effects and selection of interviewers is Hyman et al. (1975). See also Parten and Payne on this topic, as well as Hyman (1950), pp. 519-523.

An interesting account of interviewing, as perceived by the interviewer, is that of Converse and Schuman.

Rosenthal (1966) is the classic work on experimenter effects in behavioral research.

20

Complexities and Intractability of the Human Mind: Appendix on Questionnaire Construction

1. Lack of Knowledge by the Subject
2. The Fallibility of Memory
3. Cover-Up
4. Trying to Please the Observer
5. Rationalization and Repression
6. Deception
7. A Brief Note on Behaviorism
8. Summary
9. Appendix: Questionnaire Construction

The obstacles to knowledge that are discussed in this chapter are both the bane and the joy of the social-science researcher, especially of the social scientist who works with questionnaires. These obstacles are the salt of his work, the unique flavor that sets it apart from other scientific work; the chemist or astronomer never faces these obstacles. There is an excellent and voluminous literature on these matters, and you should read extensively in it if you plan to do work in social psychology, economic-data collection, market research, anthropology, or related areas (for example, Ferber & Wales; Hyman, 1955).

1. Lack of Knowledge by the Subject

Much of your own behavior cannot be accurately reported by you. In some cases this is because you do not pay attention to the behavior. For example, if you were asked exactly how many pages you had actually opened a particular magazine to, you would have trouble answering accurately ten minutes later even if you were prompted by being shown all its pages one by one. (One researcher overcame this obstacle by putting little dabs of light glue between each pair of pages of magazines, then examining the copies to see how many glue seals had been broken.)

Another example is the number of television commercials you actually watch, compared to how many you switch stations to avoid or how many you tune out by leaving the room or attending to something else. One's self-report is not likely to be accurate about this behavior. Steiner attacked this problem by having observers surreptitiously watch other members of their families during commercials.

Still another form of this obstacle arises in obtaining data on the combined behavior of various people. Even if I know how many pages of a magazine I read, I do not know how many other people have read how many pages of the same copy, even if it is my family copy. One way to overcome this obstacle is to add up the pages looked at by all people (actually, a sample of all people) and divide by the number of copies. A more amusing technique was tried about thirty years ago. A researcher counted the number of fingerprints on magazine copies whose pages had been dusted with special fingerprint powder. (This is another example of the intellectual kinship between detective work and empirical scientific research.)

Businessmen's lack of knowledge of their own behavior is an interesting example. Two economists asked businessmen how they set prices, and many replied that they added fixed markups to their costs (Hall & Hitch). But other analyses that assume prices are set according to what the market will bear explain actual prices better than do the businessmen. Apparently the businessmen fail to recognize when they make adjustments to meet competition and otherwise maximize their profits.

One can, however, ask different kinds of questions of the businessmen. A used-car dealer may be able to tell you very intelligently and accurately why he set a price of $2000 on a particular two-year-old Chevrolet even if he cannot tell you how he sets prices in general.
Still another way is to infer his procedure from his answers to various relevant facts. I asked newspaper executives what they thought their sales to two classes of customers would have been if their prices had been 10 per cent lower or 10 per cent higher. Their answers explained satisfactorily why one class of customers was charged more than the other (J. Simon, 1965d).

An anthropologist must ask "natives" the kinds of questions that the natives can answer knowledgeably, as B. Malinowski points out:

Exactly as a humble member of any modern institution, whether it be the state, or the church, or the army, is of it and in it, but has no vision of the resulting integral action of the whole, still less could furnish any account of its organization, so it would be futile to attempt questioning a native in abstract, sociological terms. . . .

Though we cannot ask a native about abstract, general rules, we can always enquire how a given case would be treated. Thus for instance, in asking how they would treat crime, or punish it, it would be vain to put to a native a sweeping question such as, "How do you treat and punish a criminal?" for even words
could not be found to express it in native, or in pidgin. But an imaginary case, or still better, a real occurrence, will stimulate a native to express his opinion and to supply plentiful information. (Malinowski, pp. 11-12)

Sometimes one group of respondents can answer with more knowledge and accuracy than another group. World War II questioning showed that men who had not yet been in battle were most afraid of air attack, whereas men who had been into battle most feared 88 mm. artillery. To rely more heavily upon the answers of combat veterans seems sensible because their knowledge is greater (Stouffer et al., II, 235).

2. The Fallibility of Memory

One of the ways that we can learn about human behavior is to ask people how they have acted or what has happened in the past. But it is not news that people sometimes forget things. If you intend to use the products of people's memories as data, you must guard against the inaccuracy of their memories.

A. Kinsey recognized that his subjects might have forgotten what they had done in the past, at what age they had first done it, and how often. To estimate the importance of memory loss he checked up on subsamples of interviewees' memories by various devices. One of these important devices he called "take and retake." Subjects in the take-retake group were asked for their sexual history and then were interviewed again many months later. The two interviews were then compared to see how well the subjects' later statements checked against their earlier statements.

Kinsey was able also to check the quality of subjects' memories by asking questions about their physical development (such as the age of growth of pubic hair) and then comparing those answers against objective physical data. Physical checks of memory were obviously impossible for most parts of the Kinsey study, however.
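
The take-retake comparison described above can be sketched in a few lines of Python. This is an illustration only: the question names, the sample answers, and the exact-match rule for agreement are hypothetical and are not taken from Kinsey's study.

```python
def agreement_rates(take, retake):
    """Share of subjects whose retake answer matches the take, per question."""
    rates = {}
    for question in take[0]:
        matches = sum(
            1 for first, second in zip(take, retake)
            if first[question] == second[question]
        )
        rates[question] = matches / len(take)
    return rates


# Hypothetical answers from three subjects, each interviewed twice.
take = [
    {"behavior_reported": True, "age_first": 16, "times_per_month": 4},
    {"behavior_reported": True, "age_first": 18, "times_per_month": 2},
    {"behavior_reported": False, "age_first": None, "times_per_month": 0},
]
retake = [
    {"behavior_reported": True, "age_first": 17, "times_per_month": 4},
    {"behavior_reported": True, "age_first": 18, "times_per_month": 3},
    {"behavior_reported": False, "age_first": None, "times_per_month": 0},
]

rates = agreement_rates(take, retake)
for question, rate in rates.items():
    print(f"{question}: {rate:.0%} agreement")
```

A low agreement rate on a question suggests that answers to it depend heavily on memory (or on something else that changed between interviews), so those answers deserve more caution.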
Kinsey's checks revealed that forgetting differed among types of questions. Forgetting had little effect on the reported types of behavior people had engaged in. It had more effect on the reports of frequency of behavior. And there was most memory loss about the age at which subjects first engaged in various types of behavior.

Several devices offer some protection if you think memory loss may be large. One method of reducing memory loss is to have subjects keep diaries of their behavior, so that no remembering is necessary, a common practice in television-viewing research and in market research. No consumer would be likely to remember after a week just what products he had bought the week before and how much of each. But if you get him to write down his purchases immediately after he makes them, memory loss can be negligible.

Mechanical counting devices are sometimes useful. For example, if you want to know how often a person on a special diet thinks of food, you can ask her to click a pocket counter. Or, P. Lazarsfeld and F. Stanton developed a "program analyzer" with which a person registered his likes and dislikes of various parts of radio programs by pressing buttons. (Stanton was president of CBS, and I wished he had put a program analyzer in my house.)

In some cases you can circumvent people's bad memories by referring to existing records. For example, you can ask to see their bank books and other financial records instead of asking people how much they have saved (Ferber). In other cases, you must substitute observation techniques for question techniques. The Nielsen Audimeter is a mechanical device that automatically records when a television set is on and when it is off. And another device actually photographs people watching the television screen to measure who is actually watching, as well as whether the set is on (Allen).

The type of forgetting euphemistically called "confusion" is a bugaboo of readership research. Show a person a magazine and ask him whether he has read it. He may say "yes" even though he has not read it. Actually, memory (or other faculties) plays tricks on people, and they confuse what you show them with something else that they have seen. To overcome this confusion, one group of people is shown a dummy magazine that has never been printed. The proportion that says "yes" to this "placebo" is subtracted from the proportion that says "yes" to the real magazine.

Percentage who say they read actual magazine         42%
Percentage who say they read dummy magazine          12%
Percentage estimated to have read actual magazine    30%
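
The dummy-magazine correction is simple subtraction, and a minimal Python sketch makes the arithmetic explicit. The function name is hypothetical, and the sketch assumes that the two groups of respondents are comparable samples, so that "confusion" inflates both percentages equally.

```python
def corrected_readership(pct_claim_actual, pct_claim_dummy):
    """Subtract the dummy-magazine 'yes' rate from the real-magazine 'yes' rate."""
    return pct_claim_actual - pct_claim_dummy


# Figures from the table above: 42% claim the real magazine, 12% the dummy.
estimate = corrected_readership(42, 12)
print(f"Estimated to have read actual magazine: {estimate}%")
```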

Here is a general precept for dealing with memory loss: It is always good practice to reduce the time period between the event to be remembered and the interview to the shortest possible interval.

Perhaps the simplest memory obstacle to overcome is increasing fatigue in the subject. For example, in readership surveys, the interviewer shows the subject page after page of a magazine and asks whether the subject has seen the page. Toward the end of the interview the subject is tired and therefore less likely to remember an article or advertisement. If the interviewer always started at the front of a magazine and worked toward the back, the pages at the back would always be tested on tired people and would therefore be underrated. Readership surveys overcome this difficulty by starting at different places in the magazine with different respondents. In this way, pages at the front, middle, and back have equal opportunities with fresh and fatigued subjects. "Other things" are not equal for any individual subject. But for the group taken as a whole, all other things are indeed roughly equal for front, back, and middle advertisements and articles.
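
One way to start "at different places in the magazine with different respondents" is a systematic rotation, sketched below in Python. The rotation rule and the function name are assumptions made for illustration; the text does not say whether actual surveys rotate systematically or pick starting pages at random.

```python
def page_order(num_pages, respondent_index):
    """Pages in the order shown to one respondent, starting at a rotated offset."""
    start = respondent_index % num_pages
    return [(start + i) % num_pages + 1 for i in range(num_pages)]


# With as many respondents as pages, every page is shown once in every
# position, from first (freshest subject) to last (most fatigued subject).
num_pages = 4
orders = [page_order(num_pages, r) for r in range(num_pages)]
for r, order in enumerate(orders):
    print(f"respondent {r}: {order}")
```

No single respondent sees the pages under equal conditions, but across the group each page gets the same mix of fresh and tired readers, which is the point of the paragraph above.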

3. Cover-Up

There are some factors that people do not want to reveal to strangers (even to intimate friends, or especially to intimate friends) because of embarrassment, guilt, shame, or other social motives. The subject may therefore deny, exaggerate, minimize, or otherwise knowingly operate upon the truth to shape it into a form that he feels is more acceptable. Barry Goldwater believed that this obstacle was operating strongly in preelection polls in 1964:

Senator Goldwater's own polls show that President Johnson has a large lead over him. Sources close to Mr. Goldwater said last week he believes that if his position in the polls does not improve he will be politically dead. But the Senator is also said to believe that "subtle impulses," many of which the voters will not discuss candidly with pollsters or with any other questioners, are at work among voters this year and could bring him victory Nov. 3. (New York Times, September 4, 1964, p. 2E)

Observation can sometimes operate as a check and prevent misinformation, as, for example, when the subject claims a level of income quite inconsistent with her home or job. Or the researcher can build into the interview internal checks that get at the truth in several different ways and therefore show up inconsistent answers. The take-retake technique can also be used to check on how much cover-up is operating; as cross-examining lawyers know, falsehood can be revealed by discrepant answers to repeated questions when the subject forgets his original lie.

The intuition of a highly trained interviewer can often distinguish truth from falsity. Kinsey's interviewers, for example, were trained to recognize and use the vocabulary of prostitutes in order to elicit responses that would help them to judge whether a woman was really a prostitute. Kinsey relied heavily upon this sort of ability in his interviews. But each of the Kinsey interviewers had a full year's training in these skills.
There are few interviewing staffs that can be so well trained and upon whose intuitive capacity it would be sensible to depend to any great extent.

4. Trying to Please the Observer

People like to please other people, and subjects in research studies are no exceptions. Interviewees often answer questions the way that they think the interviewer would like them to answer. And subjects in experiments often act the way they think the observer wants them to act. This is very nice of them, but such genial behavior has a ruinous result unless the researcher does something about it.

Psychologists have given much thought to protecting themselves from subjects who alter their normal behavior during experiments. Psychologists often camouflage the experiment and its purpose so that the subject cannot know what the experimenter wants or expects. For example, C. Hull wanted to find out how people learn a concept or make a generalization when they are not trying to find a generalization. But human beings are always trying to "solve the problem," especially in psychology laboratories. Hull therefore convinced the subjects that they were participating in a memory experiment,
and the task was to learn the names of various Chinese characters, whereas in reality the subjects were gradually learning general rules about the names of types of characters. If Hull had not camouflaged the experiment, the subjects would have shown sudden learning rather than gradual learning of the concept. I am convinced that it is the very fact of camouflaging or not camouflaging the purpose of the experiment that has caused some experiments to obtain "discontinuous" rather than "smooth" gradual curves of learning concepts in humans. Animals need no such camouflage because (I think) they do not have a whole set of rules of thumb to help them solve problems (Simon, 1953).

In questionnaire surveys, asking the questions in an impartial way so that the interviewer's own prejudices are hidden can help to overcome the obstacle of subjects' trying to please the interviewer.

5. Rationalization and Repression

In the section on cover-up, I discussed conscious distortion of the truth. Freud convinced us, however, that much lies in the human mind that either comes into consciousness as a subconscious distortion of reality or else does not come into consciousness at all because it is repressed.

[W]hen a group of men who had said that they wanted to avoid serving in the Infantry were asked why, only 8 per cent said that this was because the Infantry "sees too much combat" or because "Its casualty rate is too high." The great majority indicated such reasons as "I don't think I'm physically qualified for it," "It would not give me a chance to do the kind of work I can do best," and "It would not give me training for a better job after the war." The analysts felt that, in a number of cases at least, such responses were rationalizations of the "true" motive: desire to avoid danger. In an effort to determine the extent of such rationalizations, they studied the relationship between reluctance to serve in the Infantry or in overseas combat units and reported worries about battle injuries. They discovered that there was a marked relationship: the great majority of those who said that they worried often about battle injury (79 per cent to be exact) wanted to avoid both the Infantry and overseas combat service. This is in contrast to 37 per cent of those saying that they never worried. Frequently a set of interlocking questions of this type permits an intrinsic check on evasions, and their analysis leads to more convincing results. (Kendall & Lazarsfeld, p. 172)

Digging out the real beliefs and reasons for behavior may involve the use of one or more of the variegated ingenious processes that psychologists have developed. A full description of this armory of techniques for outwitting the unconscious may be found in C. Selltiz et al. (Chap. 8).

6. Deception

The patterns of inanimate nature are difficult to fathom. The quirks and twistings of the human mind add complication. And to top it off, sometimes
people try to deceive you for their own purposes. Deception is especially bad because all the usual scientific weapons (random sampling, for example) fail to help. The subject plays against you like an opponent in a game, and she holds many of the winning cards.

There are many wonderful stories of deception by so-called "primitive" tribesmen who tell whoppers about their sex lives to anthropologists and then chortle among themselves at the stupidity of the "civilized" scientist who could believe such nonsense. These reports may be apocryphal; nevertheless, they are instructive.

Tax data are always subject to deception.

. . . [W]hen in 1711 a census was taken in China in connexion with the poll tax and military service, the total arrived at was 28 millions, but . . . when some years later another census was taken with a view to certain measures for the relief of distress, the total arrived at was 103 millions. (Carr-Saunders, p. A2)

Kinsey's interviewers sometimes heard remarkable lies.

Some businesses make a point of releasing false data about their operations, in order to mislead their competitors. Other companies have gone to great lengths to prevent their competitors from gaining useful marketing information. The Federal Trade Commission record in the Clorox case documented instances in which Procter & Gamble concentrated unusual amounts of merchandising effort in markets in which their competitors were testing new marketing techniques, in order to muddle up their competitors' research. In another instance, one magazine sent out vast quantities of free magazines in the weeks that a major readership survey was going on so that the apparent circulation of that magazine would be inflated relative to its competitors.

Deception by national governments to frustrate the intelligence-gathering services of other nations is standard practice.
Some acts of deception (feeding wrong information to enemy agents, for example) are very direct. But nations also plant false stories in their own newspapers to throw enemy content analysis off the scent. Nations have falsified national budgets, gold reserves, and production figures: "It is reported that in Russia in the early 1930s the central statistical authorities had worked out 'lie coefficients' with which to correct the statistical reports according to regions, industries, etc." (Morgenstern, pp. 20-21).

American ingenuity is sometimes used to leave an incorrect political impression:

Gordon D. Hall, a Boston lecturer on extremist groups of both right and left, disputes Birch Society semantics as well as statistics. He said he tried unsuccessfully to get John H. Rousselot, the national public relations director, to accept a bet that the total membership in the nation was less than 25,000. How the society makes its members seem larger than they really are, said Hall, is shown by the response to an appeal by Welch for a letter-writing campaign against the Xerox Corporation of Rochester, N.Y. . . .

As part of a promotional program in 1964, Xerox contributed $4 million to Tensun Foundation, Inc., without restrictions. The foundation produced a television series on the United Nations, one of the targets of the Birch program. Because of the flood of mail, Xerox assigned a staff to catalogue the letters. A spokesman said this week that an analysis of 51,279 unfavorable letters had shown them to be written by 12,785 individuals. All of the 12,687 favorable letters received were found to have been written by 12,687 individuals. (Champaign-Urbana Courier, March 7, 1965, p. 1)

A dentist told the story of an experiment during the Depression on the effect of an experimental toothpaste on bacterial concentration and tooth decay. Dental students who needed money badly ate candy bars to increase their bacterial concentration, so that they would qualify as paid subjects for the experiment. After the experiment was over they quit eating candy bars, and their bacterial concentration naturally went down. But the experimenter thought that it was his experimental toothpaste that had reduced the bacterial concentration. This story illustrates how the medical sciences are troubled by the complexity of the human mind and how experiments are subject to deception, just as surveys are.

There is no simple prescription for dealing with deception. Most important is to be alert to the possibility of deception; research is no place for a sweet belief in the goodness of human nature. Aside from caution and skepticism, your best bet is to make independent checks of the evidence. Kinsey compared answers given by husbands and wives and by pairs of people who could provide information about each other, to determine the extent of deception. His basic technique to overcome deception was the free interview and the trained intuition of his interviewers, who could spot much deception and refuse to be taken in by it.
If deception cannot be avoided or minimized, you may have to shift to working with other subject matter that is less subject to this obstacle. There is no point in doing research that will be wrong because you were fooled while acting in good faith.

7. A Brief Note on Behaviorism

The workings of the human mind are hidden from view. How, then, can we study a person's attitudes, desires, beliefs? One way to learn about mental processes is to assume that a person can really observe her own mental processes and report them accurately. This technique uses the subject as an instrument of the researcher to see what the researcher himself cannot see, a technique that was the basis of the introspectionist school of psychology.

Many psychologists became dissatisfied with using introspective reports by subjects because of the many kinds of distortion and inaccuracy. They turned to observing the behavior of subjects and inferring mental processes from it. The behavioral psychologist measures whether a person is hungry
by whether she eats, rather than by her statement that she is hungry. This technique has great advantages of precision because we can establish with great accuracy whether eating takes place, whereas we cannot verify objectively whether she is "really" hungry. Furthermore, this technique has the bonus advantage that it allows us to study the hunger of animals, whom we cannot ask for verbal reports.

The behaviorist technique consists of watching outward behavior that can readily be observed and then assuming that the behavior is related to the inner states that are not visible for study. The advantages and disadvantages of behaviorism in psychology and related disciplines are a matter of hot dispute. It is certain that there are research situations in which observing behavior is the best possible approach. An advertiser, for example, is supremely interested in whether people really buy his product; information on purchases is more valuable than any reports of such mental processes as attitudes, feelings, beliefs. On the other hand, watching behavior provides far too little information to a psychiatrist who is trying to diagnose a case, though a good psychiatrist will make the most of behavioral evidence too. What is very clear is that, although studying behavior may often be the best way to infer knowledge about how people's minds work, it is not always the best method for doing so.

8. Summary

The complexity of human responses is both the source of our interest as social scientists, and an obstacle to learning about people and their behavior. The main human-response obstacles that the researcher must contend with are these:
(1) Lack of knowledge by subjects about what they do and why they do it.
(2) Fallibility of people's memory about their past behavior.
(3) Covering up of information that subjects think shameful or do not want to reveal.
(4) Trying to say and do things that will please the observer.
(5) Rationalization and repression.
(6) Deceiving the observer about behavior, attitudes, and motives.

For each of these human-complexity obstacles there is a wide variety of techniques to help you overcome them. Many of these techniques are described in this chapter of the book. Others are learned with research experience.

9. Appendix: Questionnaire Construction

These are key elements in sound questionnaire construction:

1) Keep your study's purpose clearly in mind at all times. This will help ensure that you ask all the questions you want to ask, and leave out questions you don't need answers to.
2) Begin by jotting down the topics you want information about, without worrying about wording or logical order.
3) Number the topics in a logical order, using these principles:
   a. To the interviewee, the organization of the questionnaire should seem sensible and smooth-flowing.
   b. If some questions will affect the answers to others, put the influencing question afterwards.
   c. If there are some questions (such as income) that you may not get answers to, and that may antagonize some people, put them last.
   d. Put the least important questions near the end, in case they don't get answered.
4) Write first approximations of the questions. Use simple language, make each question as short as possible, and use other techniques of effective writing. Devising good questions takes experience and good judgment. Some of the specific obstacles you may encounter are discussed in this chapter. For more detailed guidance, see Payne.
5) Pretest the questionnaire by personally going out and asking the questions in an "open-ended" fashion, that is, without a list of answers among which the interviewee must choose. Talk to, say, ten friends and acquaintances (with whom you can feel comfortable) as well as a few members of the target population. Tape-record some of the interviews if you can.
6) Rewrite ambiguous questions, reorganize the questionnaire where necessary, throw out unnecessary or unsuccessful questions, convert some open-end to closed-end questions, and generally tighten up the questionnaire. Attend to the length of the questionnaire: Telephone interviews must be short (say, five minutes maximum), and other interviews are cheaper and more effective the shorter they are.
7) Write an introduction that will persuade potential interviewees to participate. If you just say, "Please answer this questionnaire," or "I need it for a course," many people will turn you down or throw away a mail questionnaire, and there is little reason for them not to. But if you tell people how their responses can help society or some particular group or themselves, or how the interview will be an interesting experience, you'll get much greater cooperation. In some cases, it will be best to pay people or give them presents for being willing to be interviewed.
8) Pretest again.
9) Improve the questionnaire again.
10) Go into the field for part of the interviews.
11) Check the preliminary results. If satisfactory, complete the work.
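
Two of the ordering principles in step 3 (sensitive questions last, least important questions near the end) can be sketched as a sort in Python. Everything in the sketch is hypothetical: the `Question` fields, the 1-to-5 importance rating, and the sample questions are invented for illustration. The sketch does not handle principles (a) and (b), which call for judgment about flow and about which questions influence others.

```python
from dataclasses import dataclass


@dataclass
class Question:
    text: str
    importance: int          # hypothetical rating: higher means more important
    sensitive: bool = False  # e.g. income; may go unanswered or antagonize


def order_questions(questions):
    """Non-sensitive questions first, most important first; sensitive ones last."""
    return sorted(questions, key=lambda q: (q.sensitive, -q.importance))


draft = [
    Question("What is your household income?", importance=4, sensitive=True),
    Question("Which magazines do you read regularly?", importance=5),
    Question("How did you first hear of them?", importance=2),
]

ordered = order_questions(draft)
for number, question in enumerate(ordered, start=1):
    print(f"{number}) {question.text}")
```

A mechanical ordering like this is only a first draft; the pretests in steps 5 and 8 are what show whether the sequence actually feels sensible to an interviewee.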

It is well to remember that no one ever becomes perfect at constructing questionnaires. Hanan Selvin, most helpful and generous sociologist-editor of this book and a man of great survey experience, confessed that on a 1974 survey of the faculty of his university, he forgot to ask the respondent's sex, and in 1975 he remembered sex but forgot to ask the respondent's race.

There is no cure for such flaws except extensive pretesting. The experienced "pro" may get into trouble by figuring he or she is so good that pretesting can be skipped. Don't skip or skimp on pretesting; pretest again and again and again and. . . .

As question worders we need to develop a critical attitude toward our own questions. We must check the tendency to accept the first wording that makes sense to us. We must subordinate any pride of authorship to this critical attitude and should try to substitute clarity for cleverness. Every objection that may be raised about the phrasing should be carefully considered, because that problem may occur many times over in the full-scale survey. If even a single test interview or comment from one of our associates implies any fault in the question, that fault should not be passed over. How many people in the final survey will stumble over the same obstacle?

The tendency to take things for granted is not easy to correct, simply because it is such a common characteristic of us all. (Payne, p. 17)

There is a vast literature on the construction of appropriate questions and appropriate ways of asking questions (Payne; Hyman, 1954). Many examples show that small differences in the form of the question can make a big difference in the responses. S. Stouffer et al. concluded, on the basis of their vast experience in studying soldiers in World War II, that "error or bias attributable to sampling and to methods of questionnaire administration were relatively small as compared with other types of variation, especially variation attributable to different ways of wording questions" (Payne, p. 5). The danger is greatest when one seeks to learn about people's attitudes and opinions. Consider S. Payne's example of the differences in response to the following questions:

. . .
Do you think most manufacturing companies that lay off workers during slack periods could arrange things to avoid layoffs and give steady work right through the year?

63% said companies could avoid layoffs, 22% said they couldn't, and 15% had no opinion.

. . . Do you think most manufacturing companies that lay off workers in slack periods could avoid layoffs and provide steady work right through the year, or do you think layoffs are unavoidable?

35% said companies could avoid layoffs, 41% said layoffs are unavoidable, and 24% expressed no choice. (Payne, pp. 7-8)

The form of the question is not as important when you ask for factual information. "What is the name of that traitor who is President?" may get practically the same responses as "Who is that great American who is President?" Sometimes researchers waste their own and other people's energy in
nit-picking discussions about which is the best question to use to ask a person's age or whether to ask the question at the beginning or the end of the questionnaire schedule.

The form of the question, however, can be enormously important in supposedly factual questions too, as this incident in the measurement of the labor force demonstrates:

Prior to July 1945 a single question was used. It asked, Was this person at work on a private or government job last week? Beginning in that month, two questions were substituted. The first of these merely asked what the person's major activity was during the preceding week. If the major activity was something other than working, the enumerator then asked whether in addition the person did any work for pay or profit during the week.

The upshot of this change in questions was that in the trial when both versions were used, the new questioning showed an increase of 1,400,000 employed persons over the old wording. About half of these additional workers had worked 35 or more hours during the week under consideration! (Payne, p. 11)

Payne argues that the critical issue in designing questions is "to make sure that the particular issue which the questioner has in mind is the particular issue on which the respondent gives his answers" (p. 9). Trying out your questions to see how they actually work in a pretest is equally important.

If all the problems of question wording could be traced to a single source, their common origin would probably prove to be in taking too much for granted. We questioners assume that people know what we are talking about. We assume that they have some basis for testimony. We assume that they understand our questions. We assume that their answers are in the frame of reference we intend.

Frequently our assumptions are not warranted. Respondents may never before have heard of the subject. They may confuse it with something else.
They may have only vague ideas about it and no means for forming judgments. (Payne, p. 16) The scope of a question is important. Economics has shunned questionand-answer explanations of economic behavior because, I believe, the few attempts by economists have used questions of too w i d e a scope. They have asked businessmen " H o w do y o u set your prices for hquor?" They w o u l d do better to ask the narrower question " W h y is your price for Bonny Scotch $32 a case?" or, even better, " W h a t percentage w o u l d sales rise i f you raised the price of Bonny Scotch to $33?" I n general, w r i t i n g good questions is an art that usually requires imagina­ tion and experience. B u t m y favorite question comes from a college year­ book and is shown i n Figure 20.1. You can obtain some information b y asking for i t directly. B u t some information must be obtained by indirection. Income, for example, is a touchy topic, and i t is usually asked about b y showing the respondent a set of cards w i t h different ranges of income, e.g. $10,000-$15,000, and the sub­ ject is asked to p o i n t to the appropriate income range.

FIGURE 20.1 Do You Drink?

Source: Harvard Yearbook, 1950; adapted by permission of Harvard Yearbook Publications, Inc.

On some topics people hesitate to make revelations about themselves but are willing to make statements about other people. It may be reasonable to assume that such statements are projections of one's views about oneself. For example, R. Simon and I wished to determine the likely effect of possible government subsidies and taxes on people's fertility behavior.

People may believe that it is immoral or repulsive to have, or to not have, children for pay. They may resist placing a monetary value on unborn or already-born children. Therefore, people may not reply candidly to direct questions. Their verbal responses to direct questions may cover up their feelings, beliefs, and intentions about their likely behavior. For that reason, we asked both direct and projective questions. The projective questions were intended to ease the sense of repulsion or embarrassment and to allow people to describe their own feelings behind a facade of impersonality. . . . First we asked:


Please think for a moment about the average family in your neighborhood. How many children do most of the families in your neighborhood have before they stop having children?

Respondents were then asked:

If the government were to give a monthly payment of $50—that is, $600 per year for each child after the second until that child is 18 years old—do you think the average family in your neighborhood would have more children than the (number answered in previous question) children they now have? (R. and J. Simon, pp. 586-587)

Unfortunately there is almost no information on how well projective questions really work, and under which conditions. Therefore, the result of a projective question must be buttressed with other sorts of information to support its standing as scientific evidence.

Even though you may do an excellent job in preparing the questionnaire, some questions will be more effective and easier to answer than others. It is useful to know which questions are superior, as an aid to evaluating your results. D. Twedt (personal correspondence) suggests you ask each interviewer to give two ratings to each question on the schedule, one for understanding and one for cooperation: "To the best of your memory, what percentage of the respondents understood the question, and what percentage cooperated with truthful, considered answers?" Another approach is to ask each respondent to indicate for each question how strongly held is the judgment or opinion given. This provides two ratings for each question, one for context and one for intensity. The intensity measure is then used to weight the influence of the substantive answer in the overall result.
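The intensity-weighting idea can be made concrete with a small sketch. The Python fragment below is only an illustration under assumptions that are not in the text: answers are coded 1 (favor) or 0 (oppose), intensity is a positive number such as a 1-to-5 self-rating, and the function name `intensity_weighted_share` is hypothetical. It compares the plain share in favor with the share obtained when each answer counts in proportion to its stated intensity.

```python
def intensity_weighted_share(responses):
    """Return (unweighted_share, weighted_share) of answers in favor.

    responses: list of (answer, intensity) pairs, where answer is 1 (favor)
    or 0 (oppose) and intensity is a positive weight (assumed coding).
    """
    if not responses:
        raise ValueError("at least one response is required")
    total_weight = sum(intensity for _, intensity in responses)
    if total_weight <= 0:
        raise ValueError("intensities must sum to a positive number")
    unweighted = sum(answer for answer, _ in responses) / len(responses)
    weighted = sum(answer * intensity for answer, intensity in responses) / total_weight
    return unweighted, weighted


# Made-up data: three mild supporters and two strongly opposed respondents.
responses = [(1, 1), (1, 1), (1, 2), (0, 5), (0, 4)]
unweighted, weighted = intensity_weighted_share(responses)
print(f"unweighted share in favor: {unweighted:.3f}")
print(f"intensity-weighted share in favor: {weighted:.3f}")
```

In this made-up case a numerical majority in favor becomes a minority once intensity is taken into account, which is the kind of shift the intensity rating is meant to expose. Other weighting schemes are possible; this one is simply the most direct reading of the sentence above.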

EXERCISES

1. Give examples from research in your field to illustrate the following obstacles:
   a. lack of knowledge by the subject
   b. fallibility of subject's memory
   c. cover-up by the subject
   d. trying to please the observer
   e. rationalization or repression
   f. deception
2. Suggest methods of surmounting the obstacles you described in Exercise 1.
3. How could the dentist experimenting with toothpaste (page 301) have protected himself against the possibility of his subjects' deceiving him?

ADDITIONAL READING FOR CHAPTER 20

On questionnaire construction, see Selltiz et al. (Chapters 9 and 10) and Kornhauser and Sheatsley (in Selltiz et al., pp. 542-562). Parten is an excellent book on question construction and questionnaires, as is Oppenheim. Questionnaires for use in market research are discussed well in Boyd et al. (Chapters 7 and 8). Projective and indirect-question methods are discussed in Selltiz et al. (Chapter 10). For examples of the hypothetical-question technique in economics see Gilboy, and Simon (1966b). Paul discusses limited-scope questions in anthropology, as does Malinowski, pp. 11-12.

21 Obstacles to Obtaining Adequate Subject Matter

1. Bias in the Sample
2. Nonresponse: Unavailability of Some Part of the Universe
3. Inability to Experiment with the Subject Matter
4. Flaws in Subject Matter and Unreliability of Data
5. Shortage of Subject Matter
6. High Individual Variability in the Data
7. Too Much Subject Matter
8. Invisibility or Inaccessibility of Subject Matter
9. Interference with Subject Matter by Researcher
10. Summary

If a social scientist is to arrive at sound conclusions, he must have good statistical data in sufficient quantities, good subjects for laboratory experiments, or good respondents for surveys. Computer people have a pungent phrase to describe what happens when a researcher uses poor data input: "GIGO—garbage in, garbage out." This chapter deals with the obstacles to achieving good data—the obstacles of too much and too little data, biased data, and various types of inadequate data.

1. Bias in the Sample

This interchange between G. Shaw and F. Harris illustrates the problem of bias in obtaining a fair sample of subject matter for observation:

Shaw: The first thing we ask a servant for is a testimonial to honesty, sobriety and industry; for we soon find out that these are the scarce things, and that geniuses and clever people are as common as rats.

Harris: The English paste in Shaw; genius is about the rarest thing on earth whereas the necessary quantum of "honesty, sobriety and industry" is beaten by life into nine humans out of ten.

Shaw: If so, it is the tenth who comes my way. (Harris, n.d., p. 358)


Like Shaw and Harris, researchers want to set forth a true description of an entire universe.¹ Shaw and Harris did not go beyond their casual impressions for information about the universe of interest. In the social sciences, however, we often study the universe by taking samples from it. Taking a sample is something like looking at a distant scene through a telescope. The image of the universe that you obtain from the sample will be a true image if your sample represents the entire population. But, if some of the population escapes your sample, the image that you see may be distorted, just as the image of a distant scene will be distorted if your telescope does not cover the whole scene. A sample in which all of the population is not represented or is not represented fairly—without the researcher's knowledge—is called a "biased sample." (But do not leap to the conclusion that all biased samples are bad samples. "Bias" is a descriptive title, not a value judgment.)

Sampling bias is one of the two causes of a sample's not being a truthful image² of the universe with respect to the characteristic in which you are interested. In technical terms, sampling bias is one of the two sources of difference between the sample estimate and the population parameter. The other cause is sampling error, the difference between the sample and the universe that results from the workings of chance; for instance, just by chance the results of ten coin flips might be three heads or five heads or seven heads. Sampling error is discussed in detail in Chapters 9, 27, and 30.

Whether a sample is biased depends upon which universe you want to describe. If a newspaperman wants to determine the voting intentions of a city's population, a sample made up of the people he meets at a local bar is likely to be biased. But if he wants to get the reaction of bar patrons to a rise in liquor prices, a sample drawn from a local bar may be unbiased and perfectly appropriate. The danger, always, is that the biased sample may give you an image unlike the image of the universe that you would get if you studied each and every member of the population.

1. "Universe" and "population" are synonyms in the language of scientific sampling.
2. The term "image" is picturesque but somewhat inaccurate. No sample is ever a perfect replica of the universe from which it is drawn, and in fact the sample can safely be very different from the universe in all characteristics other than those being studied. Furthermore, it would be rare for a sample and a universe to have exactly the same mean, say, just because of the workings of chance. To put it more correctly, an unbiased sample is one in which the estimates of the characteristics under study have "expected values" equal to the universe parameters.

An example is the data that are collected by the American Newspaper Publishers Association on the amount of advertising in United States newspapers. The data cover only 389 daily papers in 146 cities. The cities that are covered are the biggest cities. This means, on one hand, that the data cover 60 per cent of all daily circulation but, on the other, that they distort the picture by omitting more than 1,300 smaller newspapers. (Samples are often purposely distorted by the researcher, however, for reasons of efficiency.


Such a practice requires that the researcher know where the distortion is and that no part of the universe be omitted completely.)

The most celebrated fiasco caused by a biased sample and its misinterpretation was the Literary Digest polling debacle before the 1936 presidential election. The magazine's news stories convey the flavor of the event.

August 22, 1936

The Digest PRESIDENTIAL POLL IS ON!

Famous Forecasting Machine Is Thrown Into Gear for 1936

The 1936 nation-wide Literary Digest Presidential Poll has begun. Unruffled by the tumult and shouting of the hottest political race in twenty years, more than 1,000 trained workers have swung into their accustomed jobs.

While Chairmen Farley and Hamilton noisily claim "at least forty-two States," and while the man in the streets sighs "I wish I knew," the Digest's smooth-running machine moves with the swift precision of thirty years' experience to reduce guesswork to hard facts.

This week, 500 pens scratched out more than a quarter of a million addresses a day. Every day, in a great room high above motor-ribboned Fourth Avenue, in New York, 400 workers deftly slid a million pieces of printed matter - enough to pave forty city blocks - into the addressed envelopes. Every hour, in the Digest's own Post Office Substation, three chattering postage metering machines sealed and stamped white oblongs; skilled postal employees flipped them into bulging mail-sacks; fleet Digest trucks sped them to express mail-trains.

Once again, the Digest was asking more than ten million voters - one out of four, representing every county in the United States - to settle November's election in October.

Next week, the first answers from those ten million will begin the incoming tide of marked ballots, to be triple-checked, verified, five times cross-classified and totaled. When the last figure has been totted and checked, if past experience is a criterion, the country will know to within a fraction of 1 per cent the actual popular vote of forty millions.

As in former years, the Digest Poll will be marked by three distinctions:

Impartiality - A half century's reputation as the oldest and greatest news magazine in the world precludes any thought of bias; the Digest has no stake in the outcome. . . .

Accuracy - The Poll represents thirty years' constant evolution and perfection. Based on the "commercial sampling" methods used for more than a century by publishing houses to push book sales, the present mailing list is drawn from every telephone book in the United States, from the rosters of clubs and associations, from city directories, lists of registered voters, classified mail-order and occupational data.

The list is being constantly revised, so that "dead" addresses do not appear.

Fraud is impossible, since the ballots are almost as difficult to reproduce as


Uncle Sam's currency. (Tho in fact, the attempted frauds have thus far been insignificant.)

The "system" is both simple and complex. The master-list represents every vocation, every voting age, every religion and every nationality extraction in the country.

It is constantly revolving, so that a certain percentage of names change with each poll. Thus new voters are included, as well as those who have changed their affiliation between elections.

Trained experts have been constantly keeping that master-list up to date, checking names against the latest directories and organization rosters. Thousands of telephone books piled to the ceiling have been combed through for "revises," and new names have filtered in to keep the total at ten million.

Soon, the first of the marked ballots will begin to trickle in, tho the tide reaches thousands upon thousands a day at its peak.

August 29, 1936

Digest POLL MACHINERY SPEEDING UP

First Figures in Presidential Test to Be Published Next Week

In election after election, as the public so well knows, The Literary Digest has forecast the result long before Election Day. For this journalistic feat and public service it has received thousands of tributes during many years. To-day the praise is continuing. . . .

The American Press, . . . "With the advent of the Presidential election campaign comes The Literary Digest Poll—that oracle, which, since 1920, has foretold with almost uncanny accuracy the choice of the nation's voters. . . ."

September 5, 1936

FIRST VOTES IN Digest's 1936 POLL

Scattering Returns From Four States Show Landon Leading

Like the outriders of a vast army, the first ballots in The Literary Digest's great 1936 Presidential Poll march into the open this week to be marshaled, checked and counted. . . .

More than 24,000 strong, they represent the initial returns from four States - Maine, New Jersey, New York and Pennsylvania.

The Big Parade has started. Once again the Digest is making a country-wide test of political sentiment to find the answer to a national question. This time it is, "Who will win—Roosevelt or Landon?" . . . four States are represented—Maine, New Jersey, New York, President Roosevelt's home State, and Pennsylvania.

In each of the four the early percentages are heavily in favor of Gov. Alfred M. Landon, Republican nominee. Maine gives him 1,831 ballots, to 522 for Roosevelt. New Jersey casts 2,660 votes for Landon and 1,621 for Roosevelt. In all, these four States are represented with 16,056 ballots for Landon, 7,645 for Roosevelt and 754 for Lemke.

October 31, 1936

LANDON, 1,293,669; ROOSEVELT, 972,897

Final Returns in the Digest's Poll of Ten Million Voters

Well, the great battle of the ballots in the Poll of ten million voters, scattered throughout the forty-eight States of the Union, is now finished. These figures are exactly as received from more than one in every five voters polled in our country—they are neither weighted, adjusted nor interpreted. . . .

November 14, 1936

WHAT WENT WRONG WITH THE POLLS?

None of Straw Votes Got Exactly the Right Answer—Why?

In 1920, 1924, 1928 and 1932, The Literary Digest Polls were right. Not only right in the sense that they showed the winner; they forecast the actual popular vote with such a small percentage of error (less than 1 per cent in 1932) that newspapers and individuals everywhere heaped such phrases as "uncannily accurate" and "amazingly right" upon us.

Four years ago, when the Poll was running his way, our very good friend Jim Farley was saying that "no sane person could escape the implication" of a sampling "so fairly and correctly conducted."

Well, this year we used precisely the same method that had scored four bull's-eyes in four previous tries. And we were far from correct. Why? We ask that question in all sincerity, because we want to know.

The Literary Digest prediction was not just wrong—it was so grossly wrong that it is given credit for the magazine's going out of business not long thereafter.

The most obvious conclusion from this case is that a huge sample is no protection against error. The Literary Digest repeatedly trumpeted that its sample was 10 million, one voter in every four, which is very impressive to the layman. But with a sample of only 10,000 the results would have been practically identical. And a well-selected sample of 10,000 or less might have obtained the correct results with good accuracy.
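The point that sample size does not cure bias can be checked with a small simulation. The sketch below is illustrative only and rests on assumptions that are not in the text: an imaginary electorate split evenly into "upper-income" and "lower-income" voters who support candidate A at rates of 40 and 70 per cent, and a sampling list that reaches only the upper-income half. None of these figures are 1936 data, and the function names are hypothetical.

```python
import random

# Hypothetical electorate (illustrative assumptions, not historical figures).
SHARE_UPPER = 0.5                           # fraction of voters who are upper-income
SUPPORT_A = {"upper": 0.40, "lower": 0.70}  # support for candidate A in each group
TRUE_SUPPORT = (SHARE_UPPER * SUPPORT_A["upper"]
                + (1 - SHARE_UPPER) * SUPPORT_A["lower"])


def draw_voter(rng, upper_only):
    """Return 1 if one sampled voter supports A, else 0.

    upper_only=True mimics a list (telephone books, car registrations)
    that reaches only the upper-income group.
    """
    if upper_only:
        group = "upper"
    else:
        group = "upper" if rng.random() < SHARE_UPPER else "lower"
    return 1 if rng.random() < SUPPORT_A[group] else 0


def estimate(rng, n, upper_only):
    """Estimated share supporting A from a sample of n voters."""
    return sum(draw_voter(rng, upper_only) for _ in range(n)) / n


rng = random.Random(1936)  # fixed seed so the run is repeatable
biased_huge = estimate(rng, 200_000, upper_only=True)
biased_small = estimate(rng, 10_000, upper_only=True)
random_small = estimate(rng, 10_000, upper_only=False)

print(f"true support for A:          {TRUE_SUPPORT:.3f}")
print(f"biased sample, n = 200,000:  {biased_huge:.3f}")
print(f"biased sample, n = 10,000:   {biased_small:.3f}")
print(f"random sample, n = 10,000:   {random_small:.3f}")
```

Under these assumptions the two biased samples agree closely with each other and both miss the true figure by about the same amount, while the much smaller random sample lands near it. Enlarging a biased sample shrinks only the chance error, not the bias.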


G. Gallup has the right to say what went wrong with the Literary Digest poll, because he made a correct analysis before the election:

The reasons the Digest poll went wrong in 1936 are obvious to anyone who understands modern polling methods. The Literary Digest sent out its ballots by mail and, for the most part, to people whose names were listed in telephone directories or to lists of automobile owners.

From the point of view of cross section this was a major error, because it limited the sample largely to the upper half of the voting population, as judged on an economic basis. Roughly 40 per cent of all homes in the United States had telephones and some 55 per cent of all families owned automobiles. These two groups, which largely overlap, constitute roughly the upper half or upper three-fifths, economically, of the voting population.

The Literary Digest in previous straw vote polls had sent post card ballots to the same groups, but the Digest did not reckon with two factors in 1936—the division of votes along income lines which began with Roosevelt's administration in 1932, and the substantial increase in the voting population which took place between 1932 and 1936. These new voters came predominantly from the poorest levels—from income groups which favored Roosevelt.

The Digest not only failed to select a proper cross section, but the means by which the magazine reached voters—mail ballots—also helped to introduce error into the findings. Persons most likely to return mail ballots are those in the higher income and educational levels, and, conversely, those least likely to return their ballots represent the lowest income and educational levels. So, even if the Literary Digest had actually used lists of voters throughout the country as they did in a few cities, instead of names selected from telephone books and automobile lists, post card ballots would still have been responsible for a substantial error in the Digest's findings.

The time factor also contributed to the Digest's downfall. The great bulk of ballots sent out were mailed in September. It was, therefore, impossible to catch any change or trend in sentiment taking place during the last two months of the campaign. In many elections, in fact in most, no trend develops during the course of a campaign toward either candidate. But in some elections there is such a change in sentiment and, therefore, polling must be so timed as to measure this trend from week to week, right up to election day.

The fact that the Digest was using faulty methods was obvious long before the election revealed them. As early as July 14, 1936, two months before the Digest even began sending out its ballots, the American Institute [Gallup's organization] . . . predicted almost exactly the Digest's final figure and described in detail exactly what was wrong with the Digest's procedures. (Gallup, pp. 73-75)


In summary, Gallup points out how the Literary Digest poll was biased. Gallup's own sampling scheme, however, is also not a random one. Rather, he samples from carefully selected quotas. Such a quota sample may or may not be accurate and unbiased, depending on the wisdom of the sampler. It is only randomly drawn samples (probability samples) that can offer a guarantee against bias. But it is not always easy to draw a random sample even if you understand exactly what you are doing. Expense and energy often rule out true probability samples and the researcher may compromise on the closest he can get to a probability sample within his budget of time and energy.

The most frequent compromise with randomness in the social sciences is the use of college students as the sampled universe when the researcher really would like to study the universe of all people or when the entire United States is the "target universe." So many psychological and sociological studies have been limited to college students that some critics of the social sciences claim that we have no general psychology or sociology, but only a psychology and a sociology of American college students, and that we therefore have no psychological or sociological knowledge of people in general.

Limiting a study to college students is not always bad. A psychologist who studies hunger contractions in the stomach or psychophysical functions like hearing and seeing usually need not worry that college students are much different on those characteristics than are other people, just as H. Ebbinghaus and I. Pavlov did not require representative samples for their experiments to be valid. But a study of voting behavior or sexual behavior is surely biased if it is limited to college students but generalizes for all people. Many studies of sex behavior had been done before A. Kinsey's, but almost all had been limited to college students, and therefore we really knew very little about the sex behavior of Americans in general until Kinsey came along. Kinsey's sampling certainly was not perfectly random or unbiased—for example, half the men in the male survey lived in Indiana, and the lower-income and less-educated classes were very much underrepresented—but Kinsey's study at least came much closer to obtaining a fair sample of the universe than did any previous work. (As a matter of fact, Kinsey's first survey was limited to white males in the United States; the second was limited to white females.) The consulting statisticians who judged Kinsey's sampling techniques thought that Kinsey had gone far to get a fair sample. They still asked, however, that Kinsey take at least a small truly random sample to validate his other data and to serve as a conceptual bridge to the other data (Cochran, et al.).

The danger of sampling bias is not merely a theoretician's nitpick. In addition to the Literary Digest case, you have also seen the example on page 268 of the different results in estimates of magazine audiences obtained by three commercial services, all of whom were highly sophisticated and paid


great attention to drawing fair samples. Some difference in estimates will occur because of the inevitable sampling error from small samples. But the differences in this case were so great that the three organizations must have drawn their samples in different ways, thereby introducing bias into the estimates.
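This section noted that a sample may be purposely drawn out of proportion, provided the researcher knows where the distortion is and omits no part of the universe. One common way to "make allowances" is to weight each part of the sample by its known share of the universe. The sketch below assumes a hypothetical universe of newspapers in which big-city papers are 20 per cent of all papers but are deliberately oversampled; the shares, the advertising figures, and the function names are invented for illustration.

```python
def naive_mean(strata):
    """Pool all sampled values and ignore how each stratum was sampled."""
    pooled = [value for stratum in strata for value in stratum["sample_values"]]
    return sum(pooled) / len(pooled)


def weighted_mean(strata):
    """Weight each stratum's sample mean by its known share of the universe."""
    total_share = sum(stratum["population_share"] for stratum in strata)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    for stratum in strata:
        if not stratum["sample_values"]:
            # No part of the universe may be omitted completely.
            raise ValueError(f"stratum {stratum['name']!r} has no observations")
    return sum(
        stratum["population_share"]
        * sum(stratum["sample_values"]) / len(stratum["sample_values"])
        for stratum in strata
    )


# Invented figures: pages of advertising per issue in the sampled papers.
strata = [
    {"name": "big-city", "population_share": 0.2, "sample_values": [40, 50, 60, 50]},
    {"name": "small-town", "population_share": 0.8, "sample_values": [10, 20]},
]

naive = naive_mean(strata)
weighted = weighted_mean(strata)
print(f"pooled mean, no allowance made: {naive:.2f}")
print(f"mean weighted by known shares:  {weighted:.2f}")
```

With these invented numbers the pooled mean is pulled toward the oversampled big-city papers, while the weighted mean restores each group to its proper share. The check for an empty stratum reflects the requirement that no part of the universe be left out, since no weight can repair a group with no observations at all.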

2. Nonresponse: Unavailability of Some Part of the Universe

The preceding discussion of sample bias emphasized that if the researcher wishes to draw sound conclusions about a universe, he must study the whole universe or all its separate parts, although this investigation can be made with samples. If his study does not cover all parts of the universe, he cannot draw conclusions about the entire universe. If you want to study women Ph.D.s, you must study those women who are not now teaching as well as those who are teaching, or you cannot describe the entire universe of women who have Ph.D.s. If you want to draw conclusions about newspapers, you must have information about small-town papers as well as about big-city papers.

One way to bias a sample is to draw it from the universe disproportionately and then not make allowances; the Literary Digest example is outstanding in this respect. Another major obstacle goes by the special name of "nonresponse bias." By this name I refer to all situations in which some part of the relevant universe (and therefore some portion of a sample drawn from that universe) is not available for study. For example, some people (more at the beginning of the study than later) would not allow themselves to be interviewed by Kinsey. And some people did not answer the questions of the researchers who studied the effects of smoking (U.S. Public Health Service, p. 113). Indeed, in practically every significant study of human beings there will be some people who refuse to cooperate. (Do not condemn people who refuse to cooperate. Sometimes they have a perfectly good reason, one of which may be that it is a nuisance.) In addition, some members of the sample cannot be reached to ask for their cooperation.
Critics of the Kinsey study delight in pointing out that all Kinsey's subjects were volunteers. It is common sense that people who are willing to volunteer their sex histories might well have different attitudes and sex histories than do people who are more reticent. And indeed, one study did show that people who are low in self-esteem are both more likely to refuse to volunteer for sex studies and more likely to have conventional sexual behavior (Maslow and Sakoda in Himelhoch and Fava). Kinsey data, too, show that the average sex outlet is lower in groups from which Kinsey obtained 100 per cent response than in his basic samples. This finding does indicate that data from volunteers will lead to overestimates for the population as a whole. (But do not leap to conclusions about the overall validity of the Kinsey study until you have examined its methods in toto; see its Chapter 3.)

FIGURE 21.1 GRIN AND BEAR IT, by Lichty

"Is no wonder capitalist polls can't match ours for accuracy and dependability! . . . Imagine asking people instead of telling them!"

Source: From the Wall Street Journal. Reprinted by permission of Herbert Goldberg.

Unlike the Kinsey study, smoking behavior is not a "hot" topic; smoking is not strongly tabooed in our society. Even so, there may be a biasing relationship between refusal to respond to questionnaires, on one hand, and smoking behavior and health, on the other. People in the hospital dying of lung cancer or other diseases may lack the energy to answer a questionnaire, and the absence of their responses will distort the total picture (U.S. Public Health Service, p. 113).


You are not necessarily in trouble just because some people refuse to cooperate. Noncooperators may cause no trouble under some circumstances. If you are sampling on a street corner to find out whether people prefer one type of chewy caramel to another type of chewy caramel, a passerby may refuse to try the caramel because she has just come from the dentist. Such a refusal is not likely to cause error (another example of the decisions one must make on the basis of common sense).

Sometimes so few data are available that the study is impossible. A poll of the President's cabinet on its like or dislike of the President's spouse is not likely to turn up many willing subjects.

Having 100 per cent of the subjects is not necessary; missing just a few subjects is seldom fatal. For example, if you can obtain responses from 90 per cent of your sample in a preelection preference poll and if 65 per cent of your respondents prefer candidate A, you can be absolutely sure that candidate A would be preferred by the majority of the sample even if every single one of the unavailable subjects preferred candidate B (the respondents who prefer A are by themselves 0.90 × 0.65 = 58.5 per cent of the full sample). Unless you are trying to make a very subtle study, you need not worry much about error in estimating the universe as a whole if you can gain access to nearly all your subjects.

Whether the nonresponders can cause error depends upon whether the reason for nonresponse or the characteristics of the nonrespondents are related in any way to the information that you seek to collect. Having just come from the dentist is probably unrelated to whether a person would prefer one or another flavor of caramel, though it causes nonresponse. But a surly unwillingness to express a political preference might well be related to which candidate the nonresponder will vote for.
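The preelection example above amounts to a worst-case bound, and the arithmetic can be written out. The following sketch is an illustration only, with hypothetical function names; it assumes a two-candidate contest and treats the nonrespondents as free to prefer either candidate. It concerns the sample alone and says nothing about sampling error.

```python
def worst_case_bounds(response_rate, share_for_a):
    """Lowest and highest possible share preferring A in the full sample.

    response_rate: fraction of the sample that answered (0 to 1).
    share_for_a:   fraction of those respondents who prefer candidate A.
    """
    if not (0 <= response_rate <= 1 and 0 <= share_for_a <= 1):
        raise ValueError("both arguments must lie between 0 and 1")
    low = response_rate * share_for_a    # every nonrespondent prefers B
    high = low + (1 - response_rate)     # every nonrespondent prefers A
    return low, high


def majority_is_certain(response_rate, share_for_a):
    """True if A keeps a majority of the sample in the worst case."""
    low, _ = worst_case_bounds(response_rate, share_for_a)
    return low > 0.5


low, high = worst_case_bounds(0.90, 0.65)
print(f"90% response: A's share lies between {low:.3f} and {high:.3f}")
print("majority certain:", majority_is_certain(0.90, 0.65))

low_70, high_70 = worst_case_bounds(0.70, 0.65)
print(f"70% response: A's share lies between {low_70:.3f} and {high_70:.3f}")
print("majority certain:", majority_is_certain(0.70, 0.65))
```

At a 90 per cent response rate the lower bound stays above one half, so the conclusion holds whatever the nonrespondents think. At a 70 per cent response rate the interval straddles one half, and the conclusion then depends entirely on whether the reasons for nonresponse are related to the preference being measured.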
For example, if one of the candidates has expressed an aversion to pollsters, his followers may refuse to answer in much greater relative numbers than will followers of the other candidate. Unless you have very strong reasons to believe otherwise, you should assume that the nonresponders do not come randomly from the population. Unfortunately for the researcher, the reasons for nonresponse seem to be related to almost everything.

If there are substantial nonresponse holes in your data, you must employ remedial tactics so that you can make reliable estimates for the universe as a whole. Some of the possible tactics will be described here.

First, you may work harder to increase the number of subjects who will respond. One way is to use a "stronger" technique. For example, personal interviewing is a stronger technique than is the mail questionnaire; fewer people will refuse to answer questions in person than when solicited by mail. Another way to decrease the nonresponse rate is to increase the number of attempts to find the subjects and obtain cooperation. Interview surveys sometimes go back to "not at homes" as many as eleven times in hopes of finding them in. Or, if a person does not respond to a mail questionnaire, you might telephone to ask him to fill out the mail questionnaire. In desperate straits, you might ask a mutual friend to intercede and persuade the subject to cooperate if a single subject is crucial, as is sometimes the case in political-science studies.

When making these secondary attempts to increase your subject coverage, you must be careful not to change the subjects' answers by your further efforts. If you annoy a subject, you may affect what he says; he may give you a wrong-headed answer just to get rid of you.

Sometimes you can increase the response by persuading potential respondents to answer. In some cases, you can persuade them to respond by asking them very nicely or by showing them how their responses will benefit themselves or others in the long run; this technique is the same one that advertising copywriters or salesmen use to persuade people to buy. You can do better than just saying "Please answer this questionnaire for me." A related tactic is to have someone important write them to ask for cooperation, on the stationery of a prestigious institution.

Another technique is to offer gifts or payment to the respondents. Psychological laboratories are accustomed to paying students by the hour for prolonged sessions. And market researchers often find that dimes, quarters, or dollars will do the trick. Members of consumer panels are usually paid in merchandise. It is often surprising how much effect a small inducement can have. In a library study we attached a ballpoint pen that cost 3 cents to every other short questionnaire put into books in the library stacks. Twice as many of the questionnaires that had pens attached came back as did questionnaires that had no pens attached. As always, one must balance the cost of the inducement against the gains. For example, fewer mail questionnaires are needed if a small payment is offered.
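The balance between the cost of an inducement and the gain in response can be worked through with simple arithmetic. The sketch below is only an illustration: the 3-cent pen and the doubling of returns come from the library study above, while the 10-cent handling cost, the 20 per cent baseline return rate, and the function names are assumptions made up for the example.

```python
import math
from fractions import Fraction


def cost_per_return(cost_cents, return_rate):
    """Average cost, in cents, of each questionnaire that actually comes back."""
    if not 0 < return_rate <= 1:
        raise ValueError("return_rate must be in (0, 1]")
    return float(cost_cents / return_rate)


def questionnaires_needed(target_returns, return_rate):
    """How many questionnaires must go out to expect target_returns back."""
    return math.ceil(target_returns / return_rate)


# Hypothetical inputs (not from the study): 10 cents of printing and handling
# per questionnaire, and a 20 per cent return rate without any inducement.
BASE_COST_CENTS = 10
BASE_RATE = Fraction(1, 5)

# From the library study: the pen cost 3 cents and doubled the returns.
PEN_COST_CENTS = 3
PEN_RATE = 2 * BASE_RATE

plain = cost_per_return(BASE_COST_CENTS, BASE_RATE)
with_pen = cost_per_return(BASE_COST_CENTS + PEN_COST_CENTS, PEN_RATE)

print(f"cost per return, no pen:   {plain:.1f} cents")
print(f"cost per return, with pen: {with_pen:.1f} cents")
print("needed for 200 returns, no pen:  ", questionnaires_needed(200, BASE_RATE))
print("needed for 200 returns, with pen:", questionnaires_needed(200, PEN_RATE))
```

Under these made-up figures the pen lowers the cost of each returned questionnaire; with a costlier inducement or a smaller gain in returns, the comparison could go the other way.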
A second way to deal with nonresponse is to assume that the subjects whom you cannot reach or who refuse to cooperate are similar to the people whose responses you do obtain. You can then estimate the entire universe on the basis of simple proportionality. Interpolation or extrapolation for periods of time about which we have no data is an example of such simple extensions of data. If we have data on Gross National Product for 1975 and 1976 and estimates of advertising expenditures for 1975 but not for 1976, we might estimate the 1976 advertising expenditures by this formula:

$$\frac{\text{GNP}_{1976}}{\text{GNP}_{1975}} = \frac{?}{\text{Advertising Expenditures}_{1975}}$$
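As a rough sketch of how the proportionality formula is applied, the snippet below solves it for the unknown 1976 figure. The function name and all three dollar amounts are hypothetical round numbers chosen for illustration, not actual statistics.

```python
def proportional_estimate(known_value, indicator_then, indicator_now):
    """Scale known_value by the growth of an indicator series (simple proportionality)."""
    if indicator_then == 0:
        raise ValueError("indicator_then must be nonzero")
    return known_value * indicator_now / indicator_then


# Hypothetical round figures, in billions of dollars, for illustration only.
gnp_1975 = 1500.0
gnp_1976 = 1700.0
advertising_1975 = 28.0

advertising_1976_estimate = proportional_estimate(advertising_1975, gnp_1975, gnp_1976)
print(f"estimated 1976 advertising expenditures: {advertising_1976_estimate:.2f}")
```

The estimate is only as good as the assumption that advertising moved in step with GNP; the code has no way to check that.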

If the assumption of simple proportionality is not justified, your estimate will be a bad one. In this example, it is possible that advertising expenditures decreased from 1975 to 1976, even though GNP increased. Your only weapon is good judgment and such supplementary checks as knowledge that advertising expenditures were a constant proportion of GNP in prior years. You have no scientific guarantee that the estimation is not wrong. In any case, you must make crystal-clear to your readers just how much of the universe was not accessible to you, the assumptions that you made in taking account of the nonresponders, and how much effect the nonresponders might have on your findings if they were radically different from the rest of your sample.

Third, if your nonresponse is substantial, it is always wise to make a secondary investigation to find out whether the nonresponders differ from your original sample and, if so, how. With this information in hand, you can make modifications that will improve your extensions of the original data, or decide to give up the study if the problem is insoluble.

Kinsey used many ingenious devices to determine how and in what ways the volunteers were different from nonvolunteers. One technique was to obtain 100 per cent coverage of several groups, including all those members who had originally declined to volunteer but were willing to change their minds when strongly urged. Kinsey then compared the complete-coverage estimates against the estimates he would have made if he had had only the original volunteers available. The results were encouraging. The volunteers were not sufficiently different from the nonvolunteers to introduce major errors into the results. (This statement is oversimplified. For a full discussion of this and other devices used to check on nonresponse, see the early chapters of Kinsey et al. [1948].)

Many different techniques have been used to surmount the nonresponse problem. The common ingredients are ingenuity and common sense.

Nonresponse varies from place to place and group to group. The French are said to be reluctant to participate in research studies. And many middle-class Americans are tired of the repeated requests. In rural Thailand, however, in a study that asked intimate questions about birth-control attitudes and practices, "Of the eligible respondents none refused to be interviewed. Completed interviews number 1,207" (Peng, p. 1).
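The advice to report how much the nonresponders could change your findings can be made concrete with the preelection poll example from earlier in this section. The following is a minimal sketch; the function name is invented for illustration, and the 70 per cent response rate in the second call is a hypothetical contrast, not a figure from the text.

```python
def worst_case_share(response_rate, share_among_respondents):
    """Return (lowest, highest) possible share in the full sample.

    lowest:  every nonrespondent is assumed to oppose the candidate.
    highest: every nonrespondent is assumed to favor the candidate.
    """
    if not (0 <= response_rate <= 1 and 0 <= share_among_respondents <= 1):
        raise ValueError("both arguments must be proportions between 0 and 1")
    lowest = response_rate * share_among_respondents
    highest = lowest + (1 - response_rate)
    return lowest, highest


# The figures used in the text: 90 per cent respond, 65 per cent of them favor A.
low, high = worst_case_share(0.90, 0.65)
print(f"candidate A's share lies between {low:.1%} and {high:.1%}")

# A hypothetical weaker response rate, where the guarantee disappears.
low_70, high_70 = worst_case_share(0.70, 0.65)
print(f"with 70 per cent responding: between {low_70:.1%} and {high_70:.1%}")
```

When the lower bound stays above one half, the majority is secure whatever the nonresponders think; when the interval straddles one half, the result depends entirely on what you assume about them.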

3. Inability to Experiment with the Subject Matter

Experimentation has strong advantages over nonexperimental methods for research that aims to determine causality and as a device for overcoming many obstacles to knowledge. But an experiment, either in real life or in a realistic laboratory setting, is not always possible. For example, it is not morally permissible for scientists to apply tobacco smoke to the lungs of nonsmokers to see whether cancer develops, which prevents them from making an absolutely airtight case that smoking causes cancer. (With some other human diseases, however, our society has been willing to sanction experiments on such volunteers as convicts.)

If experimentation is impossible we must do the best we can with evidence that comes from observation of natural variations in the independent variables. Sometimes these variations occur under conditions that are almost as neat as those the investigator would arrange. These situations are called by a variety of names: "quasi experiments," "ex post facto experiments," "natural experiments." The important characteristic of such quasi experiments is that some force clearly unrelated to the dependent variable causes the variation in the independent variable.

Nature usually does not arrange her experiments as systematically as a researcher would, and therefore the evidence is harder to interpret. As in the case of smoking and health, however, the natural evidence is often all we can get. Some people already smoke, and we must extract what information we can from the evidence that we obtain from smokers by survey. If she were running an experiment on smoking, the researcher would constitute the smoking and nonsmoking groups randomly; that is, she would choose people to smoke or not smoke by pulling their names out of a hat so that the only differences between the smoking and nonsmoking groups would be the chance of the lottery. In real life, however, the very fact that some people choose to smoke and others choose not to smoke suggests that there are some differences between the groups to start with. Those differences, rather than the cigarette smoke, might be responsible for the higher incidence of lung cancer and other life-shortening diseases in the cigarette-smoking group.

There are several ways to attack this obstacle to knowledge, none of them totally satisfactory. One way is to simulate the experiment you would like to do by running an experiment with people and conditions as similar as possible to the real-life conditions that you are interested in and cannot experiment with. This is an age-old device in the engineering sciences, in the form of mock-up experiments, which have been used ever since Leonardo da Vinci, and probably before, to determine the effects of various stresses. Using models, Leonardo experimented with pillars and beams of various sizes, and he developed basic structural formulas based on these experiments (Mason, p. 149).

In the life sciences, one often experiments with animals in lieu of people on the assumption that what is true of animals is likely to be true of humans. If, indeed, cigarette smoke causes cancer in rats, then there is some reason to infer that the same effect may occur in humans. But we can never be sure. On the basis of other evidence about the similarity of rats and humans, we must decide how relevant the rat evidence will be in this case. To illustrate:

Moscow, Oct. 10, 1964 (AP). Soviet scientists have immunized animals against cancer virus, the official news agency Tass said Saturday in Moscow. It quoted Nikolai Blokhin, president of the Soviet Academy of Medical Sciences, as saying, however, that it was too early to tell whether the discovery had any practical application for treating human cancer patients.

Psychologists simulate when, like Ebbinghaus, they run experiments with nonsense syllables to learn about more complex learning situations. And economists sometimes organize laboratory experiments to simulate firms and consumers buying and selling in markets.

Games are also used by sociologists to find out how people interact with one another under various types of circumstances. The results of such games can be provocative and can provide clues to understanding behavior. But again the crucial question remains: Does the game resemble real life?

The advantage of such simulation studies is that they make it possible to work with sets of data that are too complex to handle mathematically. But a simulation can never be any better than the empirical data about individuals and relationships that are fed into it. Unfortunately, it is rare that one has very good data to use in such studies, even in advertising research, where there are mountains of existing data. Simulation is discussed at length on pages 215-217.

One use of cross-cultural study is to replace experiments. For example, the smoking-and-health researcher looks for two countries that seem alike in most respects, except that in one country cigarette smoking was fortuitously introduced much earlier than in the other. The health records of the two countries can then be compared. It is important to be sure, however, that there was not some special nonrandom reason that caused cigarettes to be introduced earlier in one country than in the other; such a reason might itself be the cause of differences in health in the countries.

Economists have had to develop many specialized statistical methods to use in lieu of experiments because so many aspects of national economies are not subject to experimentation. An economist cannot carry out a systematic and controlled program of experimentation with the various factors that may affect the business cycle in order to increase our understanding of what causes boom and bust. Instead, his only alternative (except for mock-experiment methods) is to interpret the data that history has made available to him with the aid of those statistical techniques called "econometrics," which squeeze the maximum amount of information out of available data. Determining whether A causes B or vice versa, whether A and B cause each other, or whether A and B are causally unrelated is, however, a major difficulty in econometrics; it could be resolved easily if experimentation were possible.

An after-the-fact analytic device is seldom as strong as an experiment in supporting a cause-and-effect argument. Nevertheless, if enough evidence of different kinds is available and if such an argument is constructed soundly, the conclusions can come very close to making an airtight case for the existence of a causal relationship. This sort of reasoning was the basis for the unanimous judgment of the Surgeon General's committee that smoking "causes" lung cancer and other diseases.
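The lottery described earlier in this section, in which names are pulled out of a hat so that only chance separates the two groups, is what a true experiment would rely on where one is permissible. A minimal sketch of such a random assignment follows; the subject labels, the seed value, and the function name are all hypothetical.

```python
import random


def assign_by_lottery(subjects, seed=None):
    """Shuffle the subjects and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)   # a seed makes the draw repeatable for auditing
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subject labels; any list of identifiers would do.
subjects = [f"subject-{i:02d}" for i in range(1, 21)]
treatment, control = assign_by_lottery(subjects, seed=1978)

print("treatment group:", sorted(treatment))
print("control group:  ", sorted(control))
```

Because the subjects do not choose their own group, any preexisting difference between the groups can arise only by the chance of the draw, which is exactly what self-selected smokers and nonsmokers cannot offer.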

4. Flaws in Subject Matter and Unreliability of Data

The botanist is sometimes troubled by specimens that are rotten, aged, atypical, or otherwise flawed. The historian spends much of her time making do with unreliable or incomplete records because people in earlier centuries were not as conscientious about keeping records as we now are. The economist often faces the same difficulty as does the historian because the raw material of empirical economics is generally historical records, though often of the recent past.

The historian generally works with nonquantitative data and reports his results in literary rather than statistical fashion. He can therefore bridge gaps in subject matter with his intuition or imagination. The economist, however, must have some kind of numerical data with which to work, even if he must construct and patch together series of data with baling wire and chewing gum, as someone put it. The sociologist, too, is often at the mercy of those who have created the demographic data he works with.

None of us is astonished to hear that there are errors in survey data, even though the magnitude of the errors is often much greater than we would have guessed. Once you read O. Morgenstern's classic book On the Accuracy of Economic Observations, you will no longer be the trusting soul you once were. I think it is more surprising to learn that even the published results of laboratory experiments are far from totally reliable. For example, a reanalysis of the data from seven psychological experiments found gross errors in three of them (Wolins, p. 657).

The clear moral is that when you use data that have been collected by other researchers as your raw material, you must take careful precautions to check the accuracy of the data. "Students have to be brought up in an atmosphere of healthy distrust of the 'facts' put before them. They must also learn how terribly hard it is to get good data and to find out what really is 'a fact'" (Morgenstern, p. 305). But all too often researchers close their eyes to this obstacle, assume that published data mean what they seem to mean, and use the data uncritically.
Here is the primrose path, dangerous for you and for the consumers of your research results. A famous economist recently was embarrassed publicly by inadvertently using data that had minor errors in them. And consider the enormous differences between two publicly issued statistics about the same thing, the movement of gold from France to England (Ferraris in Morgenstern, pp. 139-140):

Gold Movement from France to Britain (in millions of francs)

              According to French     According to British
              Export Statistics       Import Statistics
1876-1880           41.5                    94.4
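The size of the disparity between the two gold-movement figures can be stated as a single number. The sketch below is one simple way to do so; the function name and the 10 per cent tolerance are assumptions chosen for illustration, and nothing in the source prescribes a particular threshold.

```python
def relative_discrepancy(figure_a, figure_b):
    """Gap between two figures as a fraction of the smaller one."""
    smaller = min(figure_a, figure_b)
    if smaller <= 0:
        raise ValueError("figures must be positive")
    return abs(figure_a - figure_b) / smaller


# Figures from the table above, in millions of francs, 1876-1880.
french_export_statistic = 41.5
british_import_statistic = 94.4

# Hypothetical tolerance: flag any pair that differs by more than 10 per cent.
TOLERANCE = 0.10

gap = relative_discrepancy(french_export_statistic, british_import_statistic)
needs_investigation = gap > TOLERANCE
print(f"relative discrepancy: {gap:.0%}; investigate: {needs_investigation}")
```

A check of this kind signals only that the two sources disagree; it cannot say which of them, if either, is right.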

Here are some tactics to help you avoid error.

Find out exactly how the raw data were collected. Study the data-collection methods critically to see whether they make sense and fit your needs. Take nothing on faith. Sometimes you will find incredible blunders that have gone undiscovered for a long time. If insufficient description of the data has been published, write to the author and find out how he did the work. Sometimes such correspondence startles you by revealing that some of the data that you assumed to be hard facts were really just estimates by the author. Printed data, like rumors, have the unfortunate property of gaining the appearance of reliability and respectability as they are successively quoted and go from hand to hand. Consider, for example, the three "authoritative" sets of data (none of which can possibly rest on very solid foundations) given in Table 21.1.

TABLE 21.1  Numbers of Speakers of Selected Languages in 1976 (in millions)

Language      Information Please    CBS News Almanac    World Almanac
              Almanac               (1976, p. 762)      (1976, p. 200)
              (1976, p. 414)
Arabic              150                   100                 125
Bengali              95                   110                 123
Bihari               37                    20                  22
Cantonese            80                     -                  47
English             350                   300                 358
French               80                    75                  90
German              105                   100                 120
Mandarin            555                     -                 650
Portuguese          110                   110                 124
Russian             140                   200                 233
Spanish             220                   200                 213
Tagalog               8                    10                  21
Tibetan               9                     4                   7

SOURCE: Almanacs as indicated, 1976; suggested by Wallis and Roberts, p. 96.

Whenever possible, cross-check available data against other sources of data. If you are working with Gross National Product estimates and if you also have series of employment data available to you, see if the two series bear a sensible relationship to each other. If no other data are available, at least check the data against rules of thumb and common sense. In the gold-movement example, Ferraris checked the British import statistic against the French export figure. The disparity alerted him to danger, even though he could not determine which figures were correct.

For another example, a student of unemployment movements should consider the evidence from both the Bureau of Employment Statistics series and the Bureau of Labor Statistics' Monthly Report of Labor Force. Notice in Table 21.2 how the up-or-down directions of the two series have differed, as well as the absolute size of the estimates.

TABLE 21.2  Unemployment Series: Levels and Changes, 1946-1961 (in millions)

          Unemployment               Changes              % Difference
        (monthly averages)                                  in Series
          (1)        (2)         (3)        (4)               (5)
        Census-                Census-                [Col (2) - Col (1)]
Year    M.R.L.F.    B.E.S.     M.R.L.F.    B.E.S.      / Col (1) x 100
1946      2.3        2.8                                     21.7
1947      2.4        1.8         0.1       -1.0             -25.0
1948      2.3        1.5        -0.1       -0.3             -34.8
1949      3.7        2.5         1.4        1.0             -32.4
1950      3.4        1.6        -0.3       -0.9             -52.9
1951      2.1        1.0        -1.3       -0.6             -52.4
1952      1.9        1.1        -0.2        0.1             -42.1
1953      1.9        1.1         0.0        0.0             -42.1
1954      3.6        2.0         1.7        0.9             -44.4
1955      2.9        1.4        -0.7       -0.6             -51.7
1956      2.8        1.3        -0.1       -0.1             -53.6
1957      2.9        1.6         0.1        0.3             -44.8
1958      4.7        2.8         1.8        1.2             -40.4
1959      3.8        1.9        -0.9       -0.9             -50.0
1960      3.9        2.1         0.1        0.2             -46.2
1961      4.8        2.5         0.9        0.4             -47.9

SOURCE: On the Accuracy of Economic Observations, 2nd ed., p. 225, by Oskar Morgenstern; copyright © 1950, 1963 by Princeton University Press; reprinted by permission of Princeton University Press.

A. Schlesinger, Jr., illustrates how a military historian and political analyst goes about cross-checking the available data:

The [New York] Times on Aug. 10 described "the latest intelligence reports" in Saigon as saying that the number of enemy troops in South Vietnam had increased 52,000 since Jan. 1 to a total of 282,000. Yet, "according to official figures," the enemy had suffered 31,571 killed in action in this period, and the infiltration estimate ranged from 35,000 as "definite" to 54,000 as "possible." The only way to reconcile these figures is to conclude that the Vietcong have picked up from 30,000 to 50,000 local recruits in this period. Since this seems unlikely, especially in view of our confidence in the decline of Vietcong morale, a safer guess is to question the wonderful precision of the statistics. Even the rather vital problem of how many North Vietnamese troops are in South Vietnam is swathed in mystery. The [New York] Times reported on Aug. 7: "About 40,000 North Vietnamese troops are believed by allied intelligence to be in the South." According to an Associated Press dispatch from Saigon printed in The Christian Science Monitor of Aug. 15: "The South Vietnamese Government says 102,500 North Vietnamese combat troops and support battalions have infiltrated into South Vietnam." "These figures are far in excess of United States intelligence estimates, which put the maximum number of North Vietnamese in the South at about 54,000." But General Westmoreland told his Texas press conference on Aug. 14 that the enemy force included "about 110,000 main-force North Vietnamese regular army troops." Perhaps these statements are all reconcilable, but an apparent discrepancy of this magnitude on a question of such importance raises a twinge of doubt. (Schlesinger, 1966, p. 114)

Sometimes one can check reliability by comparing sets of different data that must bear a close resemblance to each other. For example, a bell would go off in the mind of the production manager of American Motors if he were presented with very different data on the numbers of auto bodies and auto engines produced in his factories; there are very few cars with more or less than one body and engine, and therefore the two numbers must bear a close resemblance to each other.

Check the internal reliability of your data. Make sure that the parts add to the totals and that there are no inexplicable large variations from period to period or group to group. Experienced researchers frequently can detect mistakes in their own and other people's raw data just by scanning them and searching out peculiar variations. In "The Case of the Indians and the Teen-Age Widows," A. Coale and F. Stephan describe how they noticed a few peculiar data in the 1950 Census of Population and how they tracked down the cause of the anomalies:

Our first clue was the discovery in the 1950 Census of Population of the United States of startling figures about the marital status of teen-agers. There we found a surprising number of widowed fourteen-year-old boys and, equally surprising, a decrease in the number of widowed teenage males at older ages. The numbers listed by the Census . . . were 1,670 at age 14; 1,475 at age 15; 1,175 at 16; 810 at 17; 905 at 18; and 630 at age 19. Not until age 22 does the listed number of widowers surpass those at 14. Male divorces also decrease in number as age increases, from 1,320 at age 14 to 575 at age 17. Smaller numbers of young female widows and divorcees are listed: 565 widows and 215 divorcees at age 14.
These strange figures, even though they appear in a very minor part of the carefully prepared and widely useful data presented in the Population Census, aroused our curiosity and set us to searching for an explanation. (Coale & Stephan, p. 338)

The explanation: In a tiny fraction of the IBM cards the key-punch operators had moved the data one column to the right, and some middle-aged males were thereby transformed into teen-agers. Further investigation turned up related errors that were not so obvious upon first inspection. If errors like the teen-aged widowers manage to creep into the U.S. Census (which is a remarkably well-run data-gathering operation), errors are much more likely in any other data with which you are likely to work.

Checking the accuracy of the raw data does not eliminate all the likely obstacles. You may also hit snags because the available data have been summarized in ways that do not fit your purposes. For example, I wanted to estimate how much shorter the life of the average smoker is than the life of the average nonsmoker (four years by my rough estimate) and how many minutes less of life, per cigarette smoked, the smoker lives. (About seven minutes. Do you have a light?) These estimates were hampered by a lack of comparability in existing data. One study categorized smokers as "less than 10 a day," "11 to 20," "21 to 39," and "40 or over," whereas another study reported data as "less than 5," "about ½ pack," "about 1 pack," "about 1½ packs," and other studies used such categories as "20 to 34," "35 plus," and "more than 1 pack."

The appropriate tactic was the reverse of one used to overcome nonresponse: the "linear assumption." That is, I assumed that the numbers of smokers who smoke eleven, twelve, thirteen, and so on to twenty cigarettes a day are roughly equal to one another, and then I split the data for "11 to 20" in such a way as to make them comparable with the other data. Or in a case like this one I might have made a graph of the data and interpolated the missing points. This would provide only a crude approximation, but one has to do something. The only question is whether some other approximation would be better.

Another obstacle is that the existing data may lump several other categories together with the one category in which you are interested. If you study expenditures for liquor advertising, you will find that the figures reported for magazines lump together the expenditures for beer, wine, and liquor, whereas the data for newspapers cover the categories separately. Chapter 17 discusses this problem in more detail. In this case, you can bring other evidence to bear. If you have data showing that the split between liquor and beer plus wine in newspapers is twenty to eighty, you might assume that the same proportional split would hold for magazines.

The example of liquor advertising is a special case of the more general obstacle of raw data not being defined in a manner that is useful for your particular purpose.
As an illustration, unemployment statistics are variously defined as the number of people who are listed on the unemployment rolls or as the difference between the total work force and the number of people employed. The estimates from these or other methods can differ very greatly (see Morgenstern, Chap. 13, for detailed discussion).

Another obstacle is that the data may be averages for groups, whereas you may need data for the individuals within the groups. In some cases, you can safely use the group averages to describe the individuals, but this approach can lead to what is called the "ecological fallacy." For example, you might want to study the relationship between income and literacy because, perhaps, you want to know how people's incomes will be affected if literacy is raised. If you compared the literacy rates and per capita incomes for Kuwait, Austria, and Israel, you might come to think that more literate people have lower incomes. The fallacy, of course, is that in Kuwait a very few people have all the money and most of the people are very poor. So, even though Kuwait is a rich country, most of its individual citizens are very poor. Looking only at these countries, one finds that high rates of literacy go with low per capita incomes. Within each country, however, literate people will have higher incomes than illiterate people by and large, and, if one raises literacy, one can safely count on raising per capita income also. This is one of those cases in which it is an error to consider the relationship between characteristics of countries to stand for the corresponding relationship between characteristics of individuals. Data are therefore needed for individuals and not countries. (In other cases, however, it may be that you are in trouble because you have data on individuals and you want to study the behavior of groups. The whole is not always a simple sum of the parts.)

Mechanical and computational errors sometimes introduce significant error into the data that are available to you. Computers are free of many of the errors of human computation, but they suffer from "rounding" error; that is, the computer performs all mathematics in terms of arithmetical operations, and at each stage it must round off. The computer carries the operations to many digits, but nevertheless these errors pile up as the computer does the millions of computations necessary for even simple problems, and you often find that the same calculations done by different computer methods yield somewhat different results.
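Rounding error of the kind just described is easy to observe on a present-day machine. The sketch below assumes the standard double-precision floating-point arithmetic that Python uses; it adds the same numbers by two different methods and compares the results.

```python
import math

# Ten dimes should make exactly one dollar, but 0.1 has no exact binary form.
values = [0.1] * 10

running_total = 0.0
for v in values:          # plain left-to-right addition, rounding at every step
    running_total += v

careful_total = math.fsum(values)   # tracks the lost low-order bits

print("step-by-step total:", repr(running_total))
print("math.fsum total:   ", repr(careful_total))

# The same three numbers added in a different order give different answers.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)
print("order matters:", left_first != right_first)
```

The discrepancies here are tiny, but they illustrate why two programs that are both "correct" can report slightly different figures from the same raw data.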

5. Shortage of Subject Matter

Sometimes there is insufficient subject matter available to support solid conclusions. This obstacle is less likely to appear in experimental work because data can usually be generated in unlimited quantity, subject only to the restriction of cost and available funds (on this point see Chapter 24). But sometimes the subject matter for experimentation is limited too. For example, in the early stages of the nuclear physics of various particles, the amounts of matter available for experimentation were in short supply, as dramatized in the story of the Curies and their radium shortage.

You are most likely to face a shortage of subject matter when experimentation is not possible. Wars may be mercifully few from the point of view of humanity, but the historian must sometimes catch herself wishing there had been more wars of certain kinds so that she would have more material to work with. Similarly, business cycles are always few in number compared to the economist's need for data.

Lack of data sometimes comes as a surprise. For example, it is a shock to fledgling demographers to learn that the United States was for a long time among the most backward of the industrial nations in recording births; collection of birth statistics did not even begin until 1915; it was not until 1933 that birth data were available for the entire country (T. Smith, pp. 284-285). And there were no national-income statistics (for example, Gross National Product) for the United States covering the years prior to the third decade of the twentieth century until S. Kuznets came along and partly remedied the deficiency.

There are many possible reasons for the nonexistence or the shortage of data. In antiquity figures on Greek and Roman commerce and population were seldom collected. But, even when they were collected and recorded, the records that have come down to us often lack the totals because the totals were at the corners or bottoms of the stone tablets, and those are the parts of the tablets most likely to be broken off over time (Jones, p. 2).

When you are faced with a shortage of data, first look for more data. But look in places that are not obvious, as well as in places that are. If you have too few business cycles to work with in United States history, you might try examining business cycles in those other countries whose economies bear at least some similarity to that of the United States. If you want to study how a ban on advertising affects liquor consumption and if only a few states in the United States have bans, hunt up cases of other countries that have banned liquor advertising. Historians often exhibit genius in finding data in out-of-the-way places.

Sometimes physical instruments can fill gaps in the historical record. Anthropology furnishes an example:

Now in the field of the nonliterate cultures and cultural items, where dates are totally lacking, and where archaeology can hope in general to give us only relative time sequences, it is also interesting to know whether an institution goes back a thousand or ten thousand years, or is older or younger than another institution. But how shall we learn? Now and then some other science comes to our rescue: geology and palaeontology in remote prehistory, botany or tree-ring dendrochronology in a few particular situations. Such outside aids are, however, rare and unusual. In general, cultural anthropology has to help itself. (Kroeber, p. 541)

Since Kroeber wrote the above paragraph, however, atomic physics has come to the rescue. The carbon-14 method is often an enormous help in establishing dates. In many cases, it is a superior substitute for examination of tree-ring patterns.

Another way to squeeze out more information is to divide up the available units of subject matter into smaller units.
Economists sometimes use data for three-month rather than full-year periods, to increase the number of available observations, even though the quarterly data are much less accurate than the yearly data. Another example comes from a study of simulated jury deliberations. An enormous number of expensive jury experiments would have been required to obtain a sample of decisions large enough for statistical needs. The researcher therefore also collected data about the individual judgments of jurors prior to the deliberations, which yielded a sample twelve times as large as the number of jury decisions (R. Simon, 1967).

Though splitting up the data into smaller parcels may increase its information value in one respect, you will probably have to pay a price in diminishing its information value in some other respect, usually in the form of increased error in the smaller units. In the jury study, the researcher had to be careful to emphasize that individual jurors' judgments are not comparable to decisions made by the jury as a whole.

When all else fails, the skills of the statistician can often squeeze extra meaning out of small samples. Anyone can come to a sound conclusion with a sample of millions. But, as we shall see in Chapters 25-30, it takes skill


and ingenuity to infer sound conclusions from samples of five, ten, or one hundred observations.

6. High Individual Variability in the Data

Your intuition will tell you that it is easier to compare the monthly salaries of two jobs when the salaries within each of the two vocations are similar. If three people working at job A earn $500, $550, and $600, whereas three people working at job B earn $411, $580, and $640, it is difficult to tell reliably whether there is any difference in the average salaries of the two jobs because there is so much variability. But if all three people on job A earn $540 and all three people on job B earn $570, it seems much clearer that there is a difference between the salaries of the two jobs.

High variability among the subjects is a frequent difficulty in experimental as well as in survey research. Consider the study of whether making cats "neurotic" will increase their intake of alcohol. J. Masserman's design was as follows: First, determine whether "normal" cats choose milk with 5 per cent alcohol content over milk containing no alcohol; second, find out whether giving one group of cats a neurosis increases the likelihood that it will choose milk with alcohol in preference to plain milk. Two of sixteen cats chose milk with alcohol to start with. If ten of sixteen cats that have been given neuroses come to choose alcohol and only two of sixteen that were not given neuroses chose alcohol when retested (as was the case), it is easy to interpret the results. But if four or five of the normal cats had chosen the alcoholic milk to start with and if fewer than ten of the neurotic cats had chosen the alcohol, it would have been much more difficult to determine whether inducing the neuroses made any difference.
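The contrast between the "easy" and the "difficult" outcome of the cat experiment can be put in rough numbers. The sketch below is only an illustration, not the analysis used in the original study: it applies a one-sided Fisher exact test (a hypergeometric tail probability) to the reported result of 10 of 16 against 2 of 16, and to a hypothetical murkier result of 8 of 16 against 5 of 16. The function name and the murkier counts are assumptions made for the example.

```python
from math import comb


def one_sided_exact_p(treated_yes, treated_n, control_yes, control_n):
    """Chance of at least `treated_yes` alcohol-choosers landing in the treated
    group if group membership had nothing to do with the choice
    (upper tail of the hypergeometric distribution)."""
    total_n = treated_n + control_n
    total_yes = treated_yes + control_yes
    upper = min(total_yes, treated_n)
    favorable = sum(
        comb(total_yes, k) * comb(total_n - total_yes, treated_n - k)
        for k in range(treated_yes, upper + 1)
    )
    return favorable / comb(total_n, treated_n)


# Result reported in the text: 10 of 16 "neurotic" cats vs. 2 of 16 normal cats.
p_clear = one_sided_exact_p(10, 16, 2, 16)

# Hypothetical murkier result: 8 of 16 "neurotic" cats vs. 5 of 16 normal cats.
p_murky = one_sided_exact_p(8, 16, 5, 16)

print(f"reported result:     p = {p_clear:.4f}")
print(f"hypothetical result: p = {p_murky:.4f}")
```

A small probability for the reported result and a large one for the hypothetical result say the same thing as the prose: with more choosers among the normal cats and fewer among the neurotic ones, chance alone becomes a plausible explanation.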
High variability within the subject matter is one of the most important characteristics that set the social sciences off from the physical sciences. Prenuclear physics and chemistry can be said to be almost determinate, because what happens once will happen again. (But do not forget that many facts about human nature are almost as sure as those in the natural sciences. I can predict with practically perfect accuracy that, if I fire a blank revolver in front of your nose, your eyes will blink in a "startle reflex." And luckily the chances are fantastically good that your next-door neighbor will not turn out to be a cannibal or a pathological killer.)

Several tactics other than those suggested in Figure 21.2 can help you to make satisfactory comparisons between groups, despite a high level of variability within the groups. Two of these tactics are described here.

a. INCREASE THE SAMPLE SIZE

FIGURE 21.2 "Sorry sir, I'm not going to open up seven locks and disconnect the burglar alarm just to tell you what shampoos I favor." Source: GRIN AND BEAR IT by George Lichty. © Field Enterprises, Inc., 1964. Courtesy of Field Newspaper Syndicate.

Television programing can be considered (and is) an experiment to see which programs are more popular. Even a slight difference in popularity could be detected if you measured the tuning of every one of the millions of network listeners for each of the programs. Such a large sample is not feasible, and a Nielsen sample of 1,200 dwelling units (Madow, et al., p. 127) is generally large enough to measure most differences with acceptable accuracy. If two programs were very close, taking a sample many times as large would help. But remember that if a terrifically large sample is required, the difference probably is not sufficiently important to warrant attention.
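How much a larger sample helps can be roughed out with the usual margin-of-error formula for a proportion. The sketch below assumes simple random sampling and a normal approximation at the 95 per cent level, which a real ratings sample only approximates; the function name and the larger sample sizes are illustrative assumptions.

```python
from math import sqrt


def margin_of_error(share, sample_size, z=1.96):
    """Approximate 95 per cent margin of error for an estimated audience share,
    assuming simple random sampling (an assumption, not the actual Nielsen design)."""
    return z * sqrt(share * (1.0 - share) / sample_size)


# A program watched in about half of all dwelling units is the worst case.
margin_1200 = margin_of_error(0.5, 1200)
margin_4800 = margin_of_error(0.5, 4800)     # four times the sample
margin_19200 = margin_of_error(0.5, 19200)   # sixteen times the sample

for n, m in [(1200, margin_1200), (4800, margin_4800), (19200, margin_19200)]:
    print(f"n = {n:>6}: plus or minus {100 * m:.1f} percentage points")
```

Under these assumptions the margin shrinks only with the square root of the sample size: quadrupling the sample halves the margin, which is why separating two very close programs calls for a sample "many times as large."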

b. COMPARE EACH SUBJECT WITH HIMSELF

This tactic can be very powerful in surmounting the obstacle of high individual variability. Assume that you are a campaign manager and that you want to find out the effect of having your candidate give personal talks in small towns. You would face the difficulty of high variability within towns if you tried to compare towns the candidate had visited with towns he had not visited. But if you measured preferences for your candidate before and after his visits to some towns, the variability within the towns would not be so great a difficulty. You would need only to compare each town against itself and then to average the changes. Each town is said to be its own control in this type of design.³

Variability within groups often impels critics to claim that it is impossible to make any valid predictions for an individual in one group compared to an individual in another group. If a poll predicts that 80 per cent of one group will vote for candidate A, the critic asks, How can you predict that any given individual in that group will vote for candidate A? It is certainly true that we can never make a prediction for an individual with perfect accuracy. Any prediction is only a probability, and, as a simple pragmatic matter, knowing the probabilities for the group as a whole helps us make a prediction for any individual within the group. The odds are four to one that a randomly chosen individual will vote for candidate A if the group she is in is 80 per cent for A. First-place teams in the major leagues sometimes lose games to last-place teams. But even the most naïve bettor will not take even odds that the last-place team will win a particular game.
(The picture is different, of course, if you know something special about the particular vote or the particular teams: who the pitchers are, in the latter case.)

The more that we know about the various groups with which we can identify an individual, the better will be our prediction for the individual. If we had known before the Kennedy-Nixon election that a particular voter was Catholic, we could have said that the odds were 60-40 that she would vote Democratic. If we had also known that she lived in New York City, our odds might have been 80-20, a stronger prediction. (Sometimes, however, the interaction of two such traits can drive the combined effect in the other direction.) In general, the more finely we can subclassify an individual, the better will be our prediction for her. The nature of predictions and how to make them are explored at greater length in Chapters 25, 26, and 32.
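The "own control" tactic described earlier in this section can be seen in a small worked example. All figures below are hypothetical, invented only to show the arithmetic: the per cent favoring the candidate in five towns before and after a visit.

```python
from statistics import mean, stdev

# Hypothetical per cent favoring the candidate in five towns (not real poll data).
before = [30, 45, 60, 52, 38]
after = [34, 48, 65, 55, 43]

# Compare each town against itself, then average the changes.
changes = [a - b for a, b in zip(after, before)]
average_change = mean(changes)

# Spread of the town-by-town changes versus spread of the towns themselves.
spread_of_changes = stdev(changes)
spread_across_towns = stdev(after)

print("changes by town:", changes)
print("average change:", average_change)
print(f"spread of changes:   {spread_of_changes:.1f}")
print(f"spread across towns: {spread_across_towns:.1f}")
```

In these made-up figures the towns differ from one another by many points, while the town-by-town changes cluster tightly around their average. A shift of a few points that would be lost in a comparison of visited with unvisited towns stands out when each town serves as its own control.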

3. The statistically trained reader will recognize that the analysis of variance is the appropriate general model for determining whether the "within" variance is satisfactorily small compared to the "between" variance.

7. Too Much Subject Matter

An experienced researcher's first response to the obstacle of too much subject matter is "It should only happen to me." Subject matter is like diamonds: a scarce commodity of which you cannot believe that you could ever have too much. But it is possible to be buried in data, just as a person might be buried in a landslide of diamonds.

The data in the U.S. Census, for example, overwhelmed researchers in the late nineteenth century. It took seven years and hundreds of clerks to bring order to the 1880 data and to develop summarizing statistics. It was the difficulty of coping with that great mountain of census data that led H. Hollerith to invent a system of punched cards and machines that would read and sort the data electrically. Machines very similar to those original 1890 machines are perhaps the commonest automatic data-processing machines in use today, and they are the granddaddies of the more sophisticated machines that are also in existence now. Hollerith's firm grew into IBM, all because of the problem of too much data (Halacy, 1969, pp. 2, 41).

Ironically, the very machines that were invented to sort huge piles of data can themselves create new obstacles of masses of data. Everyone who has worked with high-speed computers has suffered fright and frustration when she received literally pounds of paper and hundreds of thousands of numbers as the output of a computer run, even though all she really wanted was a single yes-or-no answer. Then she is again faced with the necessity of boiling down and summarizing the output into some useful and manageable form.

Automatic data processing is one way to cope with too much data. Another tactic is to take a sample of the large masses of data. The sample should be large enough to be sufficiently accurate yet small enough to be worked on easily (Chapters 9 and 31 discuss sampling and sample sizes in more detail).
If automatic data processing were not available, it would be necessary to take small samples from the mass of census data collected and make estimates for the country as a whole based on those samples. Then why not collect census data for only a sample of the population? The answer is that data on the entire population can provide knowledge about small subdivisions of the population, as well as about the population as a whole. A small sample of the entire population would tell us with considerable accuracy how many people in the United States as a whole earn less than $5,000 per year. But a small sample of the United States population could not reveal accurately how many people in Elko County, Nevada, earn $5,000.⁴

For accounting purposes, most business firms collect complete financial data on their operations in preference to taking samples of the data. Accountants generally insist on reckoning to the last penny rather than making estimates based on samples, even though the Internal Revenue Service will settle for accounts rounded off to the nearest dollar.

4. In fact the U.S. Bureau of the Census does conduct some of its business with samples. The monthly Current Population Survey quizzes 35,000 people. And only two or three questions in the Decennial Census are asked of everyone; the rest are sampled.
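The reason a national sample fails for a small county can be shown with a little arithmetic. Every number in the sketch below is a hypothetical stand-in (the sample size, the share of low earners, and the county's share of the population), and simple random sampling is assumed.

```python
from math import sqrt


def standard_error(share, n):
    """Standard error of an estimated proportion under simple random sampling."""
    return sqrt(share * (1.0 - share) / n)


SAMPLE_SIZE = 10_000       # hypothetical national sample
LOW_INCOME_SHARE = 0.30    # hypothetical share earning under $5,000
COUNTY_ONE_IN = 10_000     # hypothetical: the county holds 1 resident in 10,000

# The whole sample bears on the national estimate.
national_se = standard_error(LOW_INCOME_SHARE, SAMPLE_SIZE)

# Only the respondents who happen to live in the county bear on the county estimate.
expected_county_respondents = SAMPLE_SIZE / COUNTY_ONE_IN
county_se = standard_error(LOW_INCOME_SHARE, expected_county_respondents)

print(f"national standard error: {100 * national_se:.2f} percentage points")
print(f"expected respondents from the county: {expected_county_respondents:.0f}")
print(f"county standard error: {100 * county_se:.0f} percentage points")
```

With these stand-in figures the national estimate is good to about half a percentage point, while the county estimate rests on roughly one respondent and is close to worthless. That gap is the case for a complete count whenever small subdivisions matter.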


8. Invisibility or Inaccessibility of Subject Matter

There are times when you want to study a phenomenon that you cannot see or that you cannot reach with your measuring instruments as directly as you would like. For example, the United States wants to know how successful China's harvest is, but China does not supply the data, and the United States cannot make an on-the-spot survey. Or a firm wants to know how much a competitor is spending for advertising. Or you want to know exactly what is said inside jury boxes, but the law will not permit you to record the deliberations. Then you must work with a less satisfactory proxy.

9. Interference with Subject Matter by Researcher

When you observe or measure a phenomenon, you inevitably affect it to some extent. At one extreme is the example of von Osten's horse (pages 288-289), where the phenomenon that is observed is entirely caused by the observer. At the other extreme are studies of economic and social history, which employ records of the past; the events of the past are already complete and are not influenced by the observer except in the sense that the records of the past may be altered by the historian's activity. Much of social research falls between these extremes: the influence that asking a question may have upon a person's thinking; the influence that watching a child's play may have upon the playing; and the influence of governmental study of an industry's practices upon the practices.

Almost always you will wish to minimize the influence of the researcher upon the subject matter. Therefore you should seek methods of observing and measuring that will have as little effect as possible. The less the subjects notice the process of observing and measuring, the less they are likely to be influenced by it.
Therefore it makes sense to make your observations as "unobtrusive" as possible, to use the now-well-known phrase of Webb et al. Their book compiles and discusses many such unobtrusive devices; in this book such devices are mentioned throughout, as the theme arises repeatedly in various contexts.

10. Summary

Data, good data, are the researcher's life blood. But you may find it difficult to obtain any data. Or you may be unable to experiment with your subjects for ethical or practical reasons. Or the data may be unreliable. Or the sample of data may be biased, rather than being a fair image of the universe that you wish to describe. Or you may even have so much subject matter that you are overwhelmed by it. Or . . . or . . . or. . . .

For each obstacle to getting good data there are solutions, partial solutions, or substitutes, many of them mentioned in this chapter. But the ultimate resource is your ingenuity and your creative faculty, building on your experience. Good luck.


EXERCISES

1. Give examples from research in your field to illustrate the following obstacles to obtaining knowledge:
a. bias in the sample
b. nonresponse or unavailability of some part of the universe
c. inability to experiment
d. flaws in the subject matter
e. shortage of subject matter
f. too much subject matter
g. invisibility or inaccessibility of subject matter

2. Suggest methods for surmounting the obstacles you described in Exercise 1.

3. Find two sets of comparable data from different origins and cross-check them. How reliable are they, as indicated by the cross check?

4. Show an internal-reliability check of some data.

ADDITIONAL READING FOR CHAPTER 21

Statistical records, personal documents, and other publicly available data are discussed in Selltiz et al. (Chapter 11). For a discussion of the use of "secondary data" (that is, data that are publicly available) in the context of business research, see Boyd et al. (Chapter 5), or Schoner and Uhl, pp. 180-192. A description of various repositories for publicly available social-science data is given by Nasatir.

Webb et al. give a remarkable number of examples of ingenious ways to obtain subject matter when the task seems especially difficult. This is a book to jog your imagination when you are stymied for lack of ideas on how to get information to answer a research question.

A good summary of techniques to overcome nonresponse in interviewing may be found in Stephan and McCarthy (Chapter 11). There is a useful short bibliography at the end of that chapter.

Moore is an excellent short treatment of errors in economic data. A useful general discussion of labor-force statistics is Jaffe and Stewart. Selvin (1965) cogently discusses the ecological fallacy.

22 obstacles to the study of changes over time

1. Summary

Change over time can be an obstacle that can becloud the understanding of relationships in which your variables are not time-related. But in some cases change over time is exactly what you are interested in. This chapter discusses the obstacles to understanding what does happen over time.

Some non-cause-and-effect research aims to describe a phenomenon as it exists at a single point in time. For example, the U.S. Census tries to measure the population of the nation in a given year. Television ratings try to measure the sizes of television audiences at each moment in the television day. A. Kinsey tried to describe and compare sexual behavior of males and females in a particular contemporary period. But many, perhaps most, non-cause-and-effect research studies are concerned with change over time. Most history is a study of change. Much of anthropology is a study of the changes in cultures. The descriptive psychological studies by A. Gesell and others of how children behave at various ages belong in this category, as does the work of L. Terman, who collected data on a group of gifted children year by year over a period of almost thirty years.

Time itself works no changes; time is only a proxy for other processes that operate over time. If a researcher studies the effect of an independent variable on a dependent variable that reveals its change only slowly, it is much the same as if time is used as a proxy. For example, a prolonged period of smoking is apparently necessary to affect health, and the effects


may become apparent late in the smoker's life, even if she quits smoking. Or, even less pleasantly, syphilis contracted when young can cause a person to go mad many years later. In these examples of long time lag (or long "latency period") between cause and effect, it is not time itself that produces the delayed effects but rather various series of biological changes that take place unobserved and in interaction with subsequent happenings.

The difference in the short-run and long-run purchase responses to price changes offers another example in which a time variable is a proxy for other changes. If the power company raises the price of electricity, you may do nothing immediately, but next time you buy a stove you may shift to gas. Sometimes the long run actually reverses the short-run direction. For example, when a baby drinks from her mother's breast, she diminishes the milk supply for a while, but this activity stimulates an increase in production that results, several hours later, in a larger supply than if the baby had not suckled. These examples emphasize that it is crucial to choose the appropriate time period in which to look at the effect of the independent variable.

In some situations the independent variable takes effect gradually and cumulatively. A cigarette advertisement has some immediate effect on cigarette sales, but this year's advertising also has some effect next year, five years hence, and even twenty years hence, because an advertisement may create a brand loyalty that persists for many years. In the pre-World War II cigarette market, fully 80 to 85 per cent of the effect of a given year's advertising took place in the years after the year in which the advertising was run (Telser).
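One simple way to picture a cumulative effect of this kind is to assume that the effect of a year's advertising shrinks by a fixed ratio each year. This geometric-decay model is an assumption made here for illustration and is not necessarily the method of the study cited; the function names and the starting figure of 100 units are likewise hypothetical.

```python
def yearly_effects(first_year_effect, retention, years):
    """Effect felt in each successive year, assuming it shrinks by the
    fixed ratio `retention` from one year to the next."""
    return [first_year_effect * retention ** t for t in range(years)]


def total_effect(first_year_effect, retention):
    """Sum of the infinite geometric series first + first*r + first*r**2 + ..."""
    return first_year_effect / (1.0 - retention)


def share_after_first_year(retention):
    """Fraction of the total effect that arrives after the first year."""
    first = 1.0
    return 1.0 - first / total_effect(first, retention)


# Under this assumed model, a retention ratio of 0.80 to 0.85 puts
# 80 to 85 per cent of the total effect in later years.
share_low = share_after_first_year(0.80)
share_high = share_after_first_year(0.85)

first_five_years = yearly_effects(100.0, 0.80, 5)
lifetime_total = total_effect(100.0, 0.80)

print(f"share felt in later years: {share_low:.2f} to {share_high:.2f}")
print("first five years:", [round(x, 1) for x in first_five_years])
print(f"lifetime total: {lifetime_total:.0f}")
```

Under the assumed model the whole stream of future effects collapses into a single number, the first-year effect divided by one minus the retention ratio, which is one way of summarizing all the following years together.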
The researcher must then find some way of summarizing the effect of a given year's advertising in all the following years together, not an easy task.

Sometimes the study of changes over time is really a comparison of various stages. For example, if a person wants to know how his weight is changing (perhaps as a result of a diet, or perhaps only as a result of aging), he wants to compare his weight this year with his weight last year. But such comparisons are bedeviled by time-related breaches of ceteris paribus. The researcher must separate the change caused by the internal processes of the material, which is what he is interested in, from the environmental change effects. Therefore, he must make sure that he works with appropriately chosen time periods. For example, the weight watcher must weigh himself consistently each day. It will not matter whether he weighs himself at the height or at the depth of the daily weight cycle, whether he weighs himself every morning or every night, as long as he is consistent. But, if he sometimes weighs himself in the morning and sometimes at night, the picture will be confused. Similarly, when a businessman wants to compare his sales from year to year to see whether the business is growing, he can safely compare the same month for two or more years, even though that month is an unusually high or an unusually low month in the yearly cycle.

The most formidable and frequent obstacle in research over time is that the change in the dependent variable takes place so slowly, over so long a


period of time, that it is not feasible to observe the variable over the full period of change. Research on the development of gifted children, on smoking and health, on syphilis, on cigarette advertising, and on cultural change all exemplify this obstacle. Now we must consider some devices to overcome it.

One way is not to overcome the obstacle but rather to wait, as Terman actually did for almost thirty years. Terman's method of observing the children at intervals over a long period of time is an example of the "panel" method. The essence of the panel is that it allows the researcher to observe changes in the same individual over time, which vastly reduces problems of individual variability. The panel method is used frequently in voting studies and in market research, because it provides an unexcelled opportunity to observe switching and shifting behavior among individuals, behavior that is much harder to observe in a comparison of one group with another. The panel method is described in more detail in Chapter 12.

In some cases, especially in economics, the researcher need not wait for new data to accumulate. Much economic data on previous years are already available, and economists can use them to examine long-term changes of the economy; this approach is called "time-series analysis." Although there are many sophisticated statistical devices to aid this kind of analysis, the essence of the time-series method is simply the examination of data from many successive periods, in order to find and explain the changes that occurred.

Sometimes it is not obvious whether a time-series study is being done because the researcher is interested in finding out about changes over time or because she is using historical data to find out other relationships.
For example, an economist or sociologist might make a graph of the relationship between yearly birth rates and yearly per capita income. She might also notice that the birth rate is falling, that is, that per capita income is the same for two years but the birth rate is lower in the second year. The researcher might take account of the change in the birth rate over time because she is interested in the change per se. Or she might be interested only in the relationship between birth rate and per capita income, in which case she takes account of the time trend to clarify the relationship between birth rate and income.¹

1. To put it technically, in the former case the coefficient of the time trend is a parameter of interest, and the researcher is studying economic or social history or development. In the latter case the coefficient of time is simply a constant for which one can make allowances in a static demographic analysis.

One way to find out now what happened long ago is to ask people. For example, I wanted to find out how people's happiness changed when their family incomes changed, without waiting many years to collect the information. Therefore I employed questions of this sort:

Given your present knowledge of yourself and of other people, what proportion of Americans of your age and sex would you say are (were) happier than you are (were)?

A. Less than 10% happier than you
B. 10-25%
C. 25-50%
D. 50-75%
E. 75-90%
F. More than 90% happier than you

At present
When you were in early grammar school, grades 1 to 4
When you were in grammar school, grades 5 to 8
When you were in high school
When you were age 17 to 22 (or in college)
(Skip if under 29) When you were age 23 to 29
(Skip if under 39) When you were age 30 to 39
(Skip if under 49) When you were age 40 to 49
(Skip if under 59) When you were age 50 to 59
(Skip if under 69) When you were age 60 to 69

Sometimes one can surmount the obstacle of slow change by artificially speeding up time. The geneticist cannot wait around for the many years between generations of human beings; instead, he studies the genetics of fruit flies or other organisms that reproduce generations very quickly. In a year's time he can trace a genetic change through perhaps thirty successive generations of fruit flies.

Time may also be speeded up artificially with a combination of analytic and experimental devices. For example, earlier we noted that there are too few business cycles in recent United States history to provide economists with sufficient data for study. Another way to handle this situation is to make various assumptions about how typical individuals and institutions of various types in the economy behave (banks, families, steel companies, Federal Reserve Bank, and so forth) and then to "simulate" their interaction on a computer. Instead of taking three years, a business cycle can be run through the computer in minutes. Then the economist can examine what kind of business cycle results from a given set of assumptions. Chapter 14 explores simulation in more detail (also see G. Orcutt, et al.).

For studies of child development and aging processes, the cohort method (which is called the "wide view" in Chapter 12) is a possible substitute for the panel or historical "long view." For example, there is no conceptual difference between comparing the activity rates of each of a group of monkeys when they are one month old with their own activity rates when they are two months old and comparing a one-month-old group of monkeys with a two-month-old group on the same day. There are practical and statistical differences between the two approaches, however; they are discussed in Chapter 12.

Accounting for effects that are spread out over time can sometimes be done effectively with statistical analysis of past data. But it is vastly preferable, where it is possible, to investigate such effects with experiments that


permit observations in several successive periods. Such designs are discussed in Chapter 12.

1. Summary

Changes that occur slowly in people are among the most important phenomena. And often these long-run changes are different from short-run changes and cannot be inferred from them.

Sometimes one can find historical data about individuals or groups or economies that can be compared at various dates and with the present. These data may be found in censuses or archives. Frequently people can recollect how things were at various times in the past. Sometimes one may compare people or groups that are now at different ages, on the assumption that different individuals or groups of different ages are analogous to the same individuals or groups observed at different periods in their lives.

At times no such devices are satisfactory. One may then resort to a long-term panel study, and wait (and wait and wait) for the results.

EXERCISE

1. Find five research studies of changes over time in your major discipline, and show how the obstacles were (or were not) successfully overcome.

23 obstacles to the search for causal relationships

1. Causation by a "Hidden Third Factor"
2. Multivariable Causation
3. Confounding the Dependent and Independent Variables
4. Feedback; Interaction Between Dependent and Independent Variables
5. Summary

The ideal design of a study that seeks to establish cause and effect was set forth in Chapter 2. But sometimes the distinction between studies that do and do not establish causal relationships is subtle, confusing, and ambiguous. Chapter 32 tries to sort out the philosophical tangle of such borderline cases. This chapter discusses the empirical obstacles in the way of determining whether a relationship is causal and focuses on how to design the study to avoid them successfully.

This chapter and Chapters 25 and 32 all deal with the search for causes and effects. Here is the distribution of labor among the chapters: This chapter discusses the practical decisions that you must make at the planning stage of your study to ensure that it will permit you to draw causal conclusions. Chapter 25 discusses how to handle your data after they have been collected, in order to extract the most knowledge about the cause-and-effect relationships. Chapter 32 considers some of the more subtle problems that arise when you must decide whether to refer to an observed association as a cause-and-effect relationship.

One general piece of advice before we tackle the specific obstacles: Get to know all about the circumstances surrounding the causal relationship you seek to establish. Saturate yourself in the relevant facts. It is only in that way that you can protect yourself against the kinds of mistakes about causal relationships that young children make, for instance thinking that the turning on of streetlights makes the sun go down.


Most of the obstacles to establishing causal relationships afflict nonexperimental research most sharply. Therefore, most of what we know about establishing causal relationships comes from sociologists, economists, and historians, rather than from experimental psychologists. But, even if you can experiment with your subject matter, your problem is not automatically solved, as we shall see.

1. Causation by a "Hidden Third Factor"

This section takes up the possibility of coming to believe mistakenly that one variable is causally influential when another variable is "really" the cause: the "hidden third factor" error, also called "spurious correlation" in sociology and social psychology. Sometimes this obstacle is included under the term "confounding" in experimental psychology (Underwood, pp. 89-92); and, along with the obstacle of multivariable causation, it is called "specification error" in economics.

The classic example of the hidden-third-factor error in an experimental context is the case (see Chapter 4) of the gentlemen who, after getting drunk successively on brandy and soda, whiskey and soda, and bourbon and soda, concluded that the soda caused their inebriation. Notice that their deduction was perfectly sound, given only that much evidence. Other knowledge of the world is necessary to set them straight.

The situation is diagramed in Figure 23.1. The objective is to determine the cause of y, the dependent variable (inebriation, in this example), when a relationship (broken line) with an independent variable, X₁ (drinking soda), has been observed. The "real" cause is shown by the solid arrow from w (liquor) to y.

Another example is Mark Twain's cat who sat on a hot stove lid: "She will never again sit down on a hot stove lid; but also she will never sit down on a cold stove any more."
Twain's recommendation has merit: "We would be careful to get out of an experience only the wisdom that is in it – and stop there."¹

1. Thanks to Chuck Linke for this quotation.

Such errors have occurred many times in the history of science, and sometimes the error has temporarily knocked a science off the track. In one famous experiment, a scientist found that rats exhibited unexpected learning behavior, y, when stimulated by a blast of air, X₁. The results turned psychological theory topsy-turvy. Later, it was found that the rats' unexpected behavior was "really" caused by a piercing sound, w, that accompanied the blast of air from the experimenter's apparatus, a sound that affected them the way a squeaky piece of chalk affects many humans.

For another example, medical researchers for many years attempted to graft, y, a vital part of one mammal onto another mammal of the same species, but the operations always failed, despite the delicacy with which the surgeons performed the operations. The medical researchers then concluded that there were physiological and chemical reasons, X₁, preventing the success of such organ transplantations. Then along came a Russian scientist who claimed to have succeeded in sewing one dog's head onto the side of another dog, and both heads barked for months. It was because his surgical technique, w, was so good that infection did not occur and the operation succeeded. The previously assumed causes of the transplantation failure (antibodies, protein reactions, and so forth) were found not to be the real causes at all.²

FIGURE 23.1 The Hidden-Third-Factor Error (diagram labels: Soda, X₁; Inebriation, y)

Sometimes it does not matter whether you know which is the "real" cause. If an ingredient in cigarette smoke causes cancer but must necessarily always be part of cigarette smoke, it does not matter much whether we say that cigarette smoking, ingredient X, or still another ingredient in cigarette smoke causes lung cancer. On the other hand, if there is any possibility that the real cause might ever be separated from the other variables that accompany it, then it makes a difference whether we refer to ingredient X or something else as the real cause, for future research may thus be influenced. (Incidentally, finding that tars are the real cause does not weaken the causal status of cigarette smoking but rather strengthens it.)

Chemists often can approach progressively closer to real causes. They can now show that the tars contained in cigarette smoke cause some kinds of cancers in animals. Perhaps sometime soon they will be able to show that cigarettes without these factors will not cause cancer (but perhaps may still cause myriad other diseases). They will then have come closer to isolating the "real" cause of cancer.
Remember also how the Curies refined tons of ore and progressively identified many elements as irrelevant until they found that radium alone was "truly" responsible for the radioactive photographic-plate effect that Becquerel had observed with potassium-uranium salt. The chemist and physicist can refine a given sample until it is almost perfectly pure, until it contains practically nothing but one compound or element. It is the purity of the specimen that makes us feel that the real cause has been identified. Nevertheless, the chance always exists that it is some impurity in the sample or container, or something else in the environment, that is the real cause of the phenomenon.

2. My friend Stanley Friedman could find no reference in the scientific literature to this event, which was originally reported in the newspapers, so probably it is phony. But the example is still illustrative.

The social scientist cannot refine his material with such satisfying progressive logic, but his best strategy is somewhat similar. He should vary every possible condition except that which he regards as the cause, to see whether the variations change the values of the dependent variable. In experimentation, one varies each of the controlled conditions systematically. In nonexperimental research, various types of cross-classification are tried to see whether any other variables seem to exhibit any causal influence (Zeisel, 1957, Chaps. 8, 9, especially p. 192).

Sometimes, however, the hidden factor is tied so tightly to the apparent cause that they cannot be separated. In questionnaire research the possibility always exists that it is some special quirk in the form of the question that forces a particular type of answer, rather than the intended content of the question. But a question must be asked in some words, which means that some quirk is always possible. Asking the same question in two different forms offers some check on this problem.
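The strategy of varying other conditions and cross-classifying can be sketched in a few lines of Python. The records below are invented for illustration (they extend the soda story with three hypothetical extra evenings), and the names `evenings`, `inebriation_rate`, and `cross_classify` are assumptions of this sketch, not anything taken from the studies cited above.

```python
from collections import defaultdict

# Hypothetical observations, invented for illustration: one record per evening.
evenings = [
    {"liquor": True,  "soda": True,  "drunk": True},   # brandy and soda
    {"liquor": True,  "soda": True,  "drunk": True},   # whiskey and soda
    {"liquor": True,  "soda": True,  "drunk": True},   # bourbon and soda
    {"liquor": True,  "soda": False, "drunk": True},   # added: liquor, no soda
    {"liquor": False, "soda": True,  "drunk": False},  # added: soda alone
    {"liquor": False, "soda": False, "drunk": False},  # added: neither
]


def inebriation_rate(records):
    """Share of the given evenings that ended in inebriation."""
    return sum(r["drunk"] for r in records) / len(records)


def cross_classify(records, suspected, other):
    """Inebriation rate for every (other, suspected) combination observed."""
    cells = defaultdict(list)
    for r in records:
        cells[(r[other], r[suspected])].append(r)
    return {key: inebriation_rate(rows) for key, rows in cells.items()}


# With only the first three evenings, soda and inebriation always coincide.
print("first three evenings:", inebriation_rate(evenings[:3]))

# Once liquor is varied separately, soda makes no difference within each
# level of liquor.
table = cross_classify(evenings, suspected="soda", other="liquor")
for (liquor, soda), rate in sorted(table.items()):
    print(f"liquor={liquor!s:5} soda={soda!s:5} rate={rate:.2f}")
```

In this made-up data the rate changes only when liquor changes, which is the pattern a cross-classification is meant to expose. With real data the cells would rarely be this clean.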

2. Multivariable Causation

This section takes up an obstacle that often afflicts research studies in which you want to establish the effect of one, two, three, or n (n is the standard symbol for some large but unspecified number) particular independent variables on a dependent variable. The difficulty is that the various independent variables do not change independently of one another, at least in the world outside the laboratory. If you want to determine the effect of rainfall, X₁, on crop yield, y, you may have to take account of the fact that, when rainfall is heavy, temperature, X₂, tends to be lower than usual. Therefore, you will not be able to determine, without making some special arrangements, what the simple relationship between rainfall and crop yield is. In other words, the difficulty is to create ceteris paribus conditions.

Again it is helpful to diagram the situation (see Figure 23.2). The object in this case is not to find which variable explains the variations in y, as it was in the previous section. Rather, the purpose is to determine whether or not, and how much, a particular variable X₁ influences y. The obstacle is that, unless the researcher also takes account of X₂ (or also X₃, X₄, and so forth), he cannot determine the extent of the causal relationship between X₁ and y, because there is a relationship (though noncausal) between X₁ and X₂.

FIGURE 23.2 Multivariable Causation

Another example: Particular departments in department stores spend more for advertising, X₁, during those months when they have especially interesting merchandise. How, then, can you determine the effect of advertising on sales, y? A change in sales might be partly or wholly caused by the excellence of the merchandise assortment, X₂, rather than by the advertising, or it might be caused by both.

Many wonderfully wrong conclusions can be drawn from unwise handling of multivariate problems. For example, auto prices, X₁, and auto sales, y, both rose from 1932 to 1956 (see Table 23.1) (Suits, 1958). It is unwise, however, to conclude that higher prices cause higher auto sales or that high sales of cars cause high prices over a long period of time. When one takes account of such other variables as disposable income, X₂, stock of cars on the road, X₃, credit terms, X₄, and other variables, then the lower the price the more cars that are sold, a more sensible conclusion indeed. All the other independent variables are themselves related to price, which is why the simple relationship between price and sales is confusing.

The multivariable obstacle is not simply that many different factors affect the dependent variable. Of course it is true that, besides a lighted match, a cigarette, a hand to hold the match, and air in which to burn are required to cause a lighted cigarette.
But the presence of air may generally be treated as a parameter (that is, a condition that does not change in the course of the study); if it does change, the change is not related to the striking of the match. But disposable income and the stock of cars on the road cannot sensibly be treated as parameters in a study of the relationship between prices and auto sales.

The nature of the multivariate obstacle, rather, is that the conditions of ceteris paribus do not hold. Other things are not equal when the rainfall is high or low, and therefore the differences in crop yield at high and low rainfall may not be attributed simply to the differences in rainfall. If there were no relationship between rainfall and temperature, then it would be safe simply to look at the relationship between rainfall and crop yield, for the effects of temperature would average out over the period of the study, and the ceteris paribus assumption would hold. The crucial issue in the multivariate problem is that there is a relationship between the independent variables X₁ and X₂. Similarly, if department-store product quality tends to change as the amount of advertising changes, then the conditions of ceteris paribus do not hold, and the changes in sales may not be attributed simply to changes in advertising.

TABLE 23.1

        Real Retail      Real Disposable      Stock of Cars,   Retail Sales,
        Price Index,     Income (Billions     January 1        New Automobiles
        Passenger        of 1947-1949         (Millions)       (Millions)
Year    Automobiles      Dollars)
1932    126.5             83.4                18.7             1.10
1933    128.5             82.6                17.9             1.53
1934    128.5             90.9                18.9             1.93
1935    120.5             99.3                19.4             2.87
1936    117.0            111.6                20.1             3.51
1937    121.0            115.6                21.5             3.51
1938    133.8            109.0                22.3             1.96
1939    131.0            118.5                22.7             2.72
1940    134.3            127.0                23.2             3.46
1941    144.9            147.9                24.5             3.76
1949    186.6            184.9                30.6             4.87
1950    186.6            200.5                33.1             6.37
1951    181.5            203.7                35.7             5.09
1952    195.7            209.2                37.6             4.19
1953    188.2            218.7                39.3             5.78
1954    190.2            221.6                41.6             5.47
1955    196.6            236.3                43.0             7.20
1956    193.4            247.2                47.0             5.90

SOURCE: "The Demand for New Automobiles in the United States," p. 279, by Daniel B. Suits, in The Review of Economics and Statistics, Vol. XL (August 1958).

If you can experiment with the subject matter, you may be able to surmount the multivariate obstacle, because the nature of experiment ensures the conditions of ceteris paribus. If, instead of examining data on natural rainfall and crops, you artificially control the amount of water that each crop plot receives, you can then vary the amount of water among the experimental plots in random fashion. If the amount of water varies randomly, you can then be sure that the temperature will not be associated with the amount of water that the plots receive. The temperature will indeed vary also. But, as there will be no systematic relationship between temperature and water, you can be sure that the high-water and low-water trials among the plots will average the same temperature. If they average the same temperature, then the conditions of ceteris paribus hold, for temperature, anyway. Any changes in crop yield may then be attributed solely to changes in water. This example is an application of the crucial device of randomization in experimentation.

Sometimes you can arrange to conduct an experiment, even though it does not seem possible to do so at first. A department store may be loath to vary its advertising expenditures randomly in order to isolate the effect of advertising from that of merchandise quality. But you may be able to convince the manager to vary the advertising level slightly from week to week, so that the experimental data may be obtained without causing an executive uproar or disturbing the store's basic promotional scheme very much.
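The effect of randomization described above can be illustrated with a small simulation. This is only a sketch under invented assumptions: the temperature range, the number of plots, the fixed seed, and the function name `randomized_plots` are all hypothetical. It shows that when the water level is set by a coin flip, the high-water and low-water plots end up with very nearly the same average temperature, even though temperature itself varies a good deal from plot to plot.

```python
import random


def mean(values):
    return sum(values) / len(values)


def randomized_plots(n_plots, seed=0):
    """Give each plot a temperature and a water level chosen by coin flip."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    plots = []
    for _ in range(n_plots):
        temperature = rng.uniform(15.0, 35.0)  # varies, outside our control
        water = rng.choice(["high", "low"])    # set at random by the researcher
        plots.append((temperature, water))
    return plots


plots = randomized_plots(10_000)
high = [t for t, w in plots if w == "high"]
low = [t for t, w in plots if w == "low"]
gap = mean(high) - mean(low)

print(f"high-water plots: {len(high)}, mean temperature {mean(high):.2f}")
print(f"low-water plots:  {len(low)}, mean temperature {mean(low):.2f}")
print(f"difference in mean temperature: {gap:.2f}")
```

The difference shrinks as the number of plots grows. With only a handful of plots, chance alone can leave the two groups noticeably unequal, which is one reason small experiments deserve caution.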

TABLE 23.2 Accident Rates of Male and Female Automobile Drivers

                                           Men        Women
                                           %          %
Never had an accident while driving        56         68
Had at least one accident while driving    44         32
Total                                      100        100
(Number of cases)                          (7,080)    (6,950)

SOURCE: Say It With Figures, 5th ed., p. 120, by Hans H. Zeisel; copyright 1947, 1950, 1957, 1968 by Harper & Row, Publishers, Incorporated; reprinted by permission of the publishers.

If experimentation is truly not feasible, all may still not be lost. You may be able to isolate your independent variable by holding the other variable(s) constant statistically instead of experimentally. Cross-classification is one statistical device that accomplishes this purpose. A cross-classification allows you to look at the effect of different levels of your independent variable X₁ at the same level of the other independent variable X₂, which makes it possible for you to examine the effect of X₁ on y under conditions that approach ceteris paribus.

For example, perhaps you want to learn the effect of sex upon incidence of auto accidents. The simple tabulation in Table 23.2 apparently shows that women are safer drivers than men. But it seems plausible that amount of driving might affect the picture. And a cross-classification does clarify the issue. Table 23.3 shows that when distance driven is crudely controlled by dividing the sex groups into more than, and less than, 10,000 miles driven, there is no difference in accident rates. (Furthermore, one can guess from this crude cross-classification that men are safer drivers per mile driven, as follows: The distribution of the sexes by amount driven indicates that men drive more miles than women. This suggests that within both the more-than- and the less-than-10,000-miles-driven groups, men drove more miles than women. If nevertheless the accident rates within distance groups are similar by sex, men have fewer accidents per mile. A finer subclassification would check on this guess.)

The regression is another statistical technique that helps to isolate the effect of the independent variable in which you are interested.
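As a rough illustration of how a regression holds another variable constant, the sketch below uses invented rainfall, temperature, and crop-yield figures (they come from no study). The yield is constructed as 10 + 2 × rainfall + 3 × temperature, so the "true" effect of rainfall is known to be 2. The helper names `slope` and `residuals` are assumptions of this sketch. The simple relationship between rainfall and yield comes out negative, because temperature is lower when rainfall is heavy; removing the part of each variable that temperature accounts for recovers the effect of 2.

```python
def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    numerator = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denominator = sum((a - mx) ** 2 for a in x)
    return numerator / denominator


def residuals(x, y):
    """What is left of y after removing its straight-line relation to x."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Invented data: heavy rainfall goes with lower temperature.
rainfall = [1, 2, 3, 4, 5, 6]
temperature = [30, 29, 27, 27, 25, 24]
crop_yield = [10 + 2 * r + 3 * t for r, t in zip(rainfall, temperature)]

# Simple relationship, ignoring temperature.
simple = slope(rainfall, crop_yield)

# Relationship with temperature held constant statistically: compare the
# parts of rainfall and yield that temperature does not account for.
partial = slope(residuals(temperature, rainfall),
                residuals(temperature, crop_yield))

print(f"simple slope of yield on rainfall:  {simple:.2f}")
print(f"slope with temperature held constant: {partial:.2f}")
```

The residual-on-residual step gives the same rainfall coefficient that a two-variable regression would give. It is written out here only to make the "holding constant" idea visible. Real data contain noise, so the recovered effect would be an estimate, not an exact value.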
The regression is actually something more than a refined type of cross-classification; it takes advantage of the continuous gradations in variables, rather than lumping them into a few categories, as does cross-classification. A regression uses the information from all the observations to smooth out the whole set of observations, rather than treating each observation by itself. These and other analytic devices to accomplish the same purpose will be discussed at greater length in Chapter 25.

TABLE 23.3 Automobile Accidents of Male and Female Drivers by Number of Miles Driven

                                           Male Drivers                  Female Drivers
                                           Drove More     Drove 10,000   Drove More     Drove 10,000
                                           Than 10,000    Miles or       Than 10,000    Miles or
                                           Miles          Less           Miles          Less
                                           %              %              %              %
Had at least one accident while driving    52             25             52             25
Never had an accident while driving        48             75             48             75
Total                                      100            100            100            100
(Number of cases)                          (5,010)        (2,070)        (1,915)        (5,035)

SOURCE: Say It With Figures, 5th ed., p. 133, by Hans H. Zeisel; copyright 1947, 1950, 1957, 1968 by Harper & Row, Publishers, Incorporated; reprinted by permission of the publishers.

A mighty question may have sprung to your mind: How can one know when all the important variables have been taken care of? There is no fully satisfactory answer. The core of the matter is thorough knowledge of the complexities of that aspect of the real world that you are investigating, as well as extensive trials of the likeliest variables.

This chapter explains what to do about multivariable causation in planning your study; Chapter 25 takes up what to do after you have your data in hand. It is now, at the planning stage, that you should take advantage of your flexibility to choose sound variables. At the design stage you should arrange to collect the data you will need for the cross-classification or the regression. If you are doing an experiment, you should at this stage ensure that all the relevant variables are being held constant. In the crop-yield study you would collect temperature, as well as rainfall and crop-yield, data; if you were experimenting, you would hold the temperature constant. In the department-store study, you would obtain data on excellence of merchandise assortment, as well as on advertising and sales. In political polling, you would collect data on income, age, geography, and anything else that might seem relevant.

You should also consider collecting data in more than one way, so that your results will bolster one another and increase the weight of total evidence. Animal experiments and human surveys on smoking support one another and strengthen the argument for the cause-and-effect relationship with cancer because each method holds constant a different set of the multiple variables.

3. Confounding the Dependent and Independent Variables

Do England's leaders come from Oxford and Cambridge because the instruction at those universities makes leaders of men? Or is it simply that the people who would be leaders anyway are sent to Oxford and Cambridge?


This is a hard question to answer, partly because it is difficult to measure the extent to which a university graduate is a "leader" or "succeeds." Let us instead ask a similar question about a type of educational institution whose graduates' success we can more easily measure: business schools. It seems fair to say that the salaries paid to businessmen are a reasonable proxy for success. We ask, then, does Harvard Business School make good business people? Or do people who would be successful in business anyway go to Harvard Business School?

It will not suffice to compare the salaries earned by graduates of Harvard Business School with the salaries of a group of business people who went elsewhere to business school or who went to no graduate school at all. If the best prospects go to Harvard Business School, then their salaries will be higher, even if they gain nothing at all from the Harvard Business School training.

Attempts to evaluate the effect of psychoanalysis run afoul of a similar obstacle. One cannot simply compare a group of neurotics who have undergone psychoanalysis with another group of neurotics who have had no psychoanalytic treatment. The group that has had psychoanalysis obviously consists of people different from those in the other group to start with, for they chose to undertake psychoanalytic treatment. Making such a choice might well indicate a desire to overcome the neurosis that would have resulted in improvement even without treatment. Indeed, there does not seem to be satisfactory evidence for the effectiveness of psychoanalysis.

Still another example is the comparison of life expectancy of people who take exercise with that of people who do not. There is always an important possibility that people exercise because they are healthy, rather than vice versa.
In all these cases we say that there is "confounding" (see Figure 23.3);³ that is, both the independent variable, X₁, in whose effect we are interested (Harvard Business School training, psychoanalysis, exercise) and the dependent variable, y (success, mental health, physical health), may be caused by some other personal-background factor, v (brains and drive, natural healing processes, natural good health).

3. "Confounding" is a word used mostly by psychologists, and they use it to refer to almost any experiment that suffers from insufficient controls or incomplete design (Underwood, pp. 89-90). The definition and usage here are more restricted but still common. Statisticians also use the term when they purposely give up some information in order to reduce the size of the experiment.

FIGURE 23.3 Confounding of Variables (diagram labels: Schooling, X; Natural ability; Professional success, y)

The difference between confounding and the hidden third factor is that in the former we are particularly interested in whether the specific variable X₁ is a cause of y, whereas in the latter we concentrate on finding out which variable causes y. Confounding also differs from the multivariate obstacle: in confounding, v causes X₁, whereas in the multivariate obstacle the relationship between X₁ and X₂ is not causal.

Notice the difference between this obstacle and the obstacle of nonresponse. If there is nonresponse, one cannot get data on one group of people that may be special. In confounding, the difficulty is that certain kinds of people do not exist or are not known to exist. The people who do not exist are people in every way exactly like those who go to Harvard Business School but who do not actually go to Harvard Business School. It does not suffice simply to find a group of people who seem in every possible way like those who went to Harvard Business School or like those who underwent psychoanalysis or like those who took exercise but who did not actually do those things. They would still differ in one crucial way: the thing that kept them from going to Harvard Business School or taking exercise or psychoanalysis. Whatever that thing is may well turn out to be the crucial difference between the groups.

To surmount the obstacle of possible confounding often requires exceptional effort and cost. The only way I can think of to measure fairly the effect of a Harvard Business School education would be to select randomly a group of young people who would otherwise not go to Harvard Business School (perhaps from among the entrants to the training programs of large business firms), send them to Harvard Business School, and then over a long period of years compare their careers with those of the rest of the group from which they were drawn randomly.⁴

In the investigation of the effect of exercise, it would be difficult to experiment with a group of people over a period long enough so that the exercise might be considered to make any difference. And, even if it weren't, it would be many years from the beginning of the experiment until the results were available.

Sometimes it is possible to carry out a quasi experiment.
In the case of relating psychoanalysis to cure, there may be no psychoanalysts in some part of the country. Assuming that problems of geographical differences could be surmounted (see p. 198), one could compare the rate of recovery from neurosis where psychoanalytic treatment is available with that where there are no psychoanalysts. This design might be particularly useful in measuring the effects of psychotherapeutic treatment on psychosis in mental hospitals.

4. But difficulties never end. B. Glad pointed out that even this experimental design is subject to an important flaw: People who return from the Harvard Business School program may progress more quickly simply because their superiors are impressed by the degrees, even if the training really has no value.
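The business-school example can be mimicked with a simulation in which the training is given no effect at all. Everything here is hypothetical: the salary formula, the ability cutoff of 0.5, the sample size, the seed, and the names `salary_gap`, `self_selected`, and `randomly_assigned` are assumptions made only for illustration. Salary depends on ability (the background factor v) plus noise. When the ablest people select themselves into the school, graduates out-earn the rest by a wide margin; when attendance is assigned by coin flip, the gap nearly vanishes.

```python
import random


def mean(values):
    return sum(values) / len(values)


def salary_gap(assign, n=20_000, seed=1):
    """Mean salary of attendees minus mean salary of non-attendees.

    The training effect is deliberately zero: salary depends only on
    ability (the confounding factor v) plus random noise.
    """
    rng = random.Random(seed)
    attended, stayed = [], []
    for _ in range(n):
        ability = rng.gauss(0.0, 1.0)
        goes = assign(ability, rng)
        salary = 50.0 + 10.0 * ability + rng.gauss(0.0, 5.0)
        (attended if goes else stayed).append(salary)
    return mean(attended) - mean(stayed)


def self_selected(ability, rng):
    return ability > 0.5        # the best prospects choose the school


def randomly_assigned(ability, rng):
    return rng.random() < 0.5   # a coin flip, unrelated to ability


naive_gap = salary_gap(self_selected)
experimental_gap = salary_gap(randomly_assigned)

print(f"gap when people select themselves: {naive_gap:.1f}")
print(f"gap under random assignment:       {experimental_gap:.1f}")
```

The naive comparison credits the school with a difference that ability alone produced. The sketch leaves out the flaw raised in footnote 4, that superiors may reward the degree itself.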

4. Feedback; Interaction Between Dependent and Independent Variables

In this complicated world of ours, there are many situations in which A and B influence each other. We see this type of mutual influence in most human relationships: The more you love me, the more I love you, the more you love me. You step on my toe, I jab you with my elbow, you hit me on the nose. A change in demand for a product causes a change in the price, which causes a change in the supply of the product, which causes a change in the price, which causes a change in demand, on and on forever. Cause and effect bounce back and forth like a ping-pong ball or an echo in a canyon. This effect is called "feedback" or "mutual causation." Feedback can be positive, in which case it increases the original effect, or it can be negative, in which case it tends to "dampen" the original effect.

Jeremiah describes a feedback reaction involving the Children of Israel: "Thine own wickedness shall correct thee, and thy backslidings shall reprove thee" (Jeremiah 2:19).

The term "feedback" was invented by the radio engineers for circuits in which the output signal is "fed back," to be amplified again. The same idea has proved useful in the social sciences. For example, the basic Keynesian economic model can be shown in the schematic form that electronics engineers use. Figure 23.4 indicates that national income depends upon investment and consumption and that consumption in turn depends upon national income. There is mutual interaction between national income and consumption; the influence of each feeds back upon itself.⁵

Another example: In areas where liquor consumption is high, there are many stores that sell liquor. Does the existence of many stores lead to high consumption, as the prohibitionists believe, or vice versa? The influence is likely to run in both directions.
FIGURE 23.4 An Example of Feedback (diagram labels: Constant; Investment; National income; Consumption; Multiple)

Source: Mathematical Economics, 2nd ed., p. 289, by R. G. D. Allen, 1959. Reprinted by permission of Macmillan London and Basingstoke.

5. The equations represented in the diagram are
National Income = Investment + Consumption: Y = I + C
Consumption = Constant Multiple × National Income: C = aY

Still another example of mutual causation is the relationship between social status and choice of career. For example, doctors tend to claim that their own specialties have the highest social status among medical specialties. Do doctors choose these specialties because of the high social status, or do they "put a halo" on their own specialties to make themselves feel good? Probably both.

When mutual causation exists, it is not easy to ascertain what causes what and to what extent. For example, advertising researchers want to know which advertising causes sales and how much. D. Starch and others have shown that a relationship exists between readership of magazine ads and purchase of the advertised products. Starch developed a system by which he purported to measure the effect of the advertising by measuring the difference between rates of sales to readers of the advertisement and to nonreaders of the advertisement. But, although people who read particular advertisements buy more of the products advertised in the ads, it is also true that people who buy particular products are more likely than other people to read the advertisements for those products after they have purchased the products (perhaps to reassure themselves that they have bought the right car or whatever). If so, to what extent can one say that reading advertisements causes the purchase of the products or that purchase of the products causes reading the ads?

There is no easy answer. In this case one might subtract the effect caused by people who read the ads but who had already bought the products. Sometimes complex mathematical, statistical, or simulation techniques can help to disentangle the relationship (Johnston), but often there is no way of disentangling cause and effect in such cases of mutual causation.

In the social sciences other than economics, complete analysis of a system of feedback variables is not likely to be necessary or successful because "systems analysis" functions best when the system is relatively closed, as it is with the echo in the canyon and with radio circuits. In the social sciences, the influence of A upon B and of B upon A is almost always mediated by so many other elements that there can be no one-to-one relationship between what A does to B and what B does to A. And so, as a matter of practice, it is almost always the best policy to look at the influence of the variables on one another in one direction and one at a time.

Sometimes the researcher dodges by stating that he will restrict his interest to establishing a relationship between the variables, rather than trying to ascertain the amounts and directions of cause and effect. But in some cases, especially in research done to aid policy decisions, the researcher cannot cop a plea that way.
For example, we want to know if smoking causes cancer, not merely whether the two are related. Knowledge of a relationship is useful as a beginning, but knowledge of cause and effect is of ultimate interest here as so often elsewhere.
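The feedback loop in the Keynesian example of Figure 23.4 (Y = I + C, with C = aY) can be traced round by round. The sketch below is illustrative only: the investment of 100, the multiple a = 0.75, and the function name `income_after_feedback` are assumptions, not figures from the text. Each round feeds the new income back into consumption, and the loop settles at the value obtained by solving the two equations directly, Y = I / (1 − a).

```python
def income_after_feedback(investment, a, tolerance=1e-9, max_rounds=10_000):
    """Trace Y = I + C with C = a * Y round by round until Y stops changing."""
    if not 0 <= a < 1:
        raise ValueError("this sketch assumes 0 <= a < 1 so that the loop settles")
    income = investment  # first round: nothing has been fed back yet
    for rounds in range(1, max_rounds + 1):
        consumption = a * income               # C = aY
        new_income = investment + consumption  # Y = I + C
        if abs(new_income - income) < tolerance:
            return new_income, rounds
        income = new_income
    return income, max_rounds


investment, a = 100.0, 0.75          # illustrative values only
income, rounds = income_after_feedback(investment, a)
closed_form = investment / (1 - a)   # from solving Y = I + aY for Y

print(f"after {rounds} rounds of feedback: Y = {income:.4f}")
print(f"solving the equations directly:   Y = {closed_form:.4f}")
```

The tidy convergence depends on the system being closed, with nothing but investment, consumption, and income in it. That is the condition the text notes is seldom met outside economics and engineering.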

5. Summary

Knowledge of causes and effects is all-important in building a social science. It is also crucial in situations where the aim is to make changes that will improve the quality of life. Causation can never be known in some absolute or ultimate sense. But we can get closer to a useful understanding of causal relationships with astute research.

There are several sorts of obstacles to a sound understanding of a causal relationship:

a) A third factor may account for both the hypothesized cause and effect. Systematic testing under a wide range of conditions can reduce the likelihood that such an unknown factor is causing confusion.
b) There may be several important variables jointly responsible for the phenomenon in question. Sound multivariate research, experimental or nonexperimental, can appraise the relative importance of the independent variables.
c) The line of causation may run from the hypothesized effect to the hypothesized cause. This is most easily checked by experimentation.
d) Both variables may influence each other, either simultaneously or sequentially.

Discerning the scheme of causation is a very general problem in social-science research. This chapter suggests some techniques. Others are found throughout the book.

EXERCISES

1. Give examples from research in your field to illustrate the following obstacles:
   a. hidden third factor
   b. multivariable causation
   c. confounding the dependent and independent variables
   d. feedback; interaction between independent and dependent variables
2. Suggest methods of clarifying the causal relationship in each example in Exercise 1.

ADDITIONAL READING FOR CHAPTER 23

There is an excellent simple exposition of cross-classification in Zeisel.

24 the master obstacle: cost

1. Procedure to Produce Data for Study
2. Sampling and Cost
3. Sample Size
4. How Many Variables to Investigate
5. Amount of Tolerable Bias
6. Managerial Cost Efficiency in Research
7. Summary

The cost of carrying out a piece of research is the most important obstacle that a researcher must overcome. This obstacle will face (or faze) you in one form or another in every single piece of empirical research you conduct. You must economize to do a satisfactory amount of good research, and your economies must be of money, energy, and time.

Logically, this obstacle belongs at the head of the list. But the most logical order of presentation is not always the best pedagogical order, and therefore the discussion of cost as an obstacle has been deferred until now.

Cost is the master obstacle because almost all other obstacles can be overcome with large amounts of money and time. For example, if you were studying reading habits and if cost were no object, you could assign a full-time observer to watch and photograph every page that your subject reads or looks at, rather than depending on questions asked of the subject. Or you could reduce experimental bias by having many different experimenters rerun the experiment. Reflection will convince you that most other obstacles could also be overcome with expenditures of money.

Cost is more than an obstacle, however; it is often the very factor that makes research necessary or useful. The information that comes from research often produces profit by reducing cost. Furthermore, scientific empirical research is a more efficient and therefore less costly method of getting knowledge than hit-or-miss methods of gathering knowledge. For example, it is cheaper to pretest a piece of advertising copy in a laboratory than to run an entire advertising campaign to find out whether the copy works well.

There are several kinds of research decisions in which cost is a crucial factor; we shall now discuss them one by one.

1. Procedure to Produce Data for Study

Sometimes an investigator is limited to existing data; for example, one cannot go back and resurvey voting preferences before the election of 1936. And sometimes limited time requires the use of existing data; one cannot always wait for a survey or experiment to be executed and the data processed. Nevertheless, a researcher is often faced with a choice of whether to use existing data or to collect new data. The decision should be dictated by considerations of cost and accuracy. But lack of accuracy is itself a cost. To make this decision rationally, then, you must assess the cost that would result from lack of accuracy, as well as the cost of obtaining accuracy. The cost of the inaccuracy caused by flaws in existing data must be weighed against the cost of creating new data. What is most important is that you make explicit to yourself just how much you are spending to avoid the flaws in the existing data.

Another data-production choice is between experimentation and field survey. Sometimes a laboratory experiment is cheaper than a survey, but often a survey is cheaper. Again, you must balance the cost of the technique against the higher value of the information produced by the more expensive technique. (If the less expensive technique produces better data, your choice is made for you.)

Social scientists who do survey research must often decide among interviews conducted in person, over the telephone, and by mail questionnaire. In almost every case, the personal interview is more expensive, but sometimes the extra expense is necessary and justified. In surveys like A. Kinsey's in which subjects do not respond easily and straightforwardly, the skill of the interviewer in face-to-face encounter is necessary, especially when there are many questions to be asked. On the other hand, the questions asked in surveys on smoking and health are few, easy to answer, and not likely to be touchy, so the mail questionnaire is quite satisfactory except for the problem of nonresponse. The expense of conducting hundreds of thousands of interviews for the smoking studies would have made those studies impossible.

Research on magazine readership has used all three interviewing techniques. The decisions in particular studies have been based on the exact nature of the information required plus the costs and revenues attached to the various accuracy levels. For example, a large consumer magazine can afford to spend more money on interviewing techniques than can a small trade magazine because positive information would mean a much greater gain in revenue for the large consumer magazine.

2. Sampling and Cost

In 1886 C. Booth sought to find out how many poor people there were in London, because the extent of poverty and misery was a subject for debate in the 1880s (just as the actual number of people in England had been a subject for debate a hundred years earlier) and because he wanted to find out what the causes of this poverty and misery were. Obviously he could not finance the gathering of such a monumental mass of data. Therefore, he turned to "the method of wholesale interviewing," as Beatrice Webb called it (Moser, p. 19): collecting information from school attendance officers, whose job it was to visit individual homes and children.

In the 1890s Rowntree felt that Webb's method was not sufficiently accurate, and he therefore employed interviewers to go to every wage-earning home. He worked in the city of York, however, which was much smaller than London.

It was an enormous advance in social research when in 1912 Bowley studied living conditions in Reading by means of a sample. Not only was a sample much cheaper than Rowntree's method, but with the smaller data-collection effort Bowley could take more care with such technical matters as nonresponse bias. Hence Bowley's results probably were more accurate than a 100 per cent sample would have produced.

The decisions on whether to sample, how to sample, how large a sample to take, and what sampling method to use are basically cost decisions. If cost were no object, every study would take a very large per cent of the potential subject matter, because accuracy would then be greater and error smaller than in a smaller sample.

The presidential election in the United States is a 100 per cent sample (that is, not a sample) of all American people who want to vote.
I do not know whether the nation goes to all this expense to prevent any trace of sampling error because its citizens do not know any better or because the election offers an important ritual of participation. (This 100 per cent sample does not guarantee a perfect survey, however; it was especially unsure in Boston when ballot-box stuffing and repeat voting were well-developed local arts.)

The U.S. Census attempted to be a 100 per cent sample for a long time. In recent censuses, however, some questions were asked of only one person in four, for example, and future censuses will probably go further with sampling. The preliminary reports issued by the U.S. Bureau of the Census for quick information dissemination are also products of a sampling process (but in this case the sample is a sample of the data, as distinguished from a sample of people).

3. Sample Size

I have said that the only reason for not taking a 100 per cent sample of the universe is cost. But how big should the sample be? The most common ways of choosing a sample size are to find out how big a sample is customary in similar research and to take as big a sample as the budget will allow. Both methods are logically fallacious, but the former contains some grains of practical wisdom. Observation of others' work can help the researcher to take advantage of their trial-and-error learning. For example, Kinsey spent much time in working out appropriate sample sizes, and, if you were to carry on work similar to his, it might be reasonable to imitate his sample sizes.

Fitting your sample size to the available budget funds requires circular reasoning, because the budget has to be fixed on the basis of how large the sample size will be. Someone has to make an independent decision at some stage of the game.

The best general advice about sample size is to take a pretest sample that is small by any standard and then see whether you can observe any clear pattern of the sort you are looking for. You may find that even such a tiny sample is big enough. If not, the variation within your pretest sample will guide you in deciding how much larger a sample you are likely to need. Chapter 31 gives you detailed rational methods for deciding how large a sample to take.
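One rough way of letting the variation in a pretest sample guide the final sample size can be sketched in Python. The sketch assumes that the aim is to estimate an average to within a chosen margin of error, and it uses the common normal-approximation rule n = (z s / E)^2, where s is the standard deviation of the pretest. The rule, the function name sample_size_for_mean, and the pretest figures are illustrative assumptions, not the methods of Chapter 31.

```python
import math
import statistics


def sample_size_for_mean(pretest_values, margin, z=1.96):
    """Rough sample size needed to estimate a mean to within +/- margin.

    Applies n = (z * s / margin) ** 2, where s is the standard deviation
    of the pretest sample. z = 1.96 corresponds to roughly 95 per cent
    confidence under a normal approximation (an assumption of this sketch).
    """
    if len(pretest_values) < 2:
        raise ValueError("need at least two pretest observations")
    if margin <= 0:
        raise ValueError("margin must be positive")
    s = statistics.stdev(pretest_values)
    return math.ceil((z * s / margin) ** 2)


# Hypothetical pretest: five observations of whatever is being measured.
pretest = [10, 12, 14, 16, 18]

needed = sample_size_for_mean(pretest, margin=2.0)
tighter = sample_size_for_mean(pretest, margin=1.0)

# If `needed` is no larger than the pretest itself, the tiny sample may
# already be big enough for the stated margin.
pretest_is_enough = needed <= len(pretest)
```

Halving the tolerable margin roughly quadruples the required sample, which is one way of seeing why extra accuracy must be weighed against its cost.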

4. How Many Variables to Investigate

There are always more variables that might be important than you have time and money to investigate. Therefore, you must decide which of the variables are promising enough to be worth the cost, a process of economics and good judgment. As R. Fisher put it:

It is an essential characteristic of experimentation that it is carried out with limited resources, and an essential part of the subject of experimental design to ascertain how these should be best applied; or, in particular, to which causes of disturbance care should be given, and which ought to be deliberately ignored. (Fisher, p. 22)

For example, a wise advertising researcher does not bother to experiment with various shades of ink in a direct-mail advertisement; rather, she concentrates on such variables as the advertising message and the offer the advertisement makes. On the other hand, cancer researchers are willing to follow up leads that may seem very trivial at first because the problem is so important and because there are too few good leads of any kind.

There is not much that can be said about this decision except that the researcher must get into the habit of asking herself whether an additional variable is sufficiently promising to justify its cost. There just are no scientific or automatic routines to aid in making such decisions.

5. Amount of Tolerable Bias

The Kinsey study made no attempt to achieve a random sample of the population of the United States, because the attempt would have been impossibly expensive. Furthermore, even if astronomical sums had been spent and a random sample had been attempted, the departures from randomness would still have been very large because of nonresponse. Kinsey's judgment on this matter was mostly approved later by the statistical review board of the American Statistical Association, though the statisticians did call for a small probability-sample test of just how much bias Kinsey's procedure injected into his study (Cochran, et al.). Other statisticians were less satisfied, however (Wallis, 1949); this is a matter of judgment on which there is room for competent professionals to disagree.

Cost also affected the geographic pattern of Kinsey's sampling. Half the subjects were from Indiana. But Kinsey judged that the expense of drawing more of his subjects from farther away would not be justified by the probable decrease in bias.

Bias and cost must also be balanced when deciding how much money and effort to spend on converting nonresponders into responders. Should the researcher call back twice? five times? eleven times? The decision must be based on the costs of calling back and the costs of inaccuracy caused by the

6. Managerial Cost Efficiency in Research

Some researchers are crackerjack managers. They use the research money at their disposal as efficiently and conscientiously as if it came out of their own pockets. But too many researchers take no pride in cost efficiencies and ignore all reasonable procedures for calculating what should and should not be done. Scientific accuracy and avoidance of error are important, but one can be too finicky, for example by refusing to use existing data that may not be quite perfect for one's purposes and collecting new data from scratch instead.

Besides appealing to your sense of justice about other people's money, I suggest that economical practices may also be personally efficient for you and help you to accomplish a lot more work in your lifetime. Having funds available to pay for help may expedite your work. But, beyond a certain level, too much money may slow you down by inducing you to magnify the size of your project to the point at which you must do large amounts of administrative and supervisory work. Or an excess of money may lead you to collect too much data and thereby slow down your analysis. Money and the help it buys are useful only if they do not prevent you from traveling light and fast.

Academic scientists sometimes forgo efficient shortcuts out of unnecessary prissiness. For example, market researchers have known for years that it is

often cheaper in the long run to pay interviewees and panel members and that such payment can often be made without compromising the research. Yet I know of an academic researcher who put up a monumental battle against giving away a cheap pencil to induce people to answer a questionnaire. (A later test showed that the pencils far more than paid for themselves.)

A slight acquaintance with the basic notions of managerial economics can help a researcher, just as it can help any other administrator, to be more efficient. Consider, for example, a study situation in which you hand out one-question questionnaires in person at a huge convention. It pays to take a sample far larger than any sample size you might reasonably need because the extra cost of the extra questionnaires and their distribution is practically nil.

Perhaps this statement by a famous chemist will add weight to the argument:

In designing a bridge, an engineer naturally chooses the most economical design which satisfies all the specifications, including the aesthetic requirements. In designing an elaborate experiment, questions of cost are all too frequently ignored completely. This is partly because of the great difficulty of making good estimates of the time required to carry out a given investigation, but it is also partly a traditional attitude that somehow science is above vulgar monetary considerations. With the increasing cost of research it becomes necessary to take economic factors into account, however difficult this may be. Certainly there is no excuse for doing a given job in an expensive way when it can be carried through equally effectively with less expenditure. It is much more difficult to decide whether a given project should be carried out at all, considering its probable cost. In applied research there sometimes exist fairly definite criteria, such as the possible monetary benefits of a successful research, coupled with a rough estimate of the chance of success. In pure science no estimate of monetary value is usually available or in fact desirable. Here cost still enters in deciding between alternative problems. Naturally this is not the only factor, but it is certainly wrong to disregard it altogether. Cost estimates should include not only direct expenditures for materials but also salaries and overhead, even if these are not directly charged to the project. (Wilson, p. 6)

Computers and modern data-processing technology have radically changed the economics of research. I mentioned earlier how simple IBM machines were developed to make the U.S. Census cheaper and quicker to analyze. By now machines have reduced the cost of handling data so much that many studies are possible that were out of the question before.

7. Summary

Cost is the master obstacle to getting empirical knowledge, because with large enough expenditures of time and money almost all other obstacles can be overcome. Your task as a researcher is to devise sufficiently efficient ways of overcoming obstacles of all kinds so that, with the time and money at your disposal, you can produce as much useful knowledge as possible.

Sampling rather than studying the entire population is a basic cost-reducing tactic. Using available data is another. Also, one should use good judgment to avoid collecting unnecessary data on irrelevant variables, and not demand more accuracy than is required by the overall purpose of the research.

And it is all-important to care about being a good manager of research resources, both because it will make you more productive and because you owe it to whoever (your institution, or society as a whole) is ultimately paying the bill.

EXERCISE

1. Give examples in which
   a. experiment is cheaper than survey
   b. survey is cheaper than experiment
   c. sampling is not used at present but should be
   d. a slightly biased sample may be obtained at a much lower cost than an unbiased sample

part four
extracting the meaning of data

25 analysis of simple data: association and regression

1. Testing for the Existence of Simple Relationships
2. Characterizing the Form of an Association
3. Regression Analysis: Characterizing a Causal Relationship
4. Evaluating the Strength and Importance of Observed Differences and Relationships
5. Summary

After data have been collected, they must be rearranged and fiddled with to make them yield up the information they contain. This is the process of analysis. Many different processes in social-scientific research are called "analysis," and perhaps the most important job of this chapter is to distinguish among them. What type of analysis a researcher performs should depend upon the type of research question that he seeks to answer, as well as on the method by which he collects the data and the sharpness with which the research question or hypothesis has been formulated.

Descriptive classification and measurement research call for processes that some writers do not call "analysis." There are no independent variables to relate to dependent variables. Rather, there is but one variable (or several) whose relationships are not in question, as in a census or other measurement of multiple variables. The analysis begins with standardizing the data and separating it into convenient or interesting categories, and ends with summarizing statistics or with graphs or tables of the data. The appropriateness of the various summarizing statistics for various types of research question was discussed briefly in Chapter 4. Discussions and illustrations of the logic of tabular and graphic display can be found in almost any elementary statistics book or in H. Zeisel (1957, Chaps. 2, 4). D. Huff (Chaps. 3, 5, 6, 9) shows how you and others can deceive people with tables and graphs.

A good descriptive analysis shows the important data in a form such that

they will be clearly understood and their meaning grasped by the reader. For example, there should be a separate table or graph for each idea you are trying to communicate, rather than one enormous master compilation in which nothing is obvious.

The more complicated processes of analysis take place in causal- and noncausal-relationship research and in comparison research. There are at least seven major types of relationship analysis: statistical testing to determine whether an association exists; characterizing the form of a relationship; evaluating the importance of the observed differences and relationships; searching for new relationships within the data; digging deeper into the observed association to make it yield a deeper meaning; refining observed associations into causal relationships; generalizing and predicting on the basis of the observed data. Each of these processes will now be discussed in turn.

This chapter deals with situations in which you wish to investigate the relationship between two given variables for which the data are already in hand. The following chapter deals with the analysis of data where more than two variables are involved, and where the variables are not already chosen at the time you begin your analysis.

1. Testing for the Existence of Simple Relationships

To distinguish and to establish relationships are the basic tasks of science, and therefore the category and the association are the basic conceptual units in science. To say that there is an association between two variables is to say that when one phenomenon varies, the other is likely to vary in a predictable manner. All the more complicated statements in science (cause-and-effect statements, for example) are built upon the foundation of the association.

This is the question that relationship testing seeks to answer: Is there, or is there not, an association between a particular pair of variables? Sometimes the answer is statistically obvious immediately. For example, J. Masserman observed that, when given a choice of milk with alcohol or without alcohol, only two of sixteen normal cats chose liquor, whereas ten of sixteen neurotic cats chose liquor. It was then reasonably obvious that the difference between two of sixteen and ten of sixteen indicates a "real" association; that is, if he were to run the experiment over again, the neurotic cats almost surely would choose the liquor more often than the normal cats. Of course, the existence of an association would be even more conclusive if the results had been 20 of 160 versus 100 of 160, rather than 2 of 16 versus 10 of 16. But, even so, no further statistical proof of the association is necessary, and the analysis of Masserman's data is complete. All that is left for him to determine is that the experimental results are not an artifact of bad sampling or some other bias and then to say what the association between neurosis and alcohol means, a matter of theorizing that is beyond the scope of this book.
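A reader who wants to check how "obvious" the cat result is can sketch the calculation in Python. The sketch assumes a particular chance model that the text does not spell out: the twelve liquor-choosing cats are treated as if they had been dealt at random into the two groups of sixteen, and we ask how often ten or more would land in the neurotic group. The function name chance_of_gap is hypothetical, and this fixed-totals calculation is offered as an illustration only, not as the test used later in the book.

```python
from math import comb


def chance_of_gap(group_size, hits_low, hits_high):
    """Probability that one group receives at least `hits_high` of the
    combined hits when the hits are dealt at random to two equal groups.

    Assumes both group sizes and the total number of hits are held fixed.
    """
    total = 2 * group_size
    hits = hits_low + hits_high
    ways = sum(
        comb(hits, k) * comb(total - hits, group_size - k)
        for k in range(hits_high, min(hits, group_size) + 1)
    )
    return ways / comb(total, group_size)


p_small = chance_of_gap(16, 2, 10)      # 2 of 16 versus 10 of 16
p_large = chance_of_gap(160, 20, 100)   # 20 of 160 versus 100 of 160
```

Under this assumption the 2-of-16 versus 10-of-16 result has a chance probability of about 0.005, roughly one in two hundred, and the same proportions in groups of 160 give a probability far smaller still, which agrees with the remark that the larger experiment would be even more conclusive.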

But, if the results for the neurotic and nonneurotic groups had been less different than 2/16 and 10/16, Masserman could not have immediately asserted with confidence that there is a relationship between neurosis and liquor drinking. If the research is experimental, as in this case, the best procedure is to repeat the experiment on more subjects (cats); if the same results appear, one's belief in the existence of the relationship will be strengthened. Furthermore, if the study is a survey, it is frequently possible to obtain more data. But, if you cannot increase the size of your sample and if there is doubt about the existence of a difference between two (or more) groups or a relationship between two variables, you must turn to statistical testing for help.

Often your data measure two (or more) characteristics for each observation, and you wish to determine the existence of an association between the two characteristics. Let's say, for example, that you have I.Q. and athletic scores for ten high-school boys, as shown in Table 25.1, and you want to know whether the two abilities are related. A quick and simple test is the "two-way classification." You rank each observation from high to low, as shown in Columns 3 and 4 of Table 25.1. Then you construct a table that shows how often the above-average observations of one characteristic are associated with the above-average observations of the other characteristic, the below-average with the below-average, and so on, as shown in Table 25.2. If there is a strong association, the observations will be piled up in two diagonal cells, whereas if the association is not strong the observations will be scattered equally among the four cells. In this case there appears to be an association, but the matter is sufficiently inconclusive that we need to know how often a 4-4/1-1 split would occur by chance. (A statistical test of that probability is shown in Chapter 28, page 416.)

If the data came in quantitative form rather than simply high or low, you can make a more revealing test by plotting the data quantitatively, as the I.Q.-athletic score data are plotted in Figure 25.1. Now we can see quite

TABLE 25.1 Hypothetical Athletic and I.Q. Scores for High-School Boys

Athletic Score   I.Q. Score   Athletic Rank   I.Q. Rank
     (1)            (2)            (3)           (4)

      97            114             1             3
      94            120             2             1
      93            107             3             7
      90            113             4             4
      87            118             5             2
      86            101             6             8
      86            109             7             6
      85            110             8             5
      81            100             9             9
      76             99            10            10
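The two-way classification described in the text can be sketched in Python with the scores of Table 25.1. Two assumptions are made here and should be read as such: "above average" is taken to mean above the median, which reproduces the top-five and bottom-five split by rank, and the chance calculation holds the row and column totals fixed and treats the pairing of scores as random. The function names are hypothetical, and the chance calculation is not claimed to be the test given in Chapter 28.

```python
from collections import Counter
from math import comb
from statistics import median

# Scores copied from Table 25.1 (hypothetical data in the original).
ATHLETIC = [97, 94, 93, 90, 87, 86, 86, 85, 81, 76]
IQ = [114, 120, 107, 113, 118, 101, 109, 110, 100, 99]


def two_way_classification(xs, ys):
    """Count the pairs in each above/below cell, cutting each list at its median."""
    cut_x, cut_y = median(xs), median(ys)
    cells = Counter()
    for x, y in zip(xs, ys):
        row = "above" if x > cut_x else "below"
        col = "above" if y > cut_y else "below"
        cells[(row, col)] += 1
    return cells


def chance_of_diagonal(n_above_x, n_above_y, total, observed):
    """Probability of at least `observed` pairs in the above/above cell when
    row and column totals are fixed and the pairing is random (an assumption)."""
    top = min(n_above_x, n_above_y)
    ways = sum(
        comb(n_above_y, k) * comb(total - n_above_y, n_above_x - k)
        for k in range(observed, top + 1)
    )
    return ways / comb(total, n_above_x)


table = two_way_classification(ATHLETIC, IQ)
p_chance = chance_of_diagonal(5, 5, 10, table[("above", "above")])
```

With these data the four cells hold 4, 1, 1, and 4 observations, and under the stated assumption a split at least that lopsided would arise by chance about one time in ten, which is why the association cannot be called conclusive on this evidence alone.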

TABLE 25.2

                                I.Q. Score
                      Above Average   Below Average
Athletic Score
  Above Average             4               1
  Below Average             1               4