Numbers, Hypotheses & Conclusions: A Course In Statistics For The Social Sciences, 3rd Edition 1485125448, 9781485125440, 9781485129844

Statistics and quantitative methods are brought to life for social science students in this tutorial course. This revise


English Pages 681 Year 2019


Recommended stories

Statistics for the Behavioral and Social Sciences: A Brief Course 129202304X, 9781292023045

For one-quarter/semester courses that focus on the basics in statistics or combine statistics with research methods. The


Bayesian Statistics for the Social Sciences (Methodology in the Social Sciences) [2 ed.] 1462553540, 9781462553549

The second edition of this practical book equips social science researchers to apply the latest Bayesian methodologies t


Statistics for the Behavioral and Social Sciences: A Brief Course, Books a la Carte [6 ed.] 0205989063, 9780205989065

For one-quarter/semester courses that focus on the basics or combine statistics with research methods, this revised text


Prime numbers and the Riemann hypothesis


Statistics for Social Sciences [1 ed.] 9789351506560, 9789351506553

A comprehensive guide to the practical applications of statistics in social sciences This book brings out the relevance


Statistics for International Social Work And Other Behavioral Sciences

Statistics for International Social Work And Other Behavioral Sciences presents statistics using straightforward, accessi


Statistics for health, life and social sciences 9788776817404, 8776817407

Includes endnotes, answers to exercises, and an appendix dataset.


Statistics for the Social Sciences: A General Linear Model Approach 1107576970, 9781107576971

Written by a quantitative psychologist, this textbook explains complex statistics in accessible language to undergraduat


Applied Statistics Using Stata: A Guide for the Social Sciences [1 ed.] 1473913225, 9781473913226

Clear, intuitive and written with the social science student in mind, this book represents the ideal combination of stat


Author / Uploaded
Colin Tredoux
Kevin Durrheim

Table of contents :
Front cover......Page 1
Title page......Page 2
Imprint page......Page 3
Table of contents......Page 4
Preface......Page 11
Contributors......Page 13
Glossary of symbols......Page 14
Section 1: Statistics......Page 16
Tutorial 1: Numbers, variables, and measurement......Page 17
The advantages of quantitative methods......Page 18
Functions of quantification......Page 22
Some basic concepts......Page 24
Summary......Page 29
Exercises......Page 30
Tutorial 2: Displaying data......Page 31
Bar graphs......Page 32
Histograms......Page 39
Cumulative frequency diagrams......Page 44
Line graphs......Page 47
Basic rules for creating graphs......Page 55
Worked example......Page 56
Summary......Page 58
Exercises......Page 59
Further reading......Page 61
Tutorial 3: Central tendency and variation......Page 62
Measures of central tendency......Page 63
Measures of variability......Page 70
Estimating population parameters from sample data......Page 79
Worked example......Page 84
Summary......Page 88
Exercises......Page 89
Probability as frequency......Page 91
Probability and games of chance......Page 92
The multiplication and addition rules of probability......Page 94
Probabilities of multiple outcomes......Page 97
Worked example......Page 105
Exercises......Page 108
Tutorial 5: The standard normal distribution......Page 110
The standard normal distribution......Page 111
Two worlds: the statistical world and the real world......Page 116
Worked example......Page 121
Summary......Page 123
Exercises......Page 124
Tutorial 6: The sampling distribution of the mean......Page 126
Sampling means......Page 128
The Central Limit Theorem......Page 130
The sampling distribution and the standard normal distribution......Page 132
The standard error......Page 134
Worked example......Page 138
Summary......Page 139
Exercises......Page 140
Tutorial 7: Hypothesis testing: the z-test......Page 142
Hypothesis testing......Page 143
The z-test......Page 146
Example 1......Page 147
Worked Example 1......Page 151
Worked Example 2......Page 153
Summary......Page 155
Exercises......Page 156
Tutorial 8: Hypothesis testing: the t-test......Page 158
Using confidence intervals of the mean......Page 160
The logic of two-sample t-tests......Page 162
Independent samples t-test......Page 166
Worked Example 1......Page 168
Effect size......Page 170
The t-test for repeated measures......Page 171
Worked Example 2......Page 172
One-sample t-test......Page 174
Worked Example 3......Page 175
Exercises......Page 178
Paired or bivariate data......Page 180
Graphing paired data......Page 184
Linear and other types of relationship......Page 187
Positive and negative relationships......Page 190
Linear models and scatter......Page 194
A dataset to illustrate the calculation of Pearson’s product–moment correlation coefficient......Page 195
The meaning of r......Page 197
Calculating Pearson’s r......Page 198
Autocorrelation......Page 199
Rank correlation......Page 201
Exercises......Page 207
Tutorial 10: Simple regression......Page 212
Estimating the linear function......Page 214
Various forms of the regression equation......Page 217
Predictions versus observed values......Page 218
Calculating the regression coefficients......Page 219
Correlation is symmetric, but regression is not......Page 223
Interpreting regression coefficients......Page 225
Measuring scatter around the regression line......Page 226
Variance and regression......Page 230
Worked example......Page 235
Exercises......Page 242
Measuring a construct......Page 245
Evaluating a scale or test......Page 256
Worked example......Page 272
Summary......Page 278
Exercises......Page 279
Tutorial 12: Statistical power......Page 281
Error and statistical tests......Page 282
What determines the power of an investigation?......Page 283
Effect size......Page 286
The ‘trial’ strategy and power......Page 288
Power calculations......Page 289
Factors that influence choice of sample size......Page 295
Worked example......Page 296
Summary......Page 302
Exercises......Page 303
Tutorial 13: Analysis of variance (ANOVA)......Page 304
The rationale for using ANOVA......Page 306
The logic of ANOVA......Page 308
Comparing variance within and between groups......Page 310
Calculating one-way ANOVA......Page 312
Worked Example 1......Page 316
Multiple comparisons and effect size......Page 317
Using SPSS to do one-way ANOVA......Page 320
Worked Example 2......Page 321
Assumptions underlying ANOVA......Page 324
Worked Example 3......Page 327
Summary......Page 329
Exercises......Page 331
Tutorial 14: Factorial analysis of variance......Page 333
Why use factorial designs?......Page 335
The logic of factorial ANOVA......Page 336
Assumptions of factorial ANOVA......Page 338
Analysing factorial ANOVA designs......Page 343
Types of interactions......Page 351
Conclusion......Page 352
Summary......Page 354
Exercises......Page 355
Decomposition of variance......Page 357
Repeated measures designs and reduction/decomposition of variance......Page 360
Worked example......Page 386
Summary......Page 389
Exercises......Page 390
Tutorial 16: Multiple regression......Page 392
Worked Example 1......Page 395
Regression coefficients......Page 399
Partial correlation and multicollinearity......Page 400
Standardised regression coefficients......Page 402
The multiple correlation coefficient (R) and the standard error of estimate......Page 403
Testing statistical significance in multiple regression......Page 404
Which variables? Methods of model building......Page 405
Inspection of descriptive data and zero order correlations......Page 406
The sequential F-test......Page 409
Stepwise multiple regression......Page 410
Hierarchical multiple regression......Page 414
Mediation......Page 418
Interaction/moderation......Page 422
Categorical predictors/dummy variables......Page 425
Cross-validation......Page 430
Assumptions and limitations......Page 433
Worked Example 2......Page 435
How to write up the results of a multiple regression analysis......Page 440
Summary......Page 442
Exercises......Page 445
Tutorial 17: Factor analysis......Page 449
The two classes of factor analysis......Page 450
Manifest variables and latent structure......Page 451
How do we find latent structure?......Page 453
Two families of EFA......Page 455
Principal component analysis (PCA)......Page 456
Interpreting and naming components or factors......Page 471
Factor or component scores......Page 473
PFA versus PCA......Page 475
Worked example......Page 477
Practical considerations when doing a factor analysis......Page 481
Reporting a factor analysis......Page 483
Exercises......Page 485
Classifications......Page 487
Contingency tables......Page 488
The χ2 significance test......Page 489
Measures of association in tables based on the χ2 statistic......Page 493
Isolating sources of association in r × c tables......Page 497
Assumptions of the χ2 test......Page 499
Worked example......Page 504
Summary......Page 505
Exercises......Page 506
Tutorial 19: Distribution-free......Page 508
The advantages and disadvantages of distribution-free tests......Page 509
A cornucopia of tests......Page 510
Related samples: the sign test......Page 511
Related samples: The Wilcoxon matched pairs test......Page 512
Unrelated samples: The Mann–Whitney U-test......Page 513
Three or more groups of scores: Kruskal–Wallis test for unrelated samples......Page 516
Three or more groups of scores: Friedman’s rank test for related samples......Page 517
Worked example......Page 519
Exercises......Page 524
Tutorial 20: Bootstrapping and randomisation methods......Page 527
Data tables in Microsoft Excel......Page 529
Estimating a population mean using resampling in Microsoft Excel......Page 531
The spreadsheet layout......Page 533
Bootstrapping correlations......Page 538
Bootstrapping contingency tables......Page 541
Assumptions of bootstrapping......Page 550
Justifications for using bootstrapping......Page 551
Criticisms of bootstrapping......Page 552
Summary......Page 555
Exercises......Page 556
Tutorial 21: Statistical reasoning......Page 558
Rules for making statistical decisions......Page 560
Multiple means......Page 563
Variability in outcome and procedure......Page 564
Defensible reasoned argument......Page 567
Best practices......Page 571
Worked example......Page 574
Summary......Page 576
Exercises......Page 577
Section 2: Mathematics and software support......Page 578
Do you need this chapter?......Page 579
Elementary operations......Page 580
Negative numbers......Page 583
Fractions......Page 585
Decimal numbers......Page 588
Frequencies, proportions, percentages and ratios......Page 590
Power, exponents, roots......Page 591
Answers to exercises......Page 594
Do you need this tutorial?......Page 596
Some basic terms......Page 597
Equations......Page 598
Summary......Page 600
Solutions to exercises......Page 604
About graphs......Page 606
Example 1: Study and leisure hours......Page 607
Example 2: The relation between word length and recognition latency......Page 610
Example 3: A preference curve......Page 611
The direction of the line......Page 612
Points to remember......Page 613
Exercises......Page 616
Appendices......Page 618
Appendix 1: Statistical Tables......Page 619
Appendix 2: Starting with SPSS......Page 632
The SPSS environment......Page 633
Using the SPSS Data Editor......Page 636
Compute......Page 640
Recode......Page 641
Conducting statistical analyses with SPSS......Page 642
Generating graphical displays with SPSS......Page 649
Working with SPSS output......Page 652
Summary......Page 655
Exercises......Page 656
Appendix 3: Installing and learning R......Page 658
Learning how to use R and RStudio......Page 659
References......Page 662
Index......Page 672

Citation preview

NUMBERS, HYPOTHESES & CONCLUSIONS

A COURSE IN STATISTICS FOR THE SOCIAL SCIENCES Third edition

Entry-level topics include:

• A review of basic mathematics
• Displaying data and descriptive statistics
• Normal distribution theory and practice
• Probability and hypothesis testing
• Correlation and regression
• One-way ANOVA
• Nonparametric statistics.

Intermediate-level topics include:

• Factorial and repeated measures ANOVA
• Multiple regression
• Statistical power
• Factor analysis
• Bootstrapping
• Multiple approaches to data.

Statistics and quantitative methods are brought to life for social science students in this tutorial course. This revised edition provides an overview of entry- and intermediate-level statistics, and the material on the accompanying website provides extensive practice. Both the text and the website are structured to make learning self-directed, thus numerous worked examples, exercises, activities and tests are included. The emphasis, throughout, is on practice. Students are expected to engage with the material and experience multiple aspects of data and statistical analysis. Most of the tutorials include detailed examples of how to conduct analyses in Microsoft Excel, SPSS, or R.

Additional material is given on:

• The use of spreadsheets to analyse data
• The use of the internet to assist quantitative research
• The use of the R and SPSS statistical software packages
• Statistical tables.

About the editors

Colin Tredoux is Professor in the Department of Psychology at the University of Cape Town. Kevin Durrheim is Professor in the School of Applied Human Sciences at the University of KwaZulu-Natal.

www.juta.co.za


Numbers, Hypotheses & Conclusions
A course in statistics for the social sciences
Third edition

Editors
Colin Tredoux (University of Cape Town)
Kevin Durrheim (University of KwaZulu-Natal)

NumHyp&Con_3e_Book.indb 1

2018/07/12 2:43:39 PM

Numbers, Hypotheses & Conclusions: A course in statistics for the social sciences
First published 2002
Second edition 2013
Third edition 2019

Juta and Company (Pty) Ltd
First floor, Sunclare building, 21 Dreyer street, Claremont 7708
PO Box 14373, Lansdowne 7779, Cape Town, South Africa
www.juta.co.za

© 2019 Juta and Company (Pty) Ltd

ISBN 978 1 48512 544 0 (Print)
ISBN 978 1 48512 984 4 (WebPDF)

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage or retrieval system, without prior permission in writing from the publisher. Subject to any applicable licensing terms and conditions in the case of electronically supplied publications, a person may engage in fair dealing with a copy of this publication for his or her personal or private use, or his or her research or private study. See section 12(1)(a) of the Copyright Act 98 of 1978.

Project manager: Edith Viljoen and Carlyn Bartlett-Cronje
Editor: Jyoti Singh
Proofreader: Jyoti Singh
Cover designer: Zach Viljoen
Typesetter: Anton Stark
Indexer: Michel Cozien
Typeset in Minion Pro 10.5pt

The author and the publisher believe on the strength of due diligence exercised that this work does not contain any material that is the subject of copyright held by another person. In the alternative, they believe that any protected pre-existing material that may be comprised in it has been used with appropriate authority or has been used in circumstances that make such use permissible under the law.


Table of contents
Section 1 Statistics
Tutorial 1: Numbers, variables, and measurement......2
The advantages of quantitative methods......3
Functions of quantification......7
Some basic concepts......9
Problems with the quantitative approach......14
Summary......14
Exercises......15
Tutorial 2: Displaying data......16
Bar graphs......17
Histograms......24
Cumulative frequency diagrams......29
Line graphs......32
Basic rules for creating graphs......40
Worked example......41
Summary......43
Exercises......44
Further reading......46
Tutorial 3: Central tendency and variation......47
Measures of central tendency......48
Measures of variability......55
Estimating population parameters from sample data......64
Worked example......69
Summary......73
Exercises......74
Tutorial 4: Probability and theoretical distributions......76
Probability as frequency......76
Probability and games of chance......77
The multiplication and addition rules of probability......79
Probabilities of multiple outcomes......82
Worked example......90
Summary......93
Exercises......93
Tutorial 5: The standard normal distribution......95
The standard normal distribution......96
Two worlds: the statistical world and the real world......101
Worked example......106
Summary......108
Exercises......109


Tutorial 6: The sampling distribution of the mean......111
Sampling means......113
The Central Limit Theorem......115
The sampling distribution and the standard normal distribution......117
The standard error......119
Worked example......123
Summary......124
Exercises......125
Tutorial 7: Hypothesis testing: the z-test......127
Hypothesis testing......128
The z-test......131
Example 1......132
Worked example 1......136
Worked example 2......138
Summary......140
Exercises......141
Tutorial 8: Hypothesis testing: the t-test......143
Using confidence intervals of the mean......145
The logic of two-sample t-tests......147
Independent samples t-test......151
Worked example 1......153
Effect size......155
The t-test for repeated measures......156
Worked example 2......157
One-sample t-test......159
Worked example 3......160
Summary......163
Exercises......163
Tutorial 9: Scatterplots and correlation......165
Paired or bivariate data......165
Time-series data: A special kind of bivariate data......169
Graphing paired data......169
Linear and other types of relationship......172
Positive and negative relationships......174
Linear models and scatter......179
A dataset to illustrate the calculation of Pearson’s product–moment correlation coefficient......180
The product–moment correlation coefficient......182
The meaning of r......182
Calculating Pearson’s r......183
Autocorrelation......184
Rank correlation......186
Summary......192
Exercises......192
Tutorial 10: Simple regression......197
Estimating the linear function......199
Various forms of the regression equation......202


Predictions versus observed values......203
Accounting for error in the regression equation......204
Calculating the regression coefficients......204
Making predictions......208
Correlation is symmetric, but regression is not......208
Interpreting regression coefficients......210
Measuring scatter around the regression line......211
Variance and regression......215
Worked example......220
Summary......227
Exercises......227
Tutorial 11: Measurement......230
Measuring a construct......230
Evaluating a scale or test......241
Worked example......257
Summary......263
Exercises......264
Tutorial 12: Statistical power......266
Error and statistical tests......267
What determines the power of an investigation?......268
Effect size......271
The ‘trial’ strategy and power......273
Power calculations......274
Factors that influence choice of sample size......280
Worked example......281
Summary......287
Exercises......287
Tutorial 13: Analysis of variance (ANOVA)......289
The rationale for using ANOVA......291
The logic of ANOVA......293
Comparing variance within and between groups......295
Calculating one-way ANOVA......297
Worked example 1......301
The ANOVA summary table......302
Multiple comparisons and effect size......302
Effect size......305
Using SPSS to do one-way ANOVA......305
Worked example 2......305
Assumptions underlying ANOVA......309
Worked example 3......312
Summary......314
Exercises......315
Tutorial 14: Factorial analysis of variance......318
Why use factorial designs?......320
The logic of factorial ANOVA......321
Assumptions of factorial ANOVA......323
Analysing factorial ANOVA designs......328


  Types of interactions
  Conclusion
  Summary
  Exercises
Tutorial 15: Repeated measures analysis of variance
  Decomposition of variance
  Repeated measures designs and reduction/decomposition of variance
  Worked example
  Summary
  Exercises
Tutorial 16: Multiple regression
  Worked example 1
  Regression coefficients
  Partial correlation and multicollinearity
  Standardised regression coefficients
  The multiple correlation coefficient (R) and the standard error of estimate
  Testing statistical significance in multiple regression
  The significance of R² using the F distribution
  The significance of individual variables in the regression equation
  Which variables? Methods of model building
  Inspection of descriptive data and zero-order correlations
  The sequential F-test
  Stepwise multiple regression
  Hierarchical multiple regression
  Mediation
  Interaction/moderation
  Categorical predictors/dummy variables
  Cross-validation
  Assumptions and limitations
  Worked example 2
  How to write up the results of a multiple regression analysis
  Summary
  Exercises
Tutorial 17: Factor analysis
  The two classes of factor analysis
  Some conceptual ideas underlying EFA
  Manifest variables and latent structure
  How do we find latent structure?
  Two families of EFA
  Principal component analysis (PCA)
  Interpreting and naming components or factors
  Factor or component scores
  PFA versus PCA
  Worked example
  Practical considerations when doing a factor analysis
  Reporting a factor analysis
  Summary
  Exercises


Tutorial 18: Chi-square (χ²) test
  Classifications
  Contingency tables
  The χ² significance test
  Measures of association in tables based on the χ² statistic
  Isolating sources of association in r × c tables
  Assumptions of the χ² test
  Worked example
  Summary
  Exercises
Tutorial 19: Distribution-free tests
  The advantages and disadvantages of distribution-free tests
  Theory of ranks
  A cornucopia of tests
  Related samples: the sign test
  Related samples: The Wilcoxon matched pairs test
  Unrelated samples: The Mann–Whitney U-test
  Three or more groups of scores: Kruskal–Wallis test for unrelated samples
  Three or more groups of scores: Friedman’s rank test for related samples
  Worked example
  Summary
  Exercises
Tutorial 20: Bootstrapping and randomisation methods
  Data tables in Microsoft Excel
  Estimating a population mean using resampling in Microsoft Excel
  The spreadsheet layout
  Bootstrapping correlations
  Bootstrapping contingency tables
  Assumptions of bootstrapping
  Justifications for using bootstrapping
  Criticisms of bootstrapping
  Summary
  Exercises
Tutorial 21: Statistical reasoning
  Rules for making statistical decisions
  One mean
  Two means
  Multiple means
  Defending statistical decisions
  Variability in outcome and procedure
  Defensible reasoned argument
  Best practices
  Conclusion
  Worked example
  Summary
  Exercises


Section 2 Mathematics and software support
Tutorial 22: Basic work with numbers
  Do you need this chapter?
  Number systems
  Elementary operations
  Negative numbers
  Fractions
  Decimal numbers
  Frequencies, proportions, percentages and ratios
  Power, exponents, roots
  Answers to exercises
Tutorial 23: Equations, substitution, and summation
  Do you need this tutorial?
  Some basic terms
  Equations
  Summary
  Solutions to exercises
Tutorial 24: Reading and understanding graphs
  About graphs
  Example 1: Study and leisure hours
  Example 2: The relation between word length and recognition latency
  Example 3: A preference curve
  The direction of the line
  Summary
  Exercises

Appendices
Appendix 1: Statistical Tables
  Table A1.1: The standard normal distribution
  Table A1.2: Values of the t-distribution for varying degrees of freedom (df), and α
  Table A1.3: Calculating power (1 – β) from a known delta (δ)
  Table A1.4: Table of the F-distribution (α = 0.05)
  Table A1.5: Table of the F-distribution (α = 0.01)
  Table A1.6: Values of Tukey’s studentised range statistic (Q)
  Table A1.7: Values of the χ² distribution for varying degrees of freedom (df), and α
  Table A1.8: The signs test
  Table A1.9: Wilcoxon matched pairs
  Table A1.10: Critical values of the Mann-Whitney U for a directional test at 0.05 or a non-directional test at 0.1
  Table A1.11: Critical values of the Mann-Whitney U for a directional test at 0.05 or a non-directional test at 0.1
Appendix 2: Starting with SPSS
  The SPSS environment
  Setting up an SPSS data file
  Using the SPSS Data Editor
  Compute


  Recode
  Select cases
  Weight cases
  Conducting statistical analyses with SPSS
  Generating graphical displays with SPSS
  Working with SPSS output
  Summary
  Exercises
Appendix 3: Installing and learning R
  Downloading and installing R
  Learning how to use R and RStudio
References
Index


Preface

This book is the core text for a tutorial programme in statistics for social science and humanities students. Although it can be used as a stand-alone introduction to statistics, we recommend that it accompany a structured set of worked examples and activities. An excellent collection of support material is available on the accompanying website https://juta.co.za/support-material/detail/numbershypotheses-conclusions-3e.

The tutorial programme is aimed at a diverse group of disciplines: students of Psychology, Organisational Psychology, Sociology, Social Work, Anthropology, Education, and Political Studies – indeed, all students in the social sciences – will benefit from using this text. Whatever your area of study, we hope that the text will stimulate, guide, and promote you to the rank of inveterate inquirer.

The skills we aim to impart are central to any knowledge-based enterprise. They are taught all over the world in programmes geared toward research, and thus provide a universal language for the social sciences. Indeed, they underpin successful theory and application in almost every field of enquiry. Expertise in quantitative methods is a critically important transferable skill, and many employers demand some level of competence in this area.

Social science students differ in terms of their preparation for courses in quantitative methods. They come from a diversity of disciplines and backgrounds. This means that there will also be differences in their level of mathematical proficiency. We acknowledge this, and see it as a challenge. We have, therefore, included a substantial collection of revision material.

The text’s emphasis is on statistical concepts and techniques. It promotes the use of simple mathematical manipulations and calculations to aid understanding. It is important for you to know how to do basic statistical calculations, and we encourage you to improve your skills no matter what your proficiency.

There are many aids to assist with the error-prone activity of statistical computation, and the text demonstrates how to use calculators, spreadsheet programs, and statistical software packages correctly. The early tutorials emphasise calculator and spreadsheet work, while later tutorials assume the availability of a statistical package. The text provides material which demonstrates how to use the free statistical programming language, R, in addition to the well-established SPSS for particular statistical analyses.

The text aims to enhance your learning experience by means of a wide variety of activities, interest boxes, application boxes, graphic material, and worked solutions to problems. You should use these to your advantage. We suggest that you keep a calculator or computer at hand and complete all the activities. Once you have finished studying the text, we recommend that you browse the website for additional worked examples and activities.

Colin Tredoux and Kevin Durrheim, July 2018


Acknowledgements

We would like to acknowledge the following people for their support and guidance during the process of writing and editing this book:
• Diane Gascoigne, Aimée Tredoux, Cormac Tredoux, Aleks Durrheim, and Shea-Blue Durrheim.
• Solani Ngobeni, Sandie Vahl, Fiona Wakelin, and Glenda Younge from the University of Cape Town Press for the first edition, and Jayde Butler, Edith Viljoen, and Carlyn Bartlett-Cronjé for the second and third editions.
• Lance Lachenicht, Michelle Hoogenhout, Martin Terre Blanche, David Nunez, Gillian Finchilescu and Ingrid Palmary.
• Mike Quayle and Nik Pautz.

Contributors

• Kevin Durrheim, Department of Psychology, University of KwaZulu-Natal
• Gillian Finchilescu, Department of Psychology, University of the Witwatersrand
• Michelle Hoogenhout, Department of Psychology, University of Cape Town
• Lance Lachenicht, Department of Psychology, University of KwaZulu-Natal
• David Nunez, Microsoft, USA
• Ingrid Palmary, Department of Sociology, University of Johannesburg
• Martin Terre Blanche, Department of Psychology, University of South Africa
• Colin Tredoux, Department of Psychology, University of Cape Town

Glossary of symbols

α      Alpha (Type I error rate)
β      Beta (Type II error rate)
δ      Delta, a parameter used to determine power of a statistical test
μ      Mu, the population mean
ρ      Rho, the population correlation coefficient
σ      Sigma (lower case), the population standard deviation
Σ      Sigma (upper case), the arithmetic summation operator
χ²     Chi-square statistic, or Chi-square distribution
η²     Eta-square, a measure of effect size
φ²     Phi square, the mean square contingency coefficient
σ²     Sigma square, the population variance
φ_c    Cramer’s V, a measure of effect size in contingency table analysis
a      The intercept coefficient in regression analysis
b      The slope coefficient in regression analysis
D      Difference between two scores
d      Effect size
E      Expected frequency
F      F distribution, or F ratio
k      Number of groups in a design
MS     Mean Square
N      Population size
n      Sample size
O      Observed frequency
p      Probability
Q      Tukey’s Q statistic (studentized range statistic)
R      Multiple correlation coefficient
r      The Pearson product moment correlation coefficient
r²     Square of r; coefficient of determination
R²     Square of the multiple correlation coefficient; degree of linear model fit
r_s    Spearman’s rank correlation coefficient
s²     Sample variance
s_x̄    Standard error of the mean
SS     Sums of squares
t      The t statistic, used to test hypotheses about mean differences; also the t probability distribution
x̄      Sample mean
y'     y prime, the predicted score in a regression equation
z      Standard normal deviate; a score from a normal distribution standardized to have μ = 0, and σ = 1


Section 1 Statistics


Tutorial 1: Numbers, variables, and measurement

Colin Tredoux

After studying this tutorial, you should be able to:
• list some of the key functions served by quantitative methods in the social sciences
• distinguish probabilistic and deterministic forms of inductive reasoning
• define a number of basic terms, including variable, statistic, and parameter
• distinguish between descriptive and inferential statistical methods
• identify some of the arguments against the use of quantitative methods.

As you pick up this text and start to read it, you may be wondering how you managed to get yourself into this predicament. After all, many social science students choose the social sciences to escape the terrors and tribulations of mathematics and numbers. You now find that you are again faced with x and y, Σ and σ, and long strings of numbers. Why do you have to do this? Surely there is no point in trying to measure social phenomena? We all know the social world is inherently slippery, and defies exact representation. Surely this is a mistaken ambition?

You are not alone in this point of view. A number of theorists and writers have put formidable reputations on the line in arguing that quantification and quantitative methods have no place in social science (Hornstein, 1988). Writing over 100 years ago, William James ridiculed the attempts by psychologists to quantify sensation and perception:

To introspection, our feeling of pink is surely not a portion of our feeling of scarlet; nor does the light of an electric arc seem to contain that of a tallow candle in itself (cited in Hornstein, 1988, 45).


If you sympathise with this point of view, the bad news is that it has lost the battle for sovereignty in the social sciences. Most social sciences make extensive use of quantitative methods, and students in these disciplines typically receive training in these methods from their undergraduate years all the way to doctoral level. A cursory flip through the current periodical holdings of any academic library will convince you of the important place these methods have. Of course, it may indeed be the case that quantification is misguided, and even non-rational (cf Hornstein, 1988). We cannot defend quantification against these charges, but we would like to persuade you here that there are palpable advantages to quantification. Quantitative methods provide powerful academic and intellectual possibilities, and to jettison them is akin to refusing to use electric lights because no-one has offered a satisfactory theory of electricity.


The advantages of quantitative methods

What advantages do quantitative methods confer on us? There are a great many, which we will summarise as efficiency, approximation (or modelling) and a powerful language.

Table 1.1 SA Census 2011: Correspondence of South African province of residence to province of birth (entries are percentages of people)

Rows: province/country of birth. Columns: province where counted (and presumably where presently living).

Born in              EC     FS     GP    KZN     LP     MP     NW     NC     WC     SA
EC                 94.0    2.5    4.5    2.9    0.4    1.6    2.7    2.0   16.2   15.8
FS                  0.4   87.3    3.2    0.4    0.3    1.2    2.9    1.9    0.8    6.5
GP                  1.2    2.7   56.0    1.3    2.5    4.7    4.9    1.6    2.9   15.1
KZN                 0.7    1.0    5.9   92.0    0.2    2.8    1.0    0.8    1.2   20.2
LP                  0.1    0.6   10.8    0.2   90.9    4.2    2.8    0.3    0.3   12.8
MP                  0.2    0.5    4.3    0.4    1.6   79.9    1.2    0.3    0.4    7.7
NW                  0.1    1.1    3.5    0.2    0.6    0.8   78.3    3.7    0.3    5.9
NC                  0.4    1.0    0.8    0.6    0.1    0.7    1.3   85.2    1.5    2.6
WC                  1.7    0.8    1.5    0.3    0.4    0.4    0.5    2.5   71.9    8.9
OutsideSA           1.2    2.5    9.5    1.7    3.0    3.7    4.4    1.7    4.5    4.4

eg 94% (a proportion of 0.94) of people presently living in the Eastern Cape were born there.
Note: EC = Eastern Cape, FS = Free State, GP = Gauteng, KZN = KwaZulu-Natal, LP = Limpopo, MP = Mpumalanga, NW = North West, NC = Northern Cape, WC = Western Cape, SA = South Africa as a whole
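To make the layout of Table 1.1 concrete, the table can be held as a small data structure and queried. The following is only a sketch: the figures are transcribed from Table 1.1, but the names `COLUMNS`, `TABLE_1_1`, `column` and `percent_born_elsewhere` are illustrative choices, not part of the text. It also checks that each column adds up to roughly 100%, which is what we would expect if every entry is a percentage of the people counted in that province.

```python
# Province where counted, in the column order used by Table 1.1.
COLUMNS = ["EC", "FS", "GP", "KZN", "LP", "MP", "NW", "NC", "WC", "SA"]

# Each row is a birthplace; values are percentages transcribed from Table 1.1.
TABLE_1_1 = {
    "EC":        [94.0, 2.5, 4.5, 2.9, 0.4, 1.6, 2.7, 2.0, 16.2, 15.8],
    "FS":        [0.4, 87.3, 3.2, 0.4, 0.3, 1.2, 2.9, 1.9, 0.8, 6.5],
    "GP":        [1.2, 2.7, 56.0, 1.3, 2.5, 4.7, 4.9, 1.6, 2.9, 15.1],
    "KZN":       [0.7, 1.0, 5.9, 92.0, 0.2, 2.8, 1.0, 0.8, 1.2, 20.2],
    "LP":        [0.1, 0.6, 10.8, 0.2, 90.9, 4.2, 2.8, 0.3, 0.3, 12.8],
    "MP":        [0.2, 0.5, 4.3, 0.4, 1.6, 79.9, 1.2, 0.3, 0.4, 7.7],
    "NW":        [0.1, 1.1, 3.5, 0.2, 0.6, 0.8, 78.3, 3.7, 0.3, 5.9],
    "NC":        [0.4, 1.0, 0.8, 0.6, 0.1, 0.7, 1.3, 85.2, 1.5, 2.6],
    "WC":        [1.7, 0.8, 1.5, 0.3, 0.4, 0.4, 0.5, 2.5, 71.9, 8.9],
    "OutsideSA": [1.2, 2.5, 9.5, 1.7, 3.0, 3.7, 4.4, 1.7, 4.5, 4.4],
}


def column(counted_in):
    """Return {birthplace: percentage} for people counted in one province."""
    idx = COLUMNS.index(counted_in)
    return {born: row[idx] for born, row in TABLE_1_1.items()}


def percent_born_elsewhere(province):
    """Percentage of a province's residents who were born somewhere else.

    Only meaningful for the nine provinces, not for the 'SA' total column.
    """
    return round(100.0 - column(province)[province], 1)


# Each column should add up to about 100% (allowing for rounding in the source).
for name in COLUMNS:
    total = sum(column(name).values())
    print(f"{name}: column total = {total:.1f}")

print("Gauteng residents born elsewhere (%):", percent_born_elsewhere("GP"))
```

Reading down a column rather than across a row is the design choice here, because the percentages in Table 1.1 are taken within each province where people were counted.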

Efficiency

Using numbers to communicate information is often extremely efficient. Every ten years or so, South Africa has a national census, in which information is collected about its inhabitants. Since there were approximately 50 000 000 inhabitants in 2011, you will appreciate the enormous amount of work and information that the numerical display in Table 1.1 summarises. (You may also notice how the data implicitly contradicts the notion that South Africa is being swamped by a tide of immigration from other African countries, as only 4.4% of the population in total was born outside South Africa.)

A non-quantitative approach would have struggled to represent the data in Table 1.1. Not the least of the concerns would have been adequate summary concepts or descriptors. In the case of quantitative research, on the other hand, there is a well-developed theory of summary indicators, and a well-developed technology to support these (eg computer software packages).

Activity 1.1

Examine Table 1.1 carefully. Do you see any interesting patterns? Try to describe these without using any summary statistics (eg totals or averages), and without using any symbols that represent numbers (ie you can write ‘one’, but not ‘1’).

Approximation/modelling

Quantitative techniques are often excellent at representing phenomena in the world, and in that respect they present us with wonderful opportunities for complex study of the phenomena. What dimensions do you think humans use for making similarity judgements of faces?

[Figure 1.1 A spatial model for understanding human similarity judgements of faces. Axis labels: Young/Old and Fat/Thin.]


TUTORIAL 1: NUMBERS, VARIABLES, AND MEASUREMENT


Simply asking people how they make similarity judgements produces a variety of responses. However, a quantitative technique called multidimensional scaling provides a spatial model in which we can represent each dimension of similarity as an axis, and each face as a point in the intersection space of these axes. Figure 1.1 shows what a two-dimensional example of such a model might look like. This modelling allows us to infer what the important dimensions of similarity judgements of faces are. If we had to sort through a long set of verbal descriptions, it would take us a long time, and it is doubtful that we would arrive at the dimensions as clearly as we can with the quantitative technique in question.
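As a rough illustration of the idea, the sketch below applies classical multidimensional scaling to a small set of made-up faces, using the cmdscale() function that ships with R. The face names and their age and build scores are hypothetical, invented only for this example; in a real study the dissimilarities would come from people's similarity judgements rather than from known scores.

```r
# Hypothetical scores for five faces on two dimensions (invented for illustration)
faces <- data.frame(
  age   = c(20, 25, 60, 70, 45),   # young ... old
  build = c(2, 8, 3, 9, 5),        # thin ... fat (arbitrary 1-10 scale)
  row.names = c("Face A", "Face B", "Face C", "Face D", "Face E")
)

# Pairwise dissimilarities between faces (Euclidean distance on standardised scores)
d <- dist(scale(faces))

# Classical multidimensional scaling: find a 2-D configuration of points
# whose inter-point distances reproduce the dissimilarities
fit <- cmdscale(d, k = 2)

# Each row is a face, each column an axis of the spatial model
print(round(fit, 2))

# Plot the faces as points in the recovered two-dimensional space
plot(fit, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(fit, labels = rownames(fit))
```

The recovered axes are only determined up to rotation and reflection, so deciding what each dimension means (for example, age or build) remains a matter of interpretation.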

A powerful language

Perhaps the best thing about quantitative techniques is that there is already an established theory and practice. Mathematicians, statisticians, and social scientists have spent hundreds of years developing and refining a powerful quantitative language. When we use quantitative techniques we adopt this language, and save ourselves a few centuries of work.

This language can make us competent in our interactions with the physical world. Imagine a game of dice on the street corner. Sipho is betting R10 that 5 will come up on the next throw of a single die. If it does, Jimi will pay him R30. Probability theory tells us that the chance of the 5 coming up on the next throw is one in six, and that the expected gain in this game for Sipho is –R3.33 ((1/6 × R30) – (5/6 × R10)) per roll of the die. The game of dice can be understood in terms of a ‘language of probability’, and this allows those who understand it considerable opportunity. Jimi will be a rich man if he continues to entice players like Sipho into the game.

Consider another everyday situation where quantification is powerful. Activity 1.2 shows a weather forecast map. By looking at it briefly, you will be able to decide:
• whether to take an umbrella to class
• whether you should nail your roof down
• whether you need to water your vegetable plants tomorrow.

All this information is effectively conveyed by numbers, and their position in the two-dimensional diagram.

Three advantages of quantitative methods: 1. Efficiency of communication 2. Modelling of real-world phenomena 3. A powerful and centuries-old language

So, we have seen that quantitative methods are efficient, that they provide useful models of phenomena, and that they provide us with a powerful language. Is this enough to convince us that we should be using them in the social sciences? Perhaps not, but let us reflect on where we find quantitative methods in the world around us. Clearly, a number of professions depend on them: accountants, actuaries, and engineers make no secret of this, and, less obviously, so do architects and graphic designers. But there are many non-technical occupations that do so too. Consider the woman who owns the casino down the road. Her livelihood depends on the types of quantitative


performance outlined in the example of the game of dice. What about carpenters? Carpentry depends, in a fundamental sense, on measurement and quantification, and carpenters use quantitative devices ranging from finely graded rules and set squares to sliding angle bevels.

Activity 1.2

Inspect the following weather forecast, and see whether you can answer the questions below just from the numbers you see displayed on the map. Assume the forecast is for tomorrow. Justify your answers.

a) Is it likely that you will need a jersey if you are in Durban at midday?
b) Would it be a good idea to go kite-flying in Cape Town?
c) Does the pattern shown here suggest summer or winter?
d) If you live in Polokwane, should you take precautions against your water pipes bursting?
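The expected-gain figure quoted earlier for the dice game (R10 staked on a 5, with a R30 payout) can be checked with a few lines of R. The variable names are our own, and the simulation at the end is only an illustrative extra, not part of the original argument.

```r
# Outcomes for Sipho on one roll: win R30 if a 5 comes up, otherwise lose the R10 stake
p_win  <- 1 / 6
payout <- 30
stake  <- 10

# Expected gain = (probability of winning x payout) - (probability of losing x stake)
expected_gain <- p_win * payout - (1 - p_win) * stake
print(round(expected_gain, 2))

# Optional check by simulation: average gain over many simulated rolls
set.seed(1)
rolls <- sample(1:6, size = 100000, replace = TRUE)
gains <- ifelse(rolls == 5, payout, -stake)
print(mean(gains))
```

The simulated average should settle close to the calculated expected gain as the number of rolls grows, which is one way of seeing what 'expected gain per roll' means in practice.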

Now think more generally about your everyday life. You probably visit a shop of some kind every day. Shops are highly quantified, and your interaction with them is fundamentally of a quantitative kind: you pay some money to the shop, which has quantified the amount of profit it will make, the amount it will have to pay over to revenue services, and the amount it owes for store rental and salaries. In fact, we are completely embedded in a monetary economy, in which the house we live in, the food we eat, and perhaps even the thoughts that rush ceaselessly in our heads, have particular value. This monetary economy brings enormous flexibility to the social exchange that appears to be inherent in human societies. Still not convinced? Let us try one more argument. Many biologists and physiological psychologists argue that some kind of quantitative sense is native to the human species. In a book entitled What Counts: How Every Brain is Hard-wired for Math,


Butterworth (1999) summarises evidence suggesting that the human brain has a ‘number module’ – a specialised circuitry that enables us to categorise objects in terms of numerosity. We recognise and distinguish objects in terms of numerosity (without being taught the meaning of number) in an automatic and involuntary way, just as we automatically and involuntarily see colours. In this way of looking at things, quantification and quantitative thinking are inescapable, and are as much at home in the social sciences as they are in your kitchen.

Functions of quantification

There are a number of ways in which quantitative methods can function within an academic discipline. We saw some of these in the previous section, but it is useful to distinguish at a higher level of abstraction two general kinds of functions quantification can support. In the first place, quantification can serve an infrastructural or administrative function. For example, modern societies are embedded in a monetary economy, and much of the business of society has a structure within this economy. It is as if monetary quantification has built a vast set of roads, highways, ramps, and exits, and society moves along these roads, just as its members do on physical roads. While this infrastructural function is certainly important, it will not be our central focus in this text.

Activity 1.3

Keep a diary or notebook with you for a day, and make a record of instances where you have to deal with numbers as part of your everyday life. Record as many instances as you can. Try to list for each instance what function the quantification serves.

Functions of quantification: 1. Administrative/infrastructural (eg a monetary economy) 2. Aids to argument and reasoning

Secondly, quantitative methods can function as evidentiary aids or systems. In other words, they can provide evidence for an argument, or against it. In addition, they frequently have deductive and inductive devices or mechanisms that can be used to draw conclusions and inferences. It is in this second sense that quantification is of most interest to us. For example, a key issue in health research around HIV is the transmission of HIV from mother to child through breastfeeding. Quantitative research tells us that the risk of transmission is high, and that anti-retroviral drugs may decrease this risk substantially. In order to draw this conclusion, researchers carefully quantified physiological measurements (eg T-helper cell responses), and used a research design that allowed them to use inferential statistical methods to determine whether infants administered anti-retroviral treatment showed lower rates of HIV infection than infants not administered anti-retroviral treatment. Although quantitative methods are often thought of as the tools of deterministic sciences, such as mechanical physics and chemistry,

a whole branch of mathematics is devoted to probabilistic methods. These methods form the basis for most quantitative inquiry in the social sciences. When we reason probabilistically, we make generalisations and draw conclusions that are supported by probability estimates, as opposed to the law-like statements and predictions we make in deterministic reasoning. For example, we say that we are 95% confident that the average income for social science graduates five years after graduation is between R135 000 and R155 000 per annum. We do not say that we are certain of this, but we express probabilistically defined confidence in it. On the other hand, if we are reasoning deterministically, we say things like ‘The force exerted by an object is the product of its mass and its acceleration’, and if we have precise estimates of the mass and acceleration, we make a precise prediction.

Sometimes the probabilistic methods available to us can be used to create models that are so close to the phenomenon we wish to study that the move from model to phenomenon, to conclusion, is relatively effortless. Imagine that we are called on to evaluate a police line-up from which an identification has been used as evidence against an accused person. Two out of fifteen witnesses identified the accused person from a line-up that consisted of the suspect and five innocent police officers. One way of reasoning about the rate of identification is to treat the line-up as a die-tossing experiment: the die has six numbered sides (each number corresponds to a member of the parade), and the die is tossed 15 times (15 witnesses). We can use a well-known probability method (the binomial distribution) to calculate the probability that two or fewer of the fifteen witnesses, if they were guessing randomly, would have identified the suspect. This is worth knowing, as a kind of baseline estimate of the information value of the identifications. The probability turns out to be approximately 0.53. In other words, there is about a one in two chance that two (or fewer) witnesses guessing randomly would have chosen the suspect. This is surely not good evidence against the suspect?
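The baseline calculation for the line-up can be reproduced with R's built-in binomial functions dbinom() and pbinom(). This is a sketch under the same assumptions as the text: 15 witnesses guessing independently, each with a one in six chance of picking the suspect. The variable names are ours. It reports the probability of exactly two, of two or fewer, and of two or more chance identifications, since the three quantities are easy to confuse.

```r
n_witnesses <- 15      # number of witnesses
p_guess     <- 1 / 6   # chance that a random guess lands on the suspect

# Probability that exactly two guessing witnesses pick the suspect
p_exactly_two <- dbinom(2, size = n_witnesses, prob = p_guess)

# Probability that two or fewer pick the suspect
p_two_or_fewer <- pbinom(2, size = n_witnesses, prob = p_guess)

# Probability that two or more pick the suspect
p_two_or_more <- 1 - pbinom(1, size = n_witnesses, prob = p_guess)

print(round(c(exactly_two  = p_exactly_two,
              two_or_fewer = p_two_or_fewer,
              two_or_more  = p_two_or_more), 2))
```

Under these assumptions the three probabilities come out at roughly 0.27, 0.53 and 0.74, so two identifications out of fifteen are well within what guessing alone would produce.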

Activity 1.4

Read through some back copies of your local newspaper and try to find instances where numbers have been used to support an argument. Try to categorise the ways in which they have been used.

Much of the time, however, the quantitative methods we use do not directly fit the questions we study, and we rely on theorems to justify the application of the methods. When we evaluate the results of a psychotherapy programme, or an AIDS counselling programme, for example, we will often use the Central Limit Theorem, and a host of its derivatives, to decide whether the treatment is effective.


Some basic concepts

In order to prepare for material in later tutorials, it is useful to introduce some basic concepts.

Variables and constants

The first step in using a quantitative language is to convert objects or entities in the real world into symbols and concepts of the language. Thus, when we measure something like height, we talk about height as a variable, and we typically symbolise it in some way, eg x. Since we will collect height measurements from a number of different people, we can expect these measurements to vary – to take on different values. For this reason, we call height a variable, and we use a subscript to identify instances of that variable. Thus, if we collect five measurements of height, the first score is x₁, the second is x₂, the third is x₃, etc. Often the subscript is implicit, and we will write x = {1.9, 1.5, 1.7, 1.6, 1.8}, meaning x₁ = 1.9, x₂ = 1.5, etc. When we deal with a quantity that always has the same value, we refer to it as a constant, eg the speed of light.
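In R, a variable of this kind is naturally stored as a vector, and the subscript becomes an index. A minimal sketch using the five height values from the text follows; the object names x and speed_of_light are our own choices, and the unit (metres) is an assumption, since the text does not state one.

```r
# Five height measurements (unit assumed to be metres), stored as one variable
x <- c(1.9, 1.5, 1.7, 1.6, 1.8)

# The subscript picks out a single measurement: x[1] is the first score, x[2] the second
print(x[1])
print(x[2])

# A variable takes on different values ...
print(length(unique(x)))

# ... whereas a constant always has the same value (speed of light in metres per second)
speed_of_light <- 299792458
print(speed_of_light)
```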

Continuous vs discrete variables/measurements

Many variables and constants are measured on continuous scales, which is to say that they can take any value in a defined range. Measurements of height and mass are obvious examples: given a sufficiently accurate scale, and considerable patience, you can measure out 30 grams of Beluga caviar per dinner guest, or 30.1 grams – or any amount between these points. It is in this sense of covering all possible values within a defined range that the word ‘continuous’ is used.

Discrete variables, on the other hand, can take only certain values. A variable that records the order in which athletes finish the 100-metre egg-and-spoon race, for instance, can only take the values 1, 2, 3, etc. It is not possible to finish in 3.25th place. Similarly, a variable that assigns 1 to people who have ‘survived’ lung cancer for five years after diagnosis, and 0 to people who have died within five years of diagnosis, excludes all other values. Discrete variables are also known as categorical variables.

Variables are measured entities (or attributes of entities) that can take on different values, eg height, mass.

Constants are quantities that do not change, but always have the same value, eg the speed of light.

Continuous measurements can take any value within the range defined as valid for a particular variable.

Discrete measurements can take only certain values within a range, eg 1, 2, 3, but not 1.5, 2.5.

Activity 1.5

Decide whether each of the following measurements is a variable or a constant, and whether it is continuous or discrete:
a) the time taken to complete a marathon
b) the mass of the moon
c) the troy ounce mass of a kilogram of gold
d) the HIV status (+ or –) of an individual
e) the number of judges in the Western Cape High Court.

The difference between these two classes of variables or measurements is important. For our purposes, recognising whether a measurement is continuous or discrete will help us decide which kind of statistical test to use. When we collect data on continuous variables (eg a baby’s mass at birth, caloric consumption per day), we will use a set of techniques that exploit this continuous nature (eg t-tests, ANOVA), and when we collect data on discrete variables (eg votes for a political party, choice of spread for a sandwich), we will use a different set of techniques (eg χ², Mann–Whitney). Tests for use on continuous data are usually not appropriate for categorical data, and vice versa.

We distinguish between variables in terms of their scales of measurement.

Nominal, ordinal, interval, and ratio variables

Another way of distinguishing between different kinds of variables is in terms of the mathematical properties of the numbers that the variables can assume. We can use the numbers 1 and 2 to represent people who receive (or do not receive) a medical treatment, to represent the individuals who came first and second in an examination, or to represent the actual marks that two weak students got in an examination. In each of these cases, the numbers have different mathematical properties. Mathematically, it is perfectly legitimate to subtract 1 from 2 to get 1 (2 – 1 = 1), but it is absurd to say that subtracting a person who did not receive a medical treatment (1) from someone who did receive a medical treatment (2) results in a person who did not receive a medical treatment (1). We say that the variables are measured on different scales of measurement. We usually distinguish between four scales (or levels) of measurement: nominal, ordinal, interval, and ratio.

Nominal variables indicate only that there is a difference between categories of objects, persons, or characteristics. Numbers are used here as labels to distinguish one category from another. For example, numbers can be used as category labels to distinguish between the different categories that make up the variables of five-year survival post-cancer diagnosis (survived vs did not survive), religion (Protestant, Catholic, Jewish, Muslim), and psychopathology (schizophrenic, manic-depressive, neurotic). We can label survivors 1 and non-survivors 2, but it would make no difference if we labelled non-survivors 1 and survivors 2, or non-survivors 0 and survivors 1. All the numbers do is distinguish individuals in one group from individuals in another. No mathematical operations (+, –, ×, ÷) or mathematical relations (less than, greater than) may be performed with these numbers because the attributes they represent do not allow such operations. Although we can add or multiply 1 and 2, we cannot add or multiply the attributes Protestant and Catholic.

Ordinal variables indicate categories that are both different from each other, and ranked or ordered in terms of an attribute. When we label developing countries ‘1’ and developed countries ‘2’, not only are we distinguishing between them, but we are also marking the fact that developed countries have more of the attribute ‘economic


development’ than developing countries. The same holds true when we label university grades as A, B, C, D, or when we label opinions as strongly agree, agree, disagree, and strongly disagree. With ordinal measurements, we may perform mathematical relations (<, >), but not mathematical operations (+, –, ×, ÷). Just because 2 = 2 × 1, we cannot say that developed countries (2) have twice as much economic development as developing countries (1). We can only say that they have more economic development. The differences between the amounts of an attribute that objects have do not correspond with the mathematical differences between the numbers that are used to represent these amounts. When the horses come in 1st, 2nd, and 3rd at the races, the numbers 1, 2, and 3 are measured on an ordinal scale, and do not tell us how far the second horse was behind the first horse (ie the distances between the horses). The intervals between the numbers on an ordinal scale are meaningless, and therefore no mathematical operations can be performed on these numbers.

Interval variables are true quantitative measurements because, in addition to marking difference and rank, the differences or distances between any two numbers on the scale are meaningful. This means that the difference between two scores is an accurate reflection of the difference in the amount of an attribute that the two objects have. Temperature, measured in degrees Celsius, is measured on the interval scale, and a difference between 18 degrees and 20 degrees will be exactly the same as the difference between 25 degrees and 27 degrees. Most measurements in the behavioural sciences (eg IQ scores, scores on attitude scales, and knowledge tests) are interval measurements. In addition to performing mathematical relations (=, […]

[…] > 90 category in the test scores example below) should be the number of observations (N).
Apart from calculating cumulative frequencies for count data, we can also calculate cumulative frequencies for percentages. Like the cumulative frequency, the cumulative percentage frequency is the total of all percentages collected up to a certain point. The total cumulative percentage frequency will always be 100%.

Table 2.3 shows the test scores of 80 learners. The same information is presented in a histogram and a cumulative frequency diagram below the table. The cumulative frequencies were calculated by adding the frequencies, starting from the smallest measurement group (< 50%) and ending with the largest (> 90%). As no-one scored under 50%, the categories between 0% and 50% were combined into one.

Table 2.3 Test scores (out of 100) in a class of 80 learners

Category range    Count    Midpoint
[…]
> 90

Activity 2.6

Describe the distribution of class scores. Do you think this was a good test? Why, or why not?

TUTORIAL 2: DISPLAYING DATA

Figure 2.10 Histogram (left) and cumulative frequency diagram (right) of the class test scores [x-axes: Test score (max = 100); y-axes: Frequency (count) for the histogram, Cumulative frequency (count) for the cumulative frequency diagram]

How to construct a cumulative frequency diagram

1. As in the case of histograms, first divide the data into a sensible number of intervals. Count the number of observations within each interval.
2. To calculate the cumulative frequency, simply add up the frequencies sequentially, up to, and including, the specific category. For example, to calculate the cumulative frequency for the 55–59 range (see Table 2.3), add up the frequencies in the intervals < 50, 50–54 and 55–59.
3. If you would like to show a cumulative percentage frequency, divide each cumulative frequency by the total frequency (the total number of observations) and multiply it by 100.
4. If drawing the figure by hand, choose scales for the axes that will accommodate all the data. Ensure you label the axes clearly.
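The steps above can be sketched in R with cumsum(). The interval labels and counts below are hypothetical, made up for illustration; they are not the values from Table 2.3.

```r
# Hypothetical counts per interval (invented for illustration)
intervals <- c("< 50", "50-59", "60-69", "70-79", "80-89", ">= 90")
frequency <- c(3, 10, 14, 8, 4, 1)

# Step 2: cumulative frequency = running total of the counts
cum_freq <- cumsum(frequency)

# Step 3: cumulative percentage = cumulative frequency / total observations x 100
cum_pct <- cum_freq / sum(frequency) * 100

result <- data.frame(intervals, frequency, cum_freq, cum_pct)
print(result)

# Step 4: draw the cumulative frequency diagram with labelled axes
plot(seq_along(cum_freq), cum_freq, type = "b", xaxt = "n",
     xlab = "Test score interval", ylab = "Cumulative frequency (count)")
axis(1, at = seq_along(intervals), labels = intervals)
```

A quick sanity check on any such table is that the last cumulative frequency equals the total number of observations and the last cumulative percentage equals 100.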

Interpreting cumulative frequency diagrams

Cumulative frequency diagrams do not indicate peaks, valleys or outliers in the data as clearly as histograms do. However, they are good at illustrating the proportion of people scoring either less than, or more than, a specific value. Cumulative frequencies are useful for answering questions such as ‘how many students failed the test?’ or ‘how many (or what percentage of) students scored less than 75% on the test?’ In this example, no students failed, and 79 students (98.75% of the class) attained less than 75%. What should you look for in a cumulative frequency diagram? A typical shape for a cumulative frequency diagram is a sigmoidal, or drawn-out ‘s’, shape: observations initially increase slowly, then increase rapidly in the middle, and, finally, the increase slows again until all the


observations are accounted for (see the right-hand panel in Figure 2.10). A steep slope in the cumulative frequency diagram means that there were many observations in the corresponding interval(s). A completely flat slope means that there were no observations in the corresponding interval(s). In the example above, only one student scored between 80 and 100% on the test, so the slope in this section is nearly flat. In contrast, many students scored between 60% and 65%.

Activity 2.7

1. Consult Figure 2.10. Approximately how many students had a score of 60 or less? How many students had a score of 80 or less?
2. Complete the missing information in the table:

Interval    Frequency    Cumulative frequency    % Frequency    Cumulative % frequency
1–10        2
11–20       7
21–30       6
31–40       4

Hint: Use the total number of observations to work out percentages.

Line graphs

Instead of representing numbers using bars, we can represent them as a series of dots connected by a line, or just as a line. For example, the graph below compares memory performance in two groups. The experimental group received a supplement that claims to enhance memory. The control group received a placebo. We tested each group’s memory before and after starting the experiment. What observations can you make about the difference in memory performance between the groups?

Figure 2.11 Memory performance after receiving a memory enhancer or a placebo. After taking the supplement, participants in the experimental group performed better on a memory task than participants in the control group. [Line graph: x-axis, Before and After; y-axis, Memory performance (max = 10); separate lines for Supplement and Placebo]

We often use bar graphs for comparing categorical variables and line graphs for comparing continuous variables. However, this is only a guideline. It is useful to plot data in different ways to see what works best. For example, when we want to show changes over time, it may be more useful to display the data as a series of dots connected by a line rather than a series of bars. Compare the following two figures:

A line graph shows the relationship between variables by connecting adjacent points with a line.

[Two panels, a bar graph and a line graph, each showing Life expectancy at birth (years) against Year for Botswana, South Africa and Uganda, with the annotations ‘AIDS spreads’ and ‘Free ARVs’]

Figure 2.12 Average life expectancy in three different sub-Saharan African countries. Life expectancy in sub-Saharan Africa rose steadily in the middle of the 20th century until the start of the HIV/AIDS outbreak in the mid-1970s to mid-1980s. As the disease spread through Africa, life expectancy first began to plateau in Uganda (1970), and later in Botswana (1980). Life expectancy in all three countries then dropped dramatically until the roll-out of free public antiretroviral (ARV) therapy during the early 2000s. Life expectancy is now steadily increasing once again. Data courtesy of the World Bank DataBank (databank.worldbank.org).

The line graph shows the change in life expectancy over time more clearly than the bar graph. It is easier for our eyes to follow the increases and decreases of the lines – for a start, the movement on the graph is also from left to right, mirroring the direction of our eye movements when reading. Table 2.4 shows an extract of the life expectancy data from South Africa that was used to make Figure 2.12. This is also how your data should be organised to create a line graph with two variables. If you want to add a third variable (such as Country), it needs a separate column – as shown in Table 2.5.


Table 2.4 Life expectancy in South Africa

Year    Life expectancy
1960    49
1970    53
1980    57
1990    62
2000    56
2001    55
2002    53
2003    53
2004    52
2005    52
…       …

Table 2.5 Life expectancy in three different countries

Country         Year    Life expectancy
Botswana        1960    51
Botswana        1970    54
Botswana        1980    61
…
South Africa    1960    49
South Africa    1970    53
South Africa    1980    57
…
Uganda          1960    44
Uganda          1970    49
Uganda          1980    49
…

How to construct line graphs

1. Organise your dataset, or create a table, so that the independent and dependent variables are in two columns. Each row should contain a pair of x- and y-values. For example, Table 2.4 shows the independent variable, Year, in the leftmost column and the dependent variable, Life expectancy, in the rightmost column.
2. The dependent variable is normally plotted on the y-axis and the independent variable on the x-axis.
3. If you are drawing the graph by hand, decide on the scale of the axes. All the points need to fit on the graph without unnecessary white space. You should, therefore, choose the maximum value for each axis so that it is only slightly more than the maximum value in the data. In Figure 2.12, the maximum life expectancy is 64 years. The maximum value of the y-axis is 65. If you are using a computer, the program will do this for you.
4. Plot your data on the graph. If you are drawing the graph by hand, find the correct place on the x- and y-axes for the first data pair. Find the place where a vertical line (drawn from the point on the x-axis) and a horizontal line (drawn from the point on the y-axis) intersect. Mark the point of intersection with a dot. Figure 2.13 shows the intersection of the points x = 5 and y = 10. Repeat this step for all the other data points. Once all of them are plotted, connect the dots with a line. A computer program will automatically plot the line between all the data points when given the two-column data set.
5. If you are comparing multiple items, create a legend to identify what the different line colours or shapes represent. For example, Figure 2.12 compares three different countries. The legend on the right shows which line belongs to which country.
6. Label your axes. Give both the measurement type and unit – for example, ‘life expectancy (years)’.

Figure 2.13 How to connect x and y coordinates to form a line [x-axis: Independent variable (x); y-axis: Dependent variable (y); the point (x = 5, y = 10) is marked]
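The construction steps can also be carried out in base R. The sketch below uses only the rows of Table 2.5 that are shown in the text (1960, 1970 and 1980 for each country); the data frame name life and its column names are our own choices.

```r
# Long-format data: one row per country-year pair (values from Table 2.5)
life <- data.frame(
  country = rep(c("Botswana", "South Africa", "Uganda"), each = 3),
  year    = rep(c(1960, 1970, 1980), times = 3),
  life_expectancy = c(51, 54, 61,  49, 53, 57,  44, 49, 49)
)

countries <- unique(life$country)

# Set up empty axes that fit all the data, with labels giving measurement and unit
plot(life$year, life$life_expectancy, type = "n",
     xlab = "Year", ylab = "Life expectancy at birth (years)")

# Draw one line per country: independent variable on x, dependent variable on y
for (i in seq_along(countries)) {
  one <- life[life$country == countries[i], ]
  lines(one$year, one$life_expectancy, type = "b", lty = i, pch = i)
}

# Legend identifying which line belongs to which country
legend("topleft", legend = countries, lty = seq_along(countries),
       pch = seq_along(countries))
```

Keeping the country in its own column, as in Table 2.5, is what makes it easy to draw one line per subgroup from a single data set.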

Interpreting line graphs

Plotting the relationship of one variable against another can tell us whether there is a relationship between the two variables. It can also tell us what that relationship is. Ask yourself: how can I describe the pattern in the graph? Is this what I would expect? What else could explain the pattern? Look at the slope of the line for information about the relationship. The slope of a line refers to its direction and steepness. In mathematical terms,

\[ \text{slope} = \frac{\text{change in } y}{\text{change in } x} \]

The steepness of the line shows the strength of the relationship between the variables. The steeper the slope, the stronger the relationship. The direction of the line shows the type of relationship between the variables. A line going up from left to right (as in Figure 2.13) indicates a positive relationship: y increases with x. A line going down from left to right indicates a negative relationship: y decreases as x increases. If you are unsure about how to interpret or calculate the slope of a graph, see Tutorial 24: Reading and understanding graphs. To recap, the direction and steepness of the lines in a line graph show the relationship between variables. What line graphs cannot


The direction and steepness of the line provide information about the relationship between the variables.


show is how the relationship would change if another variable was added, or if you looked at subgroups of the data (eg splitting the data into different countries or by sex). You should question whether there are other variables likely to affect the relationship. What happens if you graph the data by individual subgroups? Will the pattern change?

Activity 2.8

Consider the two graphs which follow:

[Graph A: Number of errors made (y-axis) against Number of days without sleep (x-axis, 0 to 4)]

[Graph B: Recognition accuracy (%) (y-axis) against Condition (x-axis: Calm, Stressed), with separate lines for Female and Male]

a) What are the independent and dependent variables in each?
b) Describe the relationships illustrated in each.
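To make the slope formula from the section on interpreting line graphs concrete, the sketch below computes the change in y divided by the change in x between two points. It uses rows from Table 2.4 (South Africa); the helper function slope() is our own, written for this example, and is not a built-in R function.

```r
# slope = change in y / change in x, for two points (x1, y1) and (x2, y2)
# NOTE: slope() is a small helper written for this example
slope <- function(x1, y1, x2, y2) {
  (y2 - y1) / (x2 - x1)
}

# Life expectancy in South Africa (Table 2.4): 49 years in 1960, 62 years in 1990
rise_1960_1990 <- slope(1960, 49, 1990, 62)
print(round(rise_1960_1990, 2))   # positive: the line goes up from left to right

# Between 1990 (62 years) and 2000 (56 years) the line goes down
fall_1990_2000 <- slope(1990, 62, 2000, 56)
print(round(fall_1990_2000, 2))   # negative: the line goes down from left to right
```

The sign of the result gives the direction of the relationship, and its size gives the steepness, here in years of life expectancy per calendar year.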

Box 2.2

Graphics in R

R provides a good means of producing beautiful, customisable graphs. There are three main ways to create graphs in R. You can either use the base R plotting functions that come with the R core package, or use the additional packages ‘lattice’ or ‘ggplot2’. The latter two can be installed with the commands install.packages("lattice") and install.packages("ggplot2"). Which graphics system you use depends on your preference, but both ggplot2 and lattice are better than base R at creating plots of subgroups of the data within one figure. This section offers only a basic introduction to R base graphics, but there are many excellent books and websites on creating graphs in R using other packages. See the Further reading section at the end of this tutorial for more information.

Bar graphs

Using R base graphics, you can make a simple bar graph using the command barplot(data$y, names.arg = data$x), where ‘data’ is the name of your dataset, and ‘x’ and ‘y’ are the names of your variables.


For example, the data below represents the number of pupils per mathematics textbook in different sub-Saharan African countries. The first three lines create the data. The next two create a basic bar graph.


#create dataset (Source: World Bank DataBank, 2012) country