
Ton J. Cleophas Aeilko H. Zwinderman

Quantile Regression in Clinical Research Complete analysis for data at a loss of homogeneity

Quantile Regression in Clinical Research

Ton J. Cleophas • Aeilko H. Zwinderman

Quantile Regression in Clinical Research Complete analysis for data at a loss of homogeneity

Ton J. Cleophas
Albert Schweitzer Hospital, Department Medicine
Sliedrecht, Zuid-Holland, The Netherlands

Aeilko H. Zwinderman
Dept. Biostatistics and Epidemiology, Academic Medical Center
Amsterdam, The Netherlands

ISBN 978-3-030-82839-4    ISBN 978-3-030-82840-0 (eBook)
https://doi.org/10.1007/978-3-030-82840-0

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Except for Wiley’s edition of Quantile Regression: Estimation and Simulation, vol. 1 (2017) and vol. 2 (2020), by the Italian econometricians from Naples University, Furno and Vistocco, no textbooks have been published that address quantile regression in clinical research. Quantile regression is an approach to data at a loss of homogeneity, for example:

1. Data with outliers
2. Skewed data like corona-deaths data
3. Data with inconstant variability
4. Big data

In clinical research, many examples can be given, like circadian phenomena, and the spread of diseases may depend on subsets with frailty, low weight, poor hygiene, and many other forms of lack of healthiness. Stratified analysis is a laborious and rather explorative way of analysis, but quantile analysis may be a more fruitful, quicker, and more complete alternative for the purpose. Considering all of this, we are on the verge of a revolution in data analysis, one that began with the tentative acceptance of multiple predictor variables in prospective randomized research and that is now, with Koenker’s “Quantile regression, R package version 5.05, 2013” and le Cook’s “Thinking beyond the mean” (Shanghai Arch Psychiatry 2013, pp. 55–59), definitive. The current edition is the first textbook and tutorial of quantile regressions for medical and healthcare students, as well as a recollection/update for professionals at the bench and help desk. Each chapter can be studied as a standalone and covers one of the many fields in the fast-growing world of quantile regressions. Step-by-step analyses of over 20 data files stored at extras.springer.com are included for self-assessment. We should add that the authors are well qualified in their field. Professor Zwinderman is past president of the International Society of Biostatistics (2012–2015) and Professor Cleophas is past president of the American College of Angiology (2000–2002). From their expertise, they should be able to make adequate selections of modern quantile regression methods for the benefit of physicians,


students, and investigators. The authors have been working and publishing together for 22 years, and their research can be characterized as a continued effort to demonstrate that clinical data analysis is not mathematics but rather a discipline at the interface of biology and mathematics. Sliedrecht, Zuid-Holland, The Netherlands Amsterdam, The Netherlands

Ton J. Cleophas Aeilko H. Zwinderman

Contents

1 General Introduction
   1.1 Summary
   1.2 Introduction
   1.3 Principle of Regression Analysis
   1.4 Principle of Quantile Regression
   1.5 History and Background of Quantile Regression
   1.6 Data Example
   1.7 Separating Quantiles, Traditional and Quantile-wise
   1.8 Special Case
   1.9 Quantile Regression Both for Continuous and Discrete Outcome Variables
   1.10 References

2 Mathematical Models for Separating Quantiles from One Another
   2.1 Summary
   2.2 Introduction
   2.3 Maximizing Linear Functions with the Help of Support Vectors
   2.4 Lagrangian Multiplier Method
   2.5 Maximizing Linear Functions with the Help of Rectangles
   2.6 Maximizing Linear Functions with the Help of Simplex Algorithms
   2.7 The Intuition of Quantile Regression
   2.8 Special Case
   2.9 Traditional Statistical Methods Applied in This Edition
   2.10 Conclusions
   2.11 References

Part I  Simple Univariate Regressions Versus Quantile

3 Traditional and Robust Regressions Versus Quantile
   3.1 Summary
   3.2 Introduction
   3.3 Traditional and Robust Regression
   3.4 Quantile Regressions
   3.5 Conclusion
   3.6 References

4 Autocorrelations Versus Quantile Regressions
   4.1 Summary
   4.2 Introduction
   4.3 Autoregression Analysis
   4.4 Quantile Regressions
   4.5 Conclusions
   4.6 References

5 Discrete Trend Testing Versus Quantile Regression
   5.1 Summary
   5.2 Introduction
   5.3 Discrete Trend Analysis
   5.4 Quantile Regressions
   5.5 Conclusion
   5.6 References

6 Continuous Trend Testing Versus Quantile Regression
   6.1 Summary
   6.2 Introduction
   6.3 Linear Trend Testing of Continuous Data
   6.4 Quantile Regressions
   6.5 Conclusion
   6.6 References

7 Binary Poisson/Negative Binomial Regressions Versus Quantile
   7.1 Summary
   7.2 Introduction
   7.3 Binary Poisson and Negative Binomial Regressions
   7.4 Quantile Regressions
   7.5 Conclusion
   7.6 References

8 Robust Standard Errors Regressions Versus Quantile
   8.1 Summary
   8.2 Introduction
   8.3 Robust Standard Errors
   8.4 Quantile Regressions
   8.5 Conclusion
   8.6 References

9 Optimal Scaling Versus Quantile Regression
   9.1 Summary
   9.2 Introduction
   9.3 Optimal Scaling
   9.4 Quantile Regression
   9.5 Conclusions
   9.6 References

10 Intercept only Poisson Regression Versus Quantile
   10.1 Summary
   10.2 Introduction
   10.3 Poisson Intercept Only
   10.4 Quantile Regressions
   10.5 Conclusion
   10.6 References

Part II  Multiple Variables Regressions Versus Quantile

11 Four Predictors Regressions Versus Quantile
   11.1 Summary
   11.2 Introduction
   11.3 Four Predictors Regressions
   11.4 Quantile Regressions
   11.5 Conclusion
   11.6 References

12 Gene Expressions Regressions, Traditional Versus Quantile
   12.1 Summary
   12.2 Introduction
   12.3 Gene Expressions Regression
   12.4 Quantile Regressions
   12.5 Conclusion
   12.6 References

13 Koenker’s Multiple Variables Analysis with Quantile Modeling
   13.1 Summary
   13.2 Introduction
   13.3 SAS Statistical Software Graphs Interpreted
   13.4 First Four Graphs
   13.5 The Second Set of Four Graphs
   13.6 The Third Set of Four Graphs
   13.7 The Fourth Set of Four Graphs
   13.8 Conclusion
   13.9 References

14 Interaction Adjusted Regression Versus Quantile
   14.1 Summary
   14.2 Introduction
   14.3 Interaction Adjusted Regression
   14.4 Quantile Regressions
   14.5 Conclusion
   14.6 References

15 Quantile Regression to Study Corona Deaths
   15.1 Summary
   15.2 Introduction
   15.3 Methods and Main Results
   15.4 Conclusion
   15.5 References

16 Laboratory Values Predict Survival Sepsis, Traditional Regression Versus Quantile
   16.1 Summary
   16.2 Introduction
   16.3 Traditional Regression
   16.4 Quantile Regressions
   16.5 Conclusion
   16.6 References

17 Multinomial Regression Versus Quantile
   17.1 Summary
   17.2 Introduction
   17.3 Multinomial Regressions and More
   17.4 Quantile Regressions
   17.5 Conclusions
   17.6 References

18 Regressions with Inconstant Variability, Traditional and Weighted Least Squares Analysis Versus Quantile
   18.1 Summary
   18.2 Introduction
   18.3 Regressions with Inconstant Variability
   18.4 Quantile Regressions
   18.5 Conclusion
   18.6 References

19 Restructuring Categories into Multiple Binary Variables Versus Quantile Regressions
   19.1 Summary
   19.2 Introduction
   19.3 Traditional Multiple Regression After Restructuring Predictive Categories into Multiple Binary Variables
   19.4 Quantile Regressions
   19.5 Conclusion
   19.6 References

20 Poisson Events per Person per Period of Time Versus Quantile Regression
   20.1 Summary
   20.2 Introduction
   20.3 Poisson Events per Person per Period of Time
   20.4 Quantile Regression
   20.5 Conclusion
   20.6 References

Part III  Special Regressions Versus Quantile

21 Two Stage Least Squares Analysis Versus Quantile
   21.1 Summary
   21.2 Introduction
   21.3 Two Stage Least Squares (SLS)
   21.4 Quantile Regressions
   21.5 Conclusion
   21.6 References

22 Partial Correlations Versus Quantile Regressions
   22.1 Summary
   22.2 Introduction
   22.3 Partial Correlations
   22.4 Quantile Regressions
   22.5 Conclusions
   22.6 References

23 Random Intercepts Regression Versus Quantile
   23.1 Summary
   23.2 Introduction
   23.3 Random Intercept Regression
   23.4 Quantile Regressions
   23.5 Quantile Regression with Intercept Included
   23.6 Conclusion
   23.7 References

24 Regression Trees Versus Quantile Regression
   24.1 Summary
   24.2 Introduction
   24.3 Regression Trees
   24.4 Quantile Regressions
   24.5 Conclusions
   24.6 References

25 Kernel Regression Versus Quantile Regression
   25.1 Summary
   25.2 Introduction
   25.3 Kernel Regression
   25.4 Quantile Regressions
   25.5 Conclusions
   25.6 References

26 Quasi-likelihood Regressions vs Quantile
   26.1 Summary
   26.2 Introduction
   26.3 Quasi-likelihood Regressions
   26.4 Quantile Regressions
   26.5 Conclusion
   26.6 References

27 Summaries

Index

Chapter 1

General Introduction

1.1

Summary

Quantile regressions were invented centuries ago but have been virtually unused due to computational complexity. Thanks to computers and modern statistical software, computations have become easy today. The principle of regression is that the best fit mathematical model is chosen for a dataset, and it is then tested how far distant from this model the data are. Quantile regression is like traditional linear regression, but instead of using a regression coefficient based on means,

regression coefficient = Σ (x − x_mean)(y − y_mean) / Σ (x − x_mean)²

it uses regression coefficients based on medians (0.5 quantiles = 50%):

regression coefficient = Σ (x − x_median)(y − y_median) / Σ (x − x_median)²

We should add that other quantiles, for example 10% or 90%, are also possible. Quantile regression was invented by Boscovich, a Croatian priest. It is an extension of traditional linear regression, and it is used when you make no assumptions about the data distribution. It is used in ecology research, risk management, econometrics, and more. The mortality/morbidity data of the pandemic corona virus infections are heavily skewed to the elderly, and would be an excellent example of data suitable for quantile regression. The release of the quantile regression module in the 26th version of SPSS statistical software in the first half of 2020, at the same time that a worldwide corona epidemic started to take place, is not just a coincidence. With quantile analysis, prior to any data-analysis, the quantiles, otherwise called percentages, of a linear dataset must be divided into an upper and lower part of the data separated by a regression line. With traditional linear regression this separation line is the best fit line for summarizing the data, and it is computed by the ordinary least squares (OLS) method. The random error needed for assessment of the


statistical significance of the model (significantly different from a zero effect) is, however, computed very differently. Traditionally, the standard error looks like:

Standard error = √[(Σy² − aΣy − bΣxy) / (n − 2)]

With quantile regression things look easier:

Mean absolute deviation = [Σ |y − x|] / n

There is no square root anymore, so everything looks very simple. However, the y and x values must be weighted by check functions, including (1) a check function for the asymmetric weight of the error depending on the quantile (often called ρ), (2) the overall sign of the error, and (3) a check function for the magnitudes of the regression coefficients depending on the quantile (often called τ). With multiple predictors in the quantile regression the equation looks like shown below:

Mean absolute deviation = 1/n Σ ρ(y − (aτ + b1τ x1 + b2τ x2 + ...))

Quantiles can be obtained in multiple ways, for example, from horizontal parallel lines. This method does, however, not take into account that each value of the dataset is assessed for having a 90 or 10% chance of being above or below the cut-off line. With quantile regressions special measures must be taken to ensure that, indeed, each value is accounted for. Chapter 2 will address this subject in particular.
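As a minimal numerical sketch of the two coefficient formulas quoted above, the same centering-based slope can be computed once around the mean and once around the median. The small x and y arrays below are invented purely for illustration and are not data from this book:

```python
import numpy as np

# Invented toy data with one outlying pair, purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 20.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 19.0])

def centered_slope(x, y, center):
    """Slope = sum((x - c_x)(y - c_y)) / sum((x - c_x)^2), with c = mean or median."""
    cx, cy = center(x), center(y)
    return np.sum((x - cx) * (y - cy)) / np.sum((x - cx) ** 2)

print("mean-based coefficient:  ", round(centered_slope(x, y, np.mean), 3))
print("median-based coefficient:", round(centered_slope(x, y, np.median), 3))
```

Note that this only mirrors the formula as quoted in the summary; a full quantile regression additionally applies the check-function weighting described above.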

1.2

Introduction

Today quantile regression is a mathematically validated method, but, in practice, it is virtually unused due to numerical complexity. Fortunately, thanks to the availability of computers and, recently, improved analytic mathematics (see Chap. 2), computations have become easy. It is now available in most of the major statistical software packages, like SAS, SPSS, R, Matlab, Stata, Python, and Mathematica. But to date case by case data analyses against traditional analytic methodologies are missing, and, particularly in clinical research, the methodology is still virtually unused. And this is so in spite of occasionally astonishing results in practice, like, for example, the recent statistical identification of a series of predictor markers of the huge numbers of corona virus deaths in a worldwide study from Connecticut University (see Chap. 15).


1.3


Principle of Regression Analysis

We will address the general principle of regression analysis first with the help of a hypothesized data example.

The above figure shows the overall effect of age on income in the left upper graph. The effects of several subgroup characteristics are given in the other graphs. The ten regression lines in the six graphs have different steepnesses. The overall regression line is the best fit line drawn in the left upper graph, i.e., the regression line with the least distances to the individual values. No better line is possible. Mostly, around 50% of the data are above and 50% are below this line. However, as we shall soon see, instead of 50/50, also 40% of the data above and 60% below, or 30%, or 20%, etc. can be chosen, and the underneath range of lines develops.

Each line can subsequently be tested for data fit. The flat lines look a bit like the above low education line and the females line. The steep lines look like the high


education and males lines. For example, if many low education subjects are in the data, then the above flat lines will better fit the data; if many high education subjects are, then the above steeper lines will better fit the data. If you don’t know whether regression line (1) will fit better than, for example, line (6), you may consider testing all six lines, and the line with the best statistics, i.e., the line to which all of the data are closest, and, at all events, closer than could happen by chance, will be the best choice for describing the pattern of the data and for making predictions from it. The above range of lines is never used with traditional regression and always used with quantile regression, although in a more sophisticated way. Traditional regression analysis calculates only one best fit “line/exponential curve/curvilinear curve/etc.”, i.e., the one with the shortest distance to the data. And, then, it tests how far distant from the curve the data are. A significant correlation between the y- and x-data means that the outcome data (the y-data) are closer to the curve (the model) than could have happened with random sampling (by chance). How do we test? Pretty basic statistical tests can be adequately used for the purpose, like t-tests or analyses of variance. We should add that the model-principle as applied is at the same time the largest limitation of this analytic approach: it is often no use forcing nature into some kind of mathematical model. Nonetheless, with plenty of regression models, it is exciting to search for the best fitting ones. This edition will address quantile regression as compared to alternative models. Quantile regression is a so far little used but promising method. Just like ordinary least squares regression, it is a multiple variables regression model producing multiple t-values. These t-values are not usually adjusted for multiple testing, because they are assumed to stem from a family of null-hypotheses with many interactions within a single experiment. And, therefore, adjustment, as though they were entirely independent null-hypotheses, does not seem right. Here, p-values do not test an independent null-hypothesis, but rather serve as a kind of goodness of fit measure, something that makes the investigator happy. Finally, we should add that, with the quantile regressions, sometimes linear analyses are replaced with log or log odds transformed linear analyses. This is so, because a better fit of the models can be obtained by doing so.
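For readers who want to see the “one best fit line plus a test of distance from it” idea in code, a hedged sketch with statsmodels is given below; the age and income numbers are invented for illustration and do not come from the figure:

```python
import numpy as np
import statsmodels.api as sm

# Invented age/income values, purely for illustration.
age = np.array([23, 31, 38, 45, 52, 60, 67])
income = np.array([21.0, 30.5, 34.0, 41.5, 44.0, 47.5, 43.0])

X = sm.add_constant(age)            # adds the intercept column
fit = sm.OLS(income, X).fit()       # ordinary least squares: the single best fit line

print(fit.params)                   # intercept and slope of the best fit line
print(fit.tvalues, fit.pvalues)     # t-tests: is the slope further from zero than chance allows?
```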

1.4

Principle of Quantile Regression

Quantile regression models the relationship between a set of predictor (independent) variables and a target (outcome) variable in the form of percentiles (otherwise called quantiles), instead of the traditional means. Most often the median, otherwise called the 0.5 quantile, is applied. The median is the midpoint of a frequency distribution. If the frequency distribution is normal (Gaussian, also called parametric), the median will equal the mean value of the frequency distribution. If skewed, it will be computed as the value in the middle. For example,


(1) the median of the values 1, 3, 3, 6, 7, 8, 9 will be the value 6, (2) the median of the values 1, 2, 3, 4, 5, 6, 8, 9 will be the value 4.5, i.e., the midpoint between 4 and 5. In both examples (1) and (2), the median is slightly different from the mean:

mean (1) = Σ values / n values = 37 / 7 = 5.29 instead of 6.00,
mean (2) = Σ values / n values = 38 / 8 = 4.75 instead of 4.5.

A quantile literally means a fraction of the data, and is virtually identical to a percentile (percentage of the data). The median is the 0.5 quantile, but other quantiles are possible, for example the 0.1, 0.2, 0.3, 0.4 quantile, etc. Quantile regression will usually apply the 0.5 quantile instead of the mean of the data, but a better data-fit will sometimes be obtained with other quantiles.
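The two numerical examples above can be checked directly; this small snippet only reproduces the arithmetic quoted in the text:

```python
import numpy as np

values_1 = [1, 3, 3, 6, 7, 8, 9]      # example (1) from the text
values_2 = [1, 2, 3, 4, 5, 6, 8, 9]   # example (2) from the text

for name, v in (("(1)", values_1), ("(2)", values_2)):
    print(name, "median =", np.median(v), " mean =", round(float(np.mean(v)), 2))
# (1) median = 6.0  mean = 5.29
# (2) median = 4.5  mean = 4.75
```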

1.5

History and Background of Quantile Regression

The idea of estimating a median, instead of a mean, regression slope was proposed as early as 1760 by Boscovich, a Jesuit priest of Dubrovnik (Croatia), building on astronomical theories of Newton (Cambridge, UK), and he was the first to use the term least absolute criterion, a precedent of the least squares model as invented by Legendre (Paris, France) in 1805. Also Laplace (Paris, France) participated in the development of medians and regression with medians. Median regression computations for larger data sets are quite tedious, as compared to the least squares method, for which reason the method historically generated little popularity among statisticians, until the widespread adoption of computers in the last part of the twentieth century. Another important contributor to quantile regression has been Roger Koenker (economist, University of Illinois, 2001). Quantile regression is a type of regression analysis that is used when you want to estimate the conditional median of the target (dependent) variable. Essentially, quantile regression is an extension of linear regression, and it is used when you make no assumptions about the distribution of the residuals. Quantile regression helps you to obtain a more comprehensive analysis of the relationship between predictor and outcome variables. It can be used in ecology research, healthcare, risk management, and more. Quantile regression is included as a groundbreaking feature of the 2020 version of SPSS statistical software, version 26. Particularly big data models like the recent Covid-19 population data are skewed (Why corona-virus research data are skewed, www.bloomberg.com, May 1, 2020, and innumerable other papers on the Internet), and may benefit from assumption-less models like quantile regressions, but these have so far been rarely used for the purpose. This edition will demonstrate, with the help of small data examples, how quantile regressions provide excellent analytic support as compared to traditional and robust linear regression


analysis. The mortality/morbidity of corona virus infections is heavily skewed to the elderly, and the recent release of the quantile regression module in the 26th version of SPSS statistical software, in the very period that a worldwide corona epidemic started to take place, was a conspicuous and, at the same time, fortunate coincidence.

1.6

Data Example

As an example of quantile analyses of corona data, in an explorative study (J Health Res, October 2020) of corona-mortality data from 184 countries with 8 predictor variables, none of the variables was statistically significant in the ordinary least squares analysis. In contrast, quantile analysis revealed significant results in the 0.1 (p < 0.05), 0.5 (p < 0.10, 0.05, and 0.01), 0.75 (p < 0.01), and 0.9 (p < 0.10, 0.05, and 0.01) quantiles. Significant mortality predictors were (1) obesity, (2) age over 65, (3) urbanization, and (4) per capita income. More details of the excellent performance of quantile regression in this data example will be given in Chap. 15.

1.7

Separating Quantiles, Traditional and Quantile-wise

Prior to any data-analysis the quantiles, otherwise called percentages, of a linear dataset must be divided into an upper and lower part of the data separated by a regression line. With traditional linear regression this separation line is the best fit line for summarizing the data, and it is computed by the ordinary least squares method. In a two-dimensional graph its slope, otherwise called the regression coefficient, is calculated according to

[Sum of Products of the xy-values] / [Sum of Squares of the x-values],

and the traditional linear regression mathematical model is given by the equation y = a + bx. With quantile regressions things are slightly different. The equation is given by:

quantile y = quantile intercept + quantile b · x

The random error needed for assessment of the statistical significance of the model (significantly different from a zero effect) is computed very differently. Traditionally, the standard error looks like:

Standard error = √[(Σy² − aΣy − bΣxy) / (n − 2)]

With quantile regression it looks easier:


Mean absolute deviation = [Σ |y − x|] / n

There is no square root anymore. So everything looks very simple now, but the y and x values must be weighted by check functions, including a check function for the asymmetric weight of the error depending on the quantile (often called ρ) together with the overall sign of the error, and a check function for the magnitudes of the regression coefficients depending on the quantile (often called τ). With multiple predictors in the quantile regression the equation looks like shown below:

Mean absolute deviation = 1/n Σ ρ(y − (aτ + b1τ x1 + b2τ x2 + ...))
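In practice the quantile-specific intercept and slope of the equation “quantile y = quantile intercept + quantile b·x” are obtained from a quantile regression routine rather than by hand. A minimal sketch with the statsmodels package is given below; the simulated x and y values are assumptions made only to have something to fit:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 2.0 + 0.5 * x + rng.standard_normal(200) * (0.5 + 0.3 * x)  # inconstant variability

X = sm.add_constant(x)
for q in (0.1, 0.5, 0.9):
    fit = sm.QuantReg(y, X).fit(q=q)   # one regression line per quantile
    print(f"quantile {q}: intercept = {fit.params[0]:.2f}, slope = {fit.params[1]:.2f}")
```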

1.8

Special Case

Quantiles can be obtained multiple ways, for example, from horizontal parallel lines. This method, as shown in graph underneath, does, however, not take into account, that each value of the dataset is assessed for having 90 or 10% chance of being above or below the cut-off line.

In contrast, the underneath quantile regression lines are obtained in such a way, that each value in the dataset has equal chance of (approximately) 90 or 10% being present above or below the cut-off line.


When talking of the quantiles as applied with quantile regressions, we always refer to the latter types of regression lines.

1.9

Quantile Regression Both for Continuous and Discrete Outcome Variables

Although, traditionally, only for continuous outcome variables, recently quantile regressions have increasingly been applied for discrete outcome variables, for example for “well being” outcome scores. Sceptics reason, that, mathematically, this is not entirely appropriate, because discrete regression coefficients are (asymptotically) normally distributed, while discrete outcomes are rather stepwise. Fortunately, strange regression coefficients, mostly very small ones, and p-values of 1.000 are accompanied by a host of reasonable regression coefficients. This outweighs the relative theoretical disadvantage, and enables the practical use of explorative quantile regressions in many situations where alternatives are scarce.

1.10

References

All of the chapters of the current edition start with a brief review of the traditional analytic method of the different regression methods prior to the review of the relevant quantile regression method. For the purpose, generally, data examples are used from the recent edition “Regression Analyses in Clinical Research for Starters and 2nd Levelers 2nd Edition, Springer Heidelberg Germany 2021”, by the same authors. For a better understanding of differences between traditional and quantile regressions, readers may benefit from the study of this edition first. To readers requesting still more background, theoretical and mathematical information of computations given, several textbooks complementary to the current production and written by the same authors are available: Statistics applied to clinical studies 5th edition, 2012, Machine learning in medicine a complete overview 2nd edition, 2020, SPSS for starters and 2nd levelers 2nd edition, 2015, Clinical data analysis on a pocket calculator 2nd edition, 2016, Understanding clinical data analysis from published research, 2016, all of them edited by Springer Heidelberg Germany.

Chapter 2

Mathematical Models for Separating Quantiles from One Another

2.1

Summary

Finding the best fit separation line in a 2 dimensional random sample of observations between two subsets, such that, for example, 10% of the observations is left and 90% is on the right side is the first objective of quantile analysis, and this task is not a particularly easy one. In this chapter several methods will be reviewed, and we will observe, that maximizing linear functions are most adequate for the purpose. Four methods are frequently used. A support vector machine (SVM) is the name for a specific supervised learning model and its associated computerized learning algorithm. It is used for pattern recognition and can be applied both for classification and for regression analysis. A Lagrangian hyperplane is the best fit separation line between two regression datasets chosen in such a way that the distance to the data is minimal. In an orthogonal function this minimal distance is the square root of the sum of squares of the coordinates. Instead of quadratic programming, rectangles and simplex algorithms are possible. A rectangle and a simplex algorithm example are also described in this chapter. We should add, that this chapter and to a lesser degree the previous chapter may be a bit beyond the mathematical level of non-mathematicians who are the target readership of this edition. These chapters may, however, be skipped, because for those wishing to apply and understand quantile regression, enough practical information is left in the remaining chapters.

2.2

Introduction

Finding the best fit separation line in a 2 dimensional random sample of observations between two subsets, such that, for example, 10% of the observations is on the left and 90% is on the right side, is the first objective of quantile analysis, and this task is not a


particularly easy one. In this chapter several methods will be reviewed, and we will observe that maximizing linear functions are most adequate for the purpose. With the help of, for example, simplex algorithms the best fit separation line for the data will be found through a maximized function. With quantile regressions special measures must be taken to ensure that, indeed, each value is accounted for. This chapter will address the following methods for the purpose:

1. Support vectors method.
2. Lagrangian multiplier method.
3. Rectangles method.
4. Simplex algorithms.

We should add that this chapter, and to a lesser degree the previous chapter, may be a bit beyond the mathematical level of non-mathematicians, who are after all the target readership of this edition. These chapters may, however, be skipped, because for those wishing to apply and understand quantile regression, enough practical information is in the remaining chapters.

2.3

Maximizing Linear Functions with the Help of Support Vectors

A support vector machine (SVM) is the name for a specific supervised learning model and its associated computerized learning algorithm. It is used for pattern recognition and can be applied both for classification and for regression analysis.

[Figure: two scatterplots, panels A and B, of patient characteristic 1 (X1) against patient characteristic 2 (X2); panel B marks the support vectors v1, v2, and v3 and the margin d.]

The basic aim of SVMs is to construct a hyperplane, formed by the set of patient characteristics, that separates the cases and controls as well as possible. For two dimensional data this hyperplane is the best fit separation line, for three dimensional data the best fit separation plane. Consider the above bivariate scatterplot of two patient-characteristics, where cases are denoted by crosses and controls by dots. A linear hyperplane is defined as a*X1 + b*X2, where “a” and “b” are weights associated

2.4 Lagrangian Multiplier Method

11

with the two patient-characteristics, and * is the symbol of multiplication. The object is to find “a”, “b” and “c” such that a*X1 + b*X2 ≤ c for all controls and a*X1 + b*X2 > c for cases. There are often many different solutions for “a”, “b” and “c”, and the above graph (left) shows three different hyperplanes that separate cases and controls equally well. SVMs solely use the “difficult” observations that are lying close to the optimal hyperplane (denoted as the “decision boundary”); these observations are called the support vectors. The rationale of this choice may be argued by the fact that the decision boundary will not change very much if the “easy” observations are removed from the dataset, whereas the decision boundary will change dramatically if one or more of the difficult observations are removed. Therefore, the difficult observations, or support vectors, are the critical observations in the dataset. SVMs choose the hyperplane that maximizes the distance “d” from it to the difficult observations on either side (the above graph, right side). The starting points of the support vectors, denoted as v1, v2, and v3, are in the above graph, right side. All three vector arrows meet in the origin. One line is drawn through the starting points of v2 and v3, and one parallel line through the starting point of v1. The best fit hyperplane line is midway between the two parallel lines, at distance d (above graph, right side). The distance from the hyperplane line to the origin is an important estimator in SVM statistics, and is expressed, just like t-values in t-tests, in a standardized way: if d = w (weight), and the distance from the hyperplane line to the origin = b, then the distance from the hyperplane line to the origin equals b/w (expressed in “w-units”). In order to extend this fairly simple procedure to more complex situations, like multidimensional data and nonlinear data, a more general notation is preferred. It is given underneath.

2.4

Lagrangian Multiplier Method

Lagrange multiplier methodologies are closely related to the above support vector methods. Lagrangian hyperplanes H are defined by w′xi + b ≤ −1 for controls and w′xi + b ≥ 1 for cases, where xi is the vector of all patient-characteristics of patient i, w is a vector with weights, and b is called a bias term, comparable to the intercept of regression models. The optimal set of weights is determined so that the distance “d” is maximized. It turns out that this is equivalent to minimizing the Euclidean norm ||w||² subject to the condition that there are no observations in the margin: w′xi + b ≤ −1 for controls and w′xi + b ≥ 1 for cases, or yi(w′xi + b) ≥ 1 with yi = 1 for cases and yi = −1 for controls. This is a quadratic programming problem that can be solved by the Lagrangian multiplier method. The method introduces slack variables ξi that represent the degree of misclassification: yi(w′xi + b) ≥ 1 − ξi. Instead of minimizing ||w||², now 0.5||w||² + C·Σi ξi is minimized, where the constant C is a penalty-parameter that regulates the amount of misclassification that is accepted.
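As an illustration of the soft-margin idea just described, a linear support vector classifier can be fitted with scikit-learn; the sixteen patient points below are invented for illustration, and C is the misclassification penalty mentioned in the text:

```python
import numpy as np
from sklearn.svm import SVC

# Two invented patient characteristics (X1, X2): 8 controls (label 0) and 8 cases (label 1).
X = np.array([[1.0, 1.2], [1.5, 0.8], [2.0, 1.5], [1.2, 2.0],
              [0.8, 1.8], [1.7, 1.1], [2.2, 0.9], [1.4, 1.6],
              [3.5, 3.8], [4.0, 3.2], [3.8, 4.1], [4.5, 3.6],
              [3.2, 4.4], [4.2, 4.0], [3.9, 3.3], [4.6, 4.2]])
y = np.array([0] * 8 + [1] * 8)

svm = SVC(kernel="linear", C=1.0).fit(X, y)          # soft margin with penalty C
print("weights w:", svm.coef_, " bias b:", svm.intercept_)
print("support vectors:\n", svm.support_vectors_)    # the "difficult" observations
```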


2.5


Maximizing Linear Functions with the Help of Rectangles

Instead of, or together with, quadratic programming, rectangle programming and linear algorithms are possible. Simple maximizing functions with rectangles, and simplex algorithms for inequalities, are demonstrated underneath. An example of a rectangle program is given.

[Figure: the feasible region plotted in the x–y plane, with the candidate corner points encircled.]

P = 3x + 2y

What is Pmax, given the conditions (2 ≤ x ≤ 6 and 1 ≤ y ≤ 5)? Transform the conditions slightly:

x ≥ 2 and x ≤ 6 and y ≥ 1 and y ≤ 5

The encircled crossings in the above graph are: (2,1), (6,1), (6,2), (2,5), (3,5). Evaluating P = 3x + 2y at each of them:

(2,1) P = 3(2) + 2(1) = 8
(6,1) P = 3(6) + 2(1) = 20
(6,2) P = 3(6) + 2(2) = 22
(2,5) P = 3(2) + 2(5) = 16
(3,5) P = 3(3) + 2(5) = 19


(6,2) P = 3(6) + 2(2) = 22

Obviously, the value 22 maximizes the objective linear function in this example.
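The same corner-point check can be scripted; the sketch below only evaluates P = 3x + 2y at the encircled crossings listed in the text, taking the constraint set that produced them as given:

```python
# Corner points of the feasible region, as listed in the text.
vertices = [(2, 1), (6, 1), (6, 2), (2, 5), (3, 5)]

def objective(x, y):
    return 3 * x + 2 * y

for v in vertices:
    print(v, "->", objective(*v))

best = max(vertices, key=lambda v: objective(*v))
print("maximum:", best, "with P =", objective(*best))   # (6, 2) with P = 22
```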

2.6

Maximizing Linear Functions with the Help of Simplex Algorithms

The underneath simple linear model is modified from Math LibreTexts and has been used by the famous Koenker, a founder of data separation for quantile regressions. A tutor in statistics is offered two tutorials, x1 and x2. The compensation is 40$ and 30$ per hour, respectively. He does not wish to spend more than 12 h/week tutoring, and the preparation times for tutorials 1 and 2 are 2 and 1 h per tutoring hour. The total preparation time should not be over 16 h. Mathematically the data are summarized underneath:

Maximize z = 40 x1 + 30 x2
subject to
x1 + x2 ≤ 12
2x1 + x2 ≤ 16
x1 ≥ 0, x2 ≥ 0

For each "≤" constraint a nonnegative slack variable is added, e.g. x1 + x2 ≤ 12 becomes x1 + x2 + y1 = 12, and the objective z = 40 x1 + 30 x2 is rewritten as −40 x1 − 30 x2 + z = 0, so that the system becomes:

x1 + x2 + y1 = 12
2x1 + x2 + y2 = 16
−40 x1 − 30 x2 + z = 0
x1 ≥ 0, x2 ≥ 0

Simplex tableau (initial one):

 x1   x2   y1   y2   z  |  C
  1    1    1    0   0  | 12
  2    1    0    1   0  | 16
------------------------------
-40  -30    0    0   1  |  0

The vertical line separates the left-hand side of each equation from the constant on the right. The columns for y1, y2, and z, together with the constants column C, can be read directly:

y1   y2   z  |  C
 1    0   0  | 12
 0    1   0  | 16
 0    0   1  |  0

meaning y1 = 12, y2 = 16, z = 0. The basic solution associated with the tableau, with the basic variables labeled on the right, is underneath.


 x1   x2   y1   y2   z  |
  1    1    1    0   0  | 12   y1
  2    1    0    1   0  | 16   y2
------------------------------
-40  -30    0    0   1  |  0   z

The most negative entries, −40 and −30, are the dollars per tutorial; −40 belongs to the bottom row (row 3), column 1. Now take the entries on the far right and divide them by the corresponding entries of column 1: 12 : 1 = 12 and 16 : 2 = 8. The quotient 8 is the smallest and belongs to row 2, so we now focus on row 2. The value 2 in that row and column is called the pivot element. The quotients are helpful not to violate the constraints, and the value of the objective function is improved by changing the number of units. Subsequent pivoting makes all other entries in column 1 a zero. We will then obtain a 1 in the place of the 2, the pivot element of column 1. And:

  1    1    1    0    0  | 12
  1   1/2   0   1/2   0  |  8
--------------------------------
-40  -30    0    0    1  |  0

We make the pivot 1, and do so by dividing all of the elements of the 2nd row by 2 (as shown above). In order to obtain a zero in the 1st entry of row 1, multiply the 2nd row by −1 and add it to row 1.

  0   1/2   1  -1/2   0  |  4   y1
  1   1/2   0   1/2   0  |  8   x1
--------------------------------
-40  -30    0    0    1  |  0

Then multiply the 2nd row times 40, and add it to the 3rd row.

  0   1/2   1  -1/2   0  |   4   y1
  1   1/2   0   1/2   0  |   8   x1
--------------------------------
  0  -10    0   20    1  | 320   z

If the tutor works 8 h of tutorial 1 and 0 h of tutorial 2, his profit will be 320$. If there were no further negative entry in the bottom row, we would be done. There is, however, still a negative entry, −10. Now make the new pivot element a 1 by multiplying row 1 by 2.

 x1   x2   y1   y2   z  |
  0    1    2   -1   0  |   8
  1   1/2   0   1/2  0  |   8
------------------------------
  0  -10    0   20   1  | 320


Subsequently, multiply row 1 by −1/2 and add it to row 2, then multiply row 1 by 10 and add it to the bottom row.

 x1   x2   y1   y2   z  |
  0    1    2   -1   0  |   8   x2
  1    0   -1    1   0  |   4   x1
------------------------------
  0    0   20   10   1  | 400   z

There are no negative entries in the bottom row anymore, so we are finished. Why is this fine? All of the values in the bottom row are non-negative. The highest possible z is 400, which is what happens when y1 and y2 are zero, i.e., when x1 = 4 and x2 = 8. Why can row and column entries constantly be multiplied, divided, and added up or subtracted? Because this is the standard way of solving equality and inequality versions of linear functions. Algebraic symbols are sometimes used in a peculiar way, but in the end everything fits in well enough.
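The tutor problem can also be handed to an off-the-shelf linear programming routine as a check on the hand computation; a minimal sketch with SciPy is given below (linprog minimizes, so the objective is negated):

```python
from scipy.optimize import linprog

# Maximize z = 40*x1 + 30*x2  subject to  x1 + x2 <= 12,  2*x1 + x2 <= 16,  x1, x2 >= 0.
c = [-40, -30]                      # negated, because linprog minimizes
A_ub = [[1, 1], [2, 1]]
b_ub = [12, 16]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)], method="highs")
print("x1, x2 =", res.x)            # expected: 4 and 8
print("maximum z =", -res.fun)      # expected: 400
```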

2.7

The Intuition of Quantile Regression

By now you will be convinced that mathematically quantile regression is pretty advanced. And so, instead of a mathematical exposé, an intuitive explanatory version will be given for the non-mathematicians. To understand the intuition of quantile regression, let's start with the intuition of ordinary least squares, given the model y = b0 + b1x1 + ... + ε. The least squares estimate minimizes the sum of the squared error terms:

Σ (y − f(x, b))²

Comparatively, quantile regression minimizes a weighted sum of the positive and negative error terms, while the overall random error is no longer expressed as a squared term, but rather (with multiple predictors) as the mean absolute deviation:

Mean absolute deviation = 1/n Σ ρ(y − (aτ + b1τ x1 + b2τ x2 + ...))

where ρ is the check function giving the asymmetric weight to the error depending on the quantile, and τ is the quantile level.


Each orange circle represents an observation, while the blue line represents the quantile regression line. The black lines illustrate the distances between the regression line and each observation, which are labeled d1, d2, and d3. If we assume that τ is equal to 0.9, we can compute the quantile regression loss for the data in the image above, like this:

τ·(d2) + (1 − τ)·(|d1| + |d3|) = 0.9 × 0.4 + 0.1 × (|−1.3| + |−0.4|) = 0.53

Optimizing this loss function results in an estimated linear relationship between yi and xi, where a portion of the data, τ, lies below the line and the remaining portion of the data, 1 − τ, lies above the line, as shown in the graph below (Leeds, 2014).

In the above graph 90.11% of the observations are below the quantile regression line, which was approximated with τ set to 0.9.
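A small sketch of this check ("pinball") loss, reproducing the 0.53 of the worked example; the three residuals are the distances quoted in the text, with positive values above the candidate line and negative values below it:

```python
def quantile_loss(residuals, tau):
    """Check (pinball) loss: errors above the line weigh tau, errors below weigh 1 - tau."""
    return sum(tau * r if r >= 0 else (1 - tau) * (-r) for r in residuals)

# d2 = +0.4 lies above the line; d1 = -1.3 and d3 = -0.4 lie below it.
print(quantile_loss([0.4, -1.3, -0.4], tau=0.9))   # approximately 0.53
```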

2.8

Special Case

Quantiles can also be obtained from horizontal parallel lines. This is a method, however, that does not take into account, that each value is assessed for having 90 or 10% chance of being above or below the cut-off line. And this is not what a quantile


regression is aiming at. Full information must be included about the random errors of all of the data, if we wish our analysis to have any predictive value.

2.9

Traditional Statistical Methods Applied in This Edition

Chapter 3 Traditional and robust regressions
Chapter 4 Autoregression
Chapter 5 Discrete trend
Chapter 6 Continuous trend
Chapter 7 Binary Poisson regression
Chapter 8 Robust standard errors
Chapter 9 Optimal scaling
Chapter 10 Poisson intercept only
Chapter 11 Four predictor regressions
Chapter 12 Gene expression regressions
Chapter 13 Multiple variables regressions
Chapter 14 Multiple regression for interaction
Chapter 15 Quantile regression to study Corona deaths
Chapter 16 Lab predicts survival sepsis, traditional analysis
Chapter 17 Multinomial Poisson regression
Chapter 18 Regressions with inconstant variability
Chapter 19 Restructuring categories into multiple dummy variables
Chapter 20 Poisson events per person per period of time
Chapter 21 Two stage least squares
Chapter 22 Partial correlations
Chapter 23 Random intercept regression
Chapter 24 Regression trees
Chapter 25 Kernel regressions
Chapter 26 Quasi-likelihood regressions

2.10

Conclusions

Finding the best fit separation line in a 2 dimensional random sample of observations between two subsets, such that, for example, 10% of the observations is on the left and 90% is on the right side, is the first objective of quantile analysis, and this task is not a particularly easy one. In this chapter several methods have been reviewed, and we observed that maximizing linear functions are most adequate for the purpose. With the help of simplex algorithms the best fit function for the data will be found through a maximized function. Methods for the purpose include:

1. Maximizing Linear Functions with the Help of Support Vectors.
2. Lagrangian Multiplier Method.
3. Maximizing Linear Functions with the Help of Rectangles.
4. Maximizing Linear Functions with the Help of Simplex Algorithms.

2.11

References

All of the chapters of the current edition start with a brief review of the traditional analytic method of the different regression methods prior to the review of the relevant quantile regression method. For the purpose, generally, data examples are used from the recent edition “Regression Analyses in Clinical Research for Starters and 2nd Levelers 2nd Edition, Springer Heidelberg Germany 2021”, by the same authors. For a better understanding of differences between traditional and quantile regressions, readers may benefit from the study of this edition first. To readers requesting still more background, theoretical and mathematical information of computations given, several textbooks complementary to the current production and written by the same authors are available: Statistics applied to


clinical studies 5th edition, 2012, Machine learning in medicine a complete overview 2nd edition, 2020, SPSS for starters and 2nd levelers 2nd edition, 2015, Clinical data analysis on a pocket calculator 2nd edition, 2016, Understanding clinical data analysis from published research, 2016, all of them edited by Springer Heidelberg Germany.

Part I

Simple Univariate Regressions Versus Quantile

Chapter 3

Traditional and Robust Regressions Versus Quantile

3.1 Summary

The history, background, and development of the analytical data models of traditional and quantile regressions have already been addressed in Chap. 1. In this chapter the numbers of stools on a new laxative as outcome and the numbers of stools on the old laxative as predictor in 35 constipation patients will be used as data example. Simple linear regression produced a borderline p-value of 0.049, not a very powerful result. More statistical power was desirable. A GENLIN (generalized linear models) procedure can be followed using maximum likelihood estimators and/or robust regression. Better statistics can also sometimes be obtained with the help of quantile regression, and, in addition, quantile regression tends to give a better view of the relationships between predictor and outcome variables. In the example of this chapter quantile analysis provided not only better precision, but also a better insight into the relationship between the predictor and outcome variable.

3.2 Introduction

The history, background, and development of the analytical data models have already been addressed in Chap. 1. In this chapter the numbers of stools on a new laxative as outcome and the numbers of stools on the old laxative as predictor in 35 constipation patients will be used as data example. Simple linear regression produced a borderline p-value of 0.049, not a very powerful result. We should add, that the patients no. 13 and 27 had outlier "oldtreat" scores, and the analysts here suggested the possibility of typing errors. But this was not confirmed, because of otherwise normal explanations for the observations.

35 pts
newtreat   oldtreat   agecats   patientnumber
24.00      8.00       2.00      1.00
30.00      13.00      2.00      2.00
25.00      15.00      2.00      3.00
35.00      10.00      3.00      4.00
39.00      9.00       3.00      5.00
30.00      10.00      3.00      6.00
27.00      8.00       1.00      7.00
14.00      5.00       1.00      8.00
39.00      13.00      1.00      9.00
42.00      15.00      1.00      10.00
41.00      11.00      1.00      11.00
38.00      11.00      2.00      12.00
39.00      112.00     2.00      13.00
37.00      10.00      3.00      14.00
47.00      18.00      3.00      15.00
30.00      13.00      2.00      16.00
36.00      12.00      2.00      17.00
12.00      4.00       2.00      18.00
26.00      10.00      2.00      19.00
20.00      8.00       1.00      20.00
43.00      16.00      3.00      21.00
31.00      15.00      2.00      22.00
40.00      114.00     2.00      23.00
31.00      7.00       2.00      24.00
36.00      12.00      3.00      25.00
21.00      6.00       2.00      26.00
44.00      19.00      3.00      27.00
11.00      5.00       2.00      28.00
27.00      8.00       2.00      29.00
24.00      9.00       2.00      30.00
40.00      15.00      1.00      31.00
32.00      7.00       2.00      32.00
10.00      6.00       2.00      33.00
37.00      14.00      3.00      34.00
19.00      7.00       2.00      35.00
_________________________
newtreat = new treatment
oldtreat = old treatment
agecats = age categories
patientnumber = patient number

First, a robust regression analysis will be performed. Subsequently, a Quantile regression will be performed.


3.3 Traditional and Robust Regression

With traditional linear regression the old treatment was a borderline significant predictor of the novel treatment at p = 0.049. More statistical power was desirable. Instead of traditional linear regression, a GENLIN (generalized linear models) procedure can be followed in SPSS using maximum likelihood estimators instead of traditional F- and t-tests. The results are likely to produce a bit better precision. For convenience the above data file is in extras.springer.com, is entitled "chap3robustregression", and can be downloaded to your computer mounted with SPSS statistical software.
Command: Generalized Linear Models....Generalized Linear Models....mark: Custom....Distribution: select Normal....Link function: select identity....Response: Dependent Variable: enter new treatment....Predictors: Factors: enter old treatment....Model: Model: enter oldtreat....Estimation: mark Model-based Estimator....click OK.
The underneath table is in the output.

Even better precision may be obtained by the use of robust standard errors, called the Huber-White estimators by SPSS statistical software.
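For readers who want to try the same idea outside SPSS, the hedged sketch below fits an ordinary least squares model with statsmodels and then replaces its covariance matrix by a heteroscedasticity-robust (Huber-White "sandwich") estimator. The column names newtreat and oldtreat mirror the data listing above; the CSV path is only a placeholder, and the HC3 variant is a choice for illustration, not necessarily identical to the estimator SPSS applies.

import pandas as pd
import statsmodels.formula.api as smf

# placeholder path; the data are the 35-patient laxative example above
df = pd.read_csv("chap3robustregression.csv")

ols_fit = smf.ols("newtreat ~ oldtreat", data=df).fit()                    # model-based SEs
robust_fit = smf.ols("newtreat ~ oldtreat", data=df).fit(cov_type="HC3")   # Huber-White type SEs

print(ols_fit.summary())
print(robust_fit.summary())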


Command: Generalized Linear Models....Generalized Linear Models....mark: Custom....Distribution: select Normal....Link function: select identity....Response: Dependent Variable: enter new treatment....Predictors: Factors: enter old treatment....Model: Model: enter oldtreat....Estimation: mark Robust Estimator....click OK.
The underneath table is in the output.

Out of the stool scores with the old treatment, 6 scores provided p-values of < 0.05 with the Model-based Estimator, while up to 14 p-values of < 0.05 were produced with the Robust Estimator. If your results are borderline significant, like in the above example, then loglikelihood regression testing and robust regression testing can, obviously, provide better statistics, and, thus, better statistical power of testing, than traditional testing can. Are these highly significant results from this pretty small data sample credible? Maybe not entirely, but robust regression is a sophisticated model, where a continuous predictor variable is restructured into multiple binary variables. If we can believe in the robust methodology, then we will have to accept the result. Nonetheless, the simple univariate linear model has been replaced by the software with a multiple variables linear model. Regression analysis with multiple predictors has long been interpreted as a rather inferior type of statistical data analysis due to multiple testing and multiple type I errors. However, much has changed since regressions have entered the field of causality research, like, for example, with structural equation modeling. The EMA (European Medicines Agency) has approved the use of a few predictive variables in confirmative controlled clinical trials. And although many data analysts are still doubtful, the time has come that multiple variables regressions have received at least some serious attention. The p-values may not have the heavy interpretation that prospective blinded trial p-values have, but multiple variables regressions are no longer adjusted for multiple testing, because they are assumed to stem from a family of null hypotheses with many interactions within a single experiment. The subject of multiple testing will be addressed many times in this edition. Somewhat uneasy with the expected overwhelming result of the robust tests, investigators may decide to include a quantile regression as a contrast test. The advantage is that, unlike robust regression, it is non-parametric, and thus less flawed in case of skewed data, and data with outliers and/or inconstant variability.

3.4 Quantile Regressions

Quantile regression need not be adjusted for multiple testing. The quantile regression analysis often includes around ten null-hypothesis tests or so, with quantiles ranging from 0.1 to 0.9. However, again a single question is answered: which one of them gives the best result. Better statistics may be obtained with the help of quantile regression, and, in addition, a better insight into the relationships between predictor and outcome variables. Quantiles (fraction of the data) and percentiles (percentage of the data) are identical terms. Usually, linear regression assumes that the y-variable has a normal distribution, and can be summarized by means. The least square computation of the regression coefficient is obtained by the use of the means of the x- and the y-values. Ym = mean of observed Y-values. Xm = mean of observed X-values.



B = Σ(X − Xm)(Y − Ym) / Σ(X − Xm)² = (ΣXY − n·Xm·Ym) / (ΣX² − n·Xm²)

If a normal distribution is not assumed, summaries of quantiles, like the 0.10, 0.20, 0.30 quantile etc will be an adequate alternative for computing regression coefficients. The data file as used in the above section, is used again. The mean x and y values are replaced with either median x and y values or 0,1 to 0,9 quantile x and y values.
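The two equivalent forms of this least-squares formula can be checked with a few lines of numpy; the x and y values below are just a handful of illustrative stool counts, not the full data file.

import numpy as np

x = np.array([8.0, 13.0, 15.0, 10.0, 9.0])    # illustrative predictor values
y = np.array([24.0, 30.0, 25.0, 35.0, 39.0])  # illustrative outcome values
n = len(x)

b_deviation_form = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b_raw_sum_form = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x ** 2) - n * x.mean() ** 2)

print(b_deviation_form, b_raw_sum_form)  # the two forms of the formula give the same slope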



Start by entering the file in a computer mounted with SPSS statistical software version 26. Command: Analyze....Regression....Quantile Regression....click Target Variable: enter new treatment....click Covariate(s): enter old treatment....click Criteria....mark Specify single quantiles....Quantile value(s): 0,1 Add, 0,2 Add, 0,3 Add, 0,4 Add, 0,5 Add, 0,6 Add, 0,7 Add, 0,8 Add, 0,9 Add....click Continue...click Display....Print mark parameter estimates....mark Plot or tabulate top 3 effects....in Model Effects move old treat to Prediction Lines....click Continue....click OK. In the output sheets are the underneath interactive tables and graphs.
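The same analysis can be approximated outside SPSS. The sketch below uses the quantreg function of the statsmodels package on the laxative data, looping over the nine quantiles named in the command above; the CSV path is a placeholder and the column names mirror the data listing.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("chap3robustregression.csv")   # placeholder path for the 35-patient file

for q in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
    fit = smf.quantreg("newtreat ~ oldtreat", data=df).fit(q=q)
    print(f"quantile {q:.1f}: intercept {fit.params['Intercept']:.2f}, "
          f"slope {fit.params['oldtreat']:.2f}, p {fit.pvalues['oldtreat']:.4f}")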

[The SPSS output sheets with the parameter estimates tables and plots for the nine quantiles appear here in the original.]

The above parameter estimates show the regression coefficients of the selected quantile regressions. Out of them the quantiles 0.1 and 0.9 produced p-values of 0.026 and 0.0001.

The above graphs show the patterns of the parameter estimates (regression coefficients or B-values) with their 95% confidence intervals for the different quantiles. Also the confidence intervals of the traditional linear regression of the same data are given in red. Particularly in the low and high quantiles, the quantile regression performed markedly differently from the traditional ordinary linear regression (in red). The quantile intercepts have an oblique pattern. With the quantile 0,1 the number of stools (effect of old treatment on new treatment) is only 12, with quantile 0,9 it has risen to 40. Why is the number of stools in the high quantiles larger than it is in the low quantiles? This is not due to any clinical effects, but rather to heterogeneity of the data as such. Obviously, many more patients with many stools are in the 0,9 quantile than in the 0,1 quantile. The regression coefficient of the predictor "effect of old treatment" lies nicely within the 95% confidence intervals of the traditional regression.

The above graph gives the patterns of the quantile regression lines. It can be observed that the 0.9 quantile and the 0.1 quantile give the steepest slopes. We should add that, in the example of this chapter, both traditional linear regression and robust linear regression did not provide the amount of information and statistical power of the quantile regression. Also insight into the pattern profiles of the quantile regression coefficients and the p-values estimating their differences from zero was obtained; these are summarized in the above Parameter Estimates by Different Quantiles and in the separate Parameter Estimates tables.

3.5 Conclusion

There is much more to say about quantile regression, such as how the regression coefficients (the B-values) and their standard errors are estimated. Also comparing models and assessing nonlinear quantile regressions are possibilities. This chapter is just a primer, but it will be enough to get you started. We should add that both traditional linear regression and robust linear regression did not provide the amount of information and statistical power that quantile regression in this small example did, including the pattern profile of quantile regression coefficients and the p-values estimating the difference from a regression coefficient of zero. We believe that quantile regression will be an important step forward in the analysis methodology of data with outliers, skewness and inconstant variability, like current Covid-19 data (see also Chap. 15). With linear regression, if a normal distribution is not assumed, summaries consisting of quantiles like 0,1, 0,2, 0,3 etc. will be an adequate alternative for computing regression coefficients. They may be more precise predictors of the outcome than the traditional least square regressions. A data example was applied to compare robust regression with quantile regression. Not only better precision, but also better insight into the relationships between predictor and outcome variables was obtained.

3.6 References

All of the chapters of the current edition start with a brief review of the traditional analytic method of the different regression methods prior to the review of the relevant quantile regression method. For the purpose, generally, data examples are used from the recent edition “Regression Analyses in Clinical Research for Starters and 2nd Levelers 2nd Edition, Springer Heidelberg Germany 2021”, by the same authors. For a better understanding of differences between traditional and quantile regressions, readers may benefit from the study of this edition first. To readers requesting more background, theoretical and mathematical information of computations given, several textbooks complementary to the current production and written by the same authors are available: Statistics applied to clinical studies 5th edition, 2012, Machine learning in medicine a complete overview 2nd edition, 2020, SPSS for starters and 2nd levelers 2nd edition, 2015, Clinical data analysis on a pocket calculator 2nd edition, 2016, Understanding clinical data analysis from published research, 2016, all of them edited by Springer Heidelberg Germany.

Chapter 4

Autocorrelations Versus Quantile Regressions

4.1 Summary

Autocorrelations are linear correlation coefficients between data sets of seasonal observations, for example, mean monthly C-reactive protein levels of a group of healthy subjects. Not only their means, but also their standard errors often decrease or increase with time. Also, the observations are repetitive in nature, often have a bimodal outcome, and are, therefore, not at all independent of one another. In this chapter autocorrelation analysis will be tested against quantile regression of the same data. Not only better precision, but also better insight into the relationships between predictor and outcome variables was obtained.

4.2 Introduction

For analysis the statistical model Autocorrelation in the SPSS’ module Forecasting is required. The data file is in extras.springer.com, and is entitled “chap4seasonality”. It was previously used by the authors in SPSS for starters and 2nd levelers, Chap. 58, Springer Heidelberg Germany, 2016. Start by opening the file in your computer mounted with SPSS statistical software. In this chapter autocorrelation analysis will be tested against quantile regression of the same data. Not only better precision, but also better insight into the relationships between predictor and outcome variables was obtained.


4.3 Autoregression Analysis

Start by entering the data file in your computer mounted with SPSS statistical software.
Command: Analyze....Forecasting....Autocorrelations....Move: monthly mean c-reactive protein levels into Variable Box....mark Autocorrelations....click OK.
The output sheets show the underneath graphs and tables.


The above table and graph give the output statistics. The graph shows that in month 11 (autocorrelation coefficient 0,419, standard error 0,144, t-value 0,419/0,144 = 2,910, p < 0,01) the autocorrelation is statistically significant, supporting the seasonality of the observed 16 month mean C-reactive protein pattern. However, independence of the monthly observations versus one another is assumed, while, in practice, seasonal data are, generally, very much dependent on one another. Therefore, accounting for this is welcome. For that purpose SPSS does not include an entire generalized least squares procedure, but, instead, it includes a HAC (heteroscedasticity and autocorrelation consistent) estimator. The Bartlett estimator is here used for the purpose.
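A comparable computation outside SPSS: the acf function of the statsmodels package returns the autocorrelation coefficients together with confidence intervals based on Bartlett's formula. The column name CRP is taken from the quantile command further below; the CSV path and the number of lags are assumptions.

import pandas as pd
from statsmodels.tsa.stattools import acf

df = pd.read_csv("chap4seasonality.csv")   # placeholder path for the data file
series = df["CRP"]                          # monthly mean C-reactive protein levels

# autocorrelations up to lag 12 with 95% confidence intervals (Bartlett's formula)
coefficients, confidence_intervals = acf(series, nlags=12, alpha=0.05, fft=False)
for lag, (r, ci) in enumerate(zip(coefficients, confidence_intervals)):
    print(f"lag {lag:2d}: r = {r:6.3f}, 95% CI {ci[0]:6.3f} to {ci[1]:6.3f}")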

[The SPSS output with the Bartlett-adjusted autocorrelation table and graph appears here in the original.]

The above table and graph show the result of the adjusted procedure. The 95% confidence intervals are now wider, and the presence of significant autocorrelation can no longer be confirmed.

4.4 Quantile Regressions

With quantile regressions no distributional assumptions have to be made about random errors. Quantile regression is a very robust methodology, and it even implicitly assumes independence of predictors, and, therefore, adjustments for special frequency distributions, and for predictor variables versus one another need not be taken into account. The example of the above section is used once again. Start by opening the file again, but be sure that your computer is mounted with SPSS Version 26, 2020. Command: Menu....Analyze....Regression....Quantile....Target Variable: enter mean CRP 1 month [CRP]....Covariates: enter time (months) [time]....click Criteria.... Quantile value(s): enter respectively 0,1 Add, 0,2 Add, 0,3 Add, 0,4 Add, 0,5 Add, 0,6 Add, 0,7 Add, 0,8 Add, 0,9 Add....Maximum iterations: enter 10000....


click Continue....Specify Model Effects....click build terms and transfer time through the main effects arrow....skip the Include intercept checkbox....Continue....OK.
The underneath tables and graphs are in the output. All of the quantile parameters were very significantly different from a parameter with zero steepness.
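A hedged equivalent of the above SPSS run, once more with the quantreg function of statsmodels: the slope of CRP on time is estimated for each of the nine quantiles. The variable names CRP and time mirror the command above; the CSV path is a placeholder, and an intercept is included here for simplicity.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("chap4seasonality.csv")   # placeholder path

model = smf.quantreg("CRP ~ time", data=df)
for q in [i / 10 for i in range(1, 10)]:
    fit = model.fit(q=q, max_iter=10000)
    print(f"quantile {q:.1f}: slope {fit.params['time']:.3f}, p {fit.pvalues['time']:.4f}")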

[The SPSS output with the parameter estimates tables and plots for the nine quantiles appears here in the original.]

The plot of parameter estimates gives the regression coefficients as computed for each quantile model from 0,1 to 0,9. Obviously, the slopes of the parameter estimates are entirely different from the zero slope of the ordinary least squares measures (in red). And so, traditional linear regression would have been a very poor fit model for the purpose. The confidence intervals of the parameter estimates in blue fitted the data pretty homogeneously, and indicated a good fit of the mathematical model. Nonetheless, the regression coefficients at the 0,1 to 0,4 quantiles were pretty flat as compared to the 0,5 to 0,9 quantile regression coefficients.


The steep and very steep linear prediction lines are in agreement with the steep pattern of the parameter estimates (the regression coefficients). The steeper the prediction lines the stronger the prediction. Quantiles 0,9 and 0,8 were the strongest predictors.

4.5 Conclusions

With quantile regressions no distributional assumptions have to be made about random errors. Quantile regression is a very robust methodology, and it even implicitly assumes independence of predictors, and, therefore, adjustments for special frequency distributions, and for predictor variables versus one another, need not be taken into account. The example of this chapter underlines that quantile regression is probably one of the finest inventions for modeling autoregressive data and forecasting methods. Why does quantile regression outperform traditional least squares regression? The 0,1 quantile produces a regression line with a cut-off of 10% of the observations below and 90% above the regression line. Subsequently, the 0,2 quantile has a cut-off of 20%, the 0,3 quantile of 30%, and so on, up to the 0,9 quantile with a cut-off of 90%. This means that, with a 0,1 quantile, 10% of the data are below and 90% are above the regression line. This is an "ideal" situation, where for each regression line exactly the best fit predictable outcome is provided.


Unfortunately, optimizations and maximizations are not perfect, but approximations, and they have to be computed using for example linear algorithms (see Chap. 2).

The above graph, for example, nonetheless demonstrates that doing so can provide pretty successful versions of quantile regressions. Why are autoregressive data a good fit for quantile regression? This is probably because the 0,1 to 0,9 quantile regression coefficients tend to parallel time-predicted autoregression coefficients.

4.6 References

All of the chapters of the current edition start with a brief review of the traditional analytic method of the different regression methods prior to the review of the relevant quantile regression method. For the purpose, generally, data examples are used from the recent edition “Regression Analyses in Clinical Research for Starters and 2nd Levelers 2nd Edition, Springer Heidelberg Germany 2021”, by the same authors. For a better understanding of differences between traditional and quantile regressions, readers may benefit from the study of this edition first. To readers requesting more background, theoretical and mathematical information of computations given, several textbooks complementary to the current production and written by the same author are available: Statistics applied to clinical studies 5th edition, 2012, Machine learning in medicine a complete overview 2nd edition, 2020, SPSS for starters and 2nd levelers 2nd edition, 2015, Clinical data analysis on a pocket calculator 2nd edition, 2016, Understanding clinical data analysis from published research, 2016, all of them edited by Springer Heidelberg Germany.

Chapter 5

Discrete Trend Testing Versus Quantile Regression

5.1 Summary

Current clinical trials often involve more than two treatments or treatment modalities, e.g., dose-response and dose-finding trials, studies comparing multiple drugs from one class with different potencies, or different formulas from one drug with various bio-availabilities and other pharmacokinetic properties. In such situations small differences in efficacies are to be expected, and we need particularly sensitive tests. A standard approach to the analysis of such data is multiple groups analysis of variance (Anova) and multiple groups chi-square tests, but a more sensitive, although, so far, little used, approach may be a trend-analysis. A trend means an association between the order of treatment and the magnitude of response. With discrete data linear regression is impossible, but a chi-square test for trends can be performed. As an example, in a 106 patient parallel-groups study the effects of three incremental dosages of an antihypertensive drug were assessed. The linear-by-linear association test has approximately the same chi-square value as the Pearson chi-square, but it has only 1 degree of freedom, and, therefore, it reaches statistical significance with a p-value of 0.050. There is, thus, a significant incremental trend of responding with incremental dosages at p = 0.050. Better statistics can be obtained with the help of quantile regression, and, in addition, quantile regressions tend to give a better view of the relationships between predictor and outcome variables. In the example of this chapter quantile analysis provided not only better precision, but also better insight into the relationship between the predictor and outcome variable, with p-values of 0.0001 rather than 0.050.


5.2 Introduction

Current clinical trials often involve more than two treatments or treatment modalities, e.g., dose-response and dose-finding trials, studies comparing multiple drugs from one class with different potencies, or different formulas from one drug with various bio-availabilities and other pharmacokinetic properties. In such situations small differences in efficacies are to be expected, and we need particularly sensitive tests. A standard approach to the analysis of such data is multiple groups analysis of variance (Anova) and multiple groups chi-square tests, but a more sensitive, although, so far, little used, approach may be a trend-analysis. A trend means an association between the order of treatment and the magnitude of response. Trend analysis is the widespread practice of collecting information and attempting to spot a pattern. In economy, the term "trend analysis" has more formally obtained the meaning of an upward direction leading to increased profit for the investor. Although trend analysis is often used to predict future events, it could be used to estimate uncertain events in the past, such as how many ancient kings probably ruled between two dates, based on data such as the average years which other known kings reigned. In project management, trend analysis is a mathematical technique that uses historical results to predict future outcome. This is achieved by tracking variances in cost and schedule performance. In this context, it is a project management quality control tool. In statistics, trend analysis often refers to techniques for extracting an underlying pattern of behavior in a time series which would otherwise be partly or nearly completely hidden by noise. If the trend can be assumed to be linear, trend analysis can be undertaken within a formal regression analysis, described as trend estimation. If the trends have shapes other than linear, trend testing can be performed by non-parametric methods, using, e.g., the Kendall rank correlation coefficient. Particularly valuable is the Mann-Kendall trend test, because it covers monotonous upward and downward trends in a nonparametric way. Henry Mann, a mathematician from Vienna, published in 1945 a paper entitled "Nonparametric tests against trend". Maurice Kendall (1907–1983, from Redhill UK) invented the Kendall rank correlation coefficient (which he named Tau). Maurice Kendall must not be mixed up with David Kendall (1918–2007, a statistician involved in the queuing theory, from Cambridge UK).

5.3 Discrete Trend Analysis

With discrete data linear regression is impossible, but a chi-square test for trends can be performed. As an example, in a 106 patient parallel-groups study the effects of three incremental dosages of an antihypertensive drug were assessed. The proportion of responders in each of the three groups was used as outcome measure. The first 10 patients are in the underneath table. The entire data file is in extras.springer.com, and is entitled "chap5trendbinary". It has been previously used by the authors in SPSS for starters and 2nd levelers, Chap. 40, Springer Heidelberg Germany, 2016. Open the data file in your computer installed with SPSS.

106 pts
responder  treatment
1,00 1,00 1,00 1,00 1,00 1,00 1,00 1,00 1,00 1,00 1,00 1,00 1,00 1,00 1,00 1,00 1,00 1,00 1,00 1,00 1,00 2,00
responder: normotension 1, hypertension 0
treatment: incremental treatment dosages 1-3

A multiple groups chi-square test will be performed. For analysis the statistical model Crosstabs in the module Descriptive Statistics is used.
Command: Analyze....Descriptive Statistics....Crosstabs....Row(s): responder....Column(s): treatment....Statistics....Chi-Square Test....click OK.

Chi-Square Tests
                        Value    df    Asymp. Sig. (2-sided)
Pearson Chi-Square      3,872a   2     ,144

The above table shows that, indeed, the Pearson chi-square value for multiple groups testing is not significant, with a chi-square value of 3,872 and a p-value of 0,144, and we have to conclude that there is, thus, no significant difference between the odds of responding to the three dosages. Subsequently, a chi-square test for trends can be performed, a test that, essentially, assesses whether the above odds of responding ([number of responders]/[number of non-responders]) per treatment group increase significantly. The "linear-by-linear association" from the same table is appropriate for the purpose. It has approximately the same chi-square value, but it has only 1 degree of freedom, and, therefore, it reaches statistical significance with a p-value of 0.050. There is, thus, a significant incremental trend of responding with incremental dosages.

                              Value    df    Asymp. Sig. (2-sided)
Linear-by-Linear Association  3,829    1     ,050
N of Valid Cases              106
a. 0 cells (,0%) have expected count less than 5. The minimum expected count is 11,56.

The trend in this example can also be tested using logistic regression with responding as outcome variable and treatment as independent variable (enter the latter as covariate, not as categorical variable).
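A minimal Python sketch of both tests: the overall Pearson chi-square from the 2 × 3 cross-table, and the linear-by-linear (Mantel-Haenszel) trend statistic computed as (N − 1) times the squared Pearson correlation between the dosage and responder scores. The counts below are invented for illustration only, not the 106-patient file.

import numpy as np
from scipy.stats import chi2, chi2_contingency, pearsonr

# rows: non-responder / responder; columns: dosage 1, 2, 3 (illustrative counts only)
table = np.array([[20, 15, 10],
                  [15, 20, 26]])

chi2_overall, p_overall, dof, _ = chi2_contingency(table, correction=False)

# expand the table into one (dosage, responder) pair per patient
dosages = np.repeat([1, 2, 3, 1, 2, 3], table.flatten())
responders = np.repeat([0, 0, 0, 1, 1, 1], table.flatten())
n = table.sum()
r, _ = pearsonr(dosages, responders)
linear_by_linear = (n - 1) * r ** 2            # 1 degree of freedom
p_trend = chi2.sf(linear_by_linear, df=1)

print(f"Pearson chi-square: {chi2_overall:.3f} (df={dof}, p={p_overall:.3f})")
print(f"Linear-by-linear:   {linear_by_linear:.3f} (df=1, p={p_trend:.3f})")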

5.4 Quantile Regressions

For the quantile regression enter the underneath command in your computer mounted with SPSS statistical software version 26:
Command: Analyze....Regression....Quantile Regression....Target Variable: responder....Covariate(s): treatment....click Criteria....Quantile value(s): add 0,4 and 0,5....Continue....click OK.
Q 0,4


Q 0,5

The discrete trends assessed with the χ2 linear-by-linear association had a p-value of 0,050, not a very strong one. However, the discrete trend assessed with the help of the quantile regression models 0,4 and 0,5 (40 and 50% quantiles) produced p-values of 0,0001 each. And so, with quantile regression pretty strong predictions from treatment on being a responder or not can be made.

5.5 Conclusion

Trend tests provide markedly better sensitivity for demonstrating incremental effects from incremental treatment dosages than traditional statistical tests do. Current clinical trials often involve more than two treatments or treatment modalities, e.g., dose-response and dose-finding trials, studies comparing multiple drugs from one class with different potencies, or different formulas from one drug with various bio-availabilities and other pharmacokinetic properties. In such situations small differences in efficacies are to be expected, and we need particularly sensitive tests. A standard approach to the analysis of such data is multiple groups analysis of variance (Anova) and multiple groups chi-square tests, but a more sensitive, although so far little used, approach may be a trend-analysis. A trend means an association between the order of treatment and the magnitude of response. We should add that, within the context of a clinical trial, demonstrating trends may provide better evidence of causal treatment effects than simple comparisons of treatment modalities may do. The discrete trends assessed with the χ2 linear-by-linear association had a p-value of 0,050, not a very strong one. However, the discrete trend assessed in this chapter with the help of the quantile regression models 0,4 and 0,5 (40 and 50% quantiles) produced p-values of 0,0001 each. And so, with quantile regression pretty strong predictions from treatment on being a responder or not can be made.

5.6 References

All of the chapters of the current edition start with a brief review of the traditional analytic method of the different regression methods prior to the review of the relevant quantile regression method. For the purpose, generally, data examples are used from the recent edition “Regression Analyses in Clinical Research for Starters and 2nd Levelers 2nd Edition, Springer Heidelberg Germany 2021”, by the same authors. For a better understanding of differences between traditional and quantile regressions, readers may benefit from the study of this edition first. To readers requesting more background, theoretical and mathematical information of computations given, several textbooks complementary to the current production and written by the same authors are available: Statistics applied to clinical studies 5th edition, 2012, Machine learning in medicine a complete overview 2nd edition, 2020, SPSS for starters and 2nd levelers 2nd edition, 2015, Clinical data analysis on a pocket calculator 2nd edition, 2016, Understanding clinical data analysis from published research, 2016, all of them edited by Springer Heidelberg Germany.

Chapter 6

Continuous Trend Testing Versus Quantile Regression

6.1 Summary

With linear trend testing of continuous data the outcome variable is continuous, and the predictor variable is categorical and can be measured either as a nominal (just like names) or as an ordinal variable (a stepping pattern not necessarily with equal intervals). In the Variable View of SPSS the command "Measure" may, therefore, be changed into nominal or ordinal, but, since we assume an incremental function, the default measure "scale" is OK as well. The data example applied provided a p-value of 0.050 in the linear trend test. Better statistics can be obtained with the help of quantile regression, and, in addition, quantile regression tends to give a better overview of the relationships between predictor and outcome variables. In the example of this chapter quantile analysis provided not only better precision, but also better insight into the relationship between the predictor and outcome variable, with p-values of 0.001 and 0.0001 rather than 0.050.

6.2 Introduction

With linear trend testing of continuous data the outcome variable is continuous, the predictor variable is categorical, and can be measured either as nominal (just like names) or as ordinal variable (a stepping pattern not necessarily with equal intervals). In the Variable-View of SPSS the command “Measure” may, therefore, be changed into nominal or ordinal, but, since we assume an incremental function, the default measure “scale” is OK as well.


6.3 Linear Trend Testing of Continuous Data

As an example, in a parallel-groups study the effects of three incremental dosages of antihypertensive treatments will be assessed. The mean reduction of mean blood pressure per group is tested.

30 pts
outcome (mean blood pressure, mm Hg)   treatment group
113,00                                 1,00
131,00                                 1,00
112,00                                 1,00
132,00                                 1,00
114,00                                 1,00
130,00                                 1,00
115,00                                 1,00
129,00                                 1,00
122,00                                 1,00
118,00                                 2,00

The entire data file is in extras.springer.com, and is entitled "chap6trendcontinuous". It was previously used by the authors in SPSS for starters and 2nd levelers, Chap. 15, Springer Heidelberg Germany, 2016. We will first perform a one way analysis of variance (Anova) to see if there are any significant differences in the data. If not, we will perform a trend test using simple linear regression. For analysis the statistical model One Way Anova in the module Compare Means is used. Open the file in your computer mounted with SPSS.
Command: Analyze....Compare Means....One-Way ANOVA....Dependent List: blood pressure....Factor: treatment....click OK.

ANOVA (VAR00002)
                  Sum of Squares   df   Mean Square   F       Sig.
Between Groups    246,667          2    123,333       2,035   ,150
Within Groups     1636,000         27   60,593
Total             1882,667         29

The above table shows that there is no significant difference in efficacy between the treatment dosages, and so, sadly, this is a negative study. However, a trend test having just 1 degree of freedom has more sensitivity than a usual one way Anova, and it could, therefore, be statistically significant even so. For analysis the model Linear in the module Regression is required.
Command: Analyze....Regression....Linear....Dependent: blood pressure....Independent(s): treatment....click OK.

ANOVAb
Model 1          Sum of Squares   df   Mean Square   F       Sig.
  Regression     245,000          1    245,000       4,189   ,050a
  Residual       1637,667         28   58,488
  Total          1882,667         29
a. Predictors: (Constant), VAR00001
b. Dependent Variable: VAR00002

The above tables in the output sheets show that treatment dosage is a significant predictor of treatment response at a p-value of 0,050. There is, thus, a significantly incremental response with incremental dosages. Trend tests provide, obviously, markedly better sensitivity for demonstrating incremental effects from incremental treatment dosages than traditional statistical tests do. One way ANOVA using 2 degrees of freedom was not statistically significant (p = 0,150) in the example given, while a linear regression with 1 degree of freedom was significant at p = 0,050.
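The same contrast, a 2-degree-of-freedom one way Anova versus a 1-degree-of-freedom linear trend, can be reproduced with scipy. The blood pressure values below are illustrative placeholders for three dosage groups, not the complete 30-patient file.

from scipy.stats import f_oneway, linregress

# illustrative mean blood pressures for dosages 1, 2 and 3 (placeholders)
group1 = [113, 131, 112, 132, 114, 130, 115, 129, 122]
group2 = [118, 119, 132, 126, 120, 129, 125, 123, 128]
group3 = [127, 128, 130, 131, 129, 135, 122, 126, 133]

# one way Anova: 2 degrees of freedom between groups
f_stat, p_anova = f_oneway(group1, group2, group3)

# linear trend: regress the outcome on the dosage number, 1 degree of freedom
dosages = [1] * len(group1) + [2] * len(group2) + [3] * len(group3)
outcomes = group1 + group2 + group3
trend = linregress(dosages, outcomes)

print(f"Anova:        F = {f_stat:.3f}, p = {p_anova:.3f}")
print(f"Linear trend: slope = {trend.slope:.2f}, p = {trend.pvalue:.3f}")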

6.4 Quantile Regressions

For the quantile regression apply SPSS statistical software version 26 2020.


Command: Analyze....Regression....Quantile Regression....Target Variable: blood pressure....Covariate(s): treatment....click Criteria....Quantile value(s): 0,1 Add, 0,2 Add, 0,3 Add, 0,4 Add, 0,5 Add, 0,6 Add, 0,7 Add, 0,8 Add, 0,9 Add....click Continue....click OK.
The underneath tables are in the SPSS output.
Q 0,1

Q 0,2

Q 0,3


Q 0,4

Q 0,5

Q 0,6


Q 0,7

Q 0,8

Q 0,9

The above tables show that quantile regression of many quantiles including 0,1, 0,2, 0,4, 0,6, 0,8, and 0,9 produced very significant outcomes with p-values between 0,001 and 0,03. And so we may conclude that treatment provides strong trends in the blood pressure outcomes, and is more powerful than the traditional one way anova (analysis of variance).


6.5 Conclusion

The tables show, that quantile regression of many quantiles including 0,1, 0,2, 0,4, 0,6, 0,8, and 0,9 produced very significant outcomes with p-values between 0,001 and 0,03. And so we may conclude that in the given data example the treatment provided strong trends in the blood pressure outcomes, and was more powerful than traditional one way anova (analysis of variance).

6.6 References

All of the chapters of the current edition start with a brief review of the traditional analytic method of the different regression methods prior to the review of the relevant quantile regression method. For the purpose, generally, data examples are used from the recent edition “Regression Analyses in Clinical Research for Starters and 2nd Levelers 2nd Edition, Springer Heidelberg Germany 2021”, by the same authors. For a better understanding of differences between traditional and quantile regressions, readers may benefit from the study of this edition first. To readers requesting still more background, theoretical and mathematical information of computations given, several textbooks complementary to the current production and written by the same authors are available: Statistics applied to clinical studies 5th edition, 2012, Machine learning in medicine a complete overview 2nd edition, 2020, SPSS for starters and 2nd levelers 2nd edition, 2015, Clinical data analysis on a pocket calculator 2nd edition, 2016, Understanding clinical data analysis from published research, 2016, all of them edited by Springer Heidelberg Germany.

Chapter 7

Binary Poisson/Negative Binomial Regressions Versus Quantile

7.1 Summary

In a parallel-group study of the effects of two treatments on the risk of torsades de pointes, binary logistic regression produced a p-value of 0.051, whereby the first treatment performed better than the second, although hardly at a significant level. Binary Poisson analysis in Generalized Linear Models provided a slightly better p-value of 0.039. As alternative to binary Poisson, the closely related negative binomial model was used. It produced the same p-value and a larger Akaike information criterion, which means that its fit was poorer than that of the binary Poisson model. We should add that the negative binomial model, although it lacked goodness of fit, has the advantage that it is adjusted for overdispersion, while Poisson is not. Better statistics were, obviously, provided by the quantile models than by the binary Poisson and negative binomial models. With the quantiles 0,30 and 0,50 the best treatment modality predicted torsades de pointes at p-values of 0,001 and 0,000, respectively, versus 0,039 and 0,039, as compared to the worst treatment modality. The same data example will be used once more in Chap. 8 for the assessment of robust standard errors versus quantile. Slightly different quantiles were applied there. Results were overall very similar to the results of this chapter.

7.2 Introduction

In a 52 patient parallel-group study of the effects of two treatments on the risk of torsades de pointes, binary logistic regression produced a p-value of 0.051, whereby one treatment performed better than the other, although hardly statistically significant. Binary Poisson analysis in Generalized Linear Models provided a slightly better p-value of 0.039. In the example of this chapter quantile analysis provided not only better precision, but also better insight into the relationship between the predictor and outcome variable.

7.3 Binary Poisson and Negative Binomial Regressions

Poisson regression is the traditional method for rate regressions, but it can be used not only for counted rates, but also for binary outcome variables. Poisson regression of binary outcome data is different from logistic regression, because it uses a log instead of a logit (log odds) transformed dependent variable. It tends to provide better statistics. Can Poisson regression be used to estimate the presence of an illness? Presence means a rate of 1, absence means a rate of 0. If each patient is measured within the same period of time, no weighting variable has to be added to the model. Rates of 0 or 1, after all, do exist in practice. We will see how this approach performs as compared to the logistic regression traditionally used for binary outcomes. As an example, in 52 patients with parallel groups of two different treatments the presence or not of torsades de pointes (brief runs of ventricular tachycardia) was measured. The entire data file is entitled "chap7poissonbinary", and is in extras.springer.com. It was previously used by the authors in SPSS for starters and 2nd levelers, Chap. 47, Springer Heidelberg Germany, 2016. We will start by opening the data file in our computer with SPSS statistical software installed. The data from the first 10 patients are underneath.

52 pts
treat    presence of torsades de pointes
,00      1,00
,00      1,00
,00      1,00
,00      1,00
,00      1,00
,00      1,00
,00      1,00
,00      1,00
,00      1,00
,00      1,00
treat = treatment modality

First, we will perform a traditional binary logistic regression with torsade de pointes as outcome and treatment modality as predictor. For analysis the statistical model Binary Logistic Regression in the module Regression is required.


Command: Analyze....Regression....Binary Logistic....Dependent: torsade....Covariates: treatment....click OK.

The above table shows that the treatment is not statistically significant at the p < 0.050 level. A Poisson regression will be performed subsequently. For analysis the module Generalized Linear Models is required. It consists of two submodules: Generalized Linear Models and Generalized Estimating Equations. The first submodule covers many statistical models like gamma regression, Tweedie regression, Poisson regression, and the analysis of data files with both paired continuous outcomes and predictors (SPSS for starters and 2nd levelers 2nd edition, Chap. 3, Springer Heidelberg Germany, 2015, from the same authors). The second submodule is for analyzing paired binary outcomes and predictors (SPSS for starters and 2nd levelers, Chap. 42, Springer Heidelberg Germany, 2015, from the same authors).
Command: Analyze....Generalized Linear Models....Generalized Linear Models....mark Custom....Distribution: Poisson....Link Function: Log....Response: Dependent Variable: torsade....Predictors: Factors: treat....click Model....click Main Effect: enter treat....click Estimation: mark Robust Tests....click OK.


The above tables show the results of the Poisson regression. Regarding the "goodness of fit" table: any statistical model is a simplification of reality, and information is lost. The goodness of fit tests give some idea, but they are mainly used for comparing one model versus the other. We will soon address this issue again. Regarding the "parameter estimates" table: the predictor "treatment modality", although insignificant in the above binary logistic model, is now statistically significant at p = 0.039. According to the Poisson model the treatment modality is, thus, a significant predictor of torsades de pointes. A 3-D graph will be drawn in order to better clarify the effects of the treatments on torsades de pointes.


Command: Graphs....3D Charts....x-axis treat....z-axis torsade....Define....x-axis treat....z-axis torsade....OK.

The above graph is in the output sheets. It shows that in the 0-treatment (placebo) group the number of patients with torsades de pointes is virtually equal to that of the patients without. However, in the 1-treatment group the latter number is considerably smaller. The treatment seems to be efficacious. Obviously, Poisson regression is different from linear and logistic regression, because it uses a log transformed dependent variable. For the analysis of yes/no rates Poisson regression is very sensitive and better than standard regression methods. As alternative to binary Poisson, the negative binomial model can be used in order to give a more accurate data model than binary Poisson does, because it allows mean and variance of samples to be different, unlike Poisson. Many events have a negative-correlated occurrence. E.g., if you observe many accidents today, you will have more chance of observing fewer tomorrow, and vice versa. This phenomenon causes a larger variance in the data than if the occurrences were entirely independent. The negative binomial model is available in SPSS' Generalized Linear Models, just like Poisson is. The above data example will be analyzed once more, but now with negative binomial analysis.


Command: Analyze....Generalized Linear Models....Generalized Linear Models....mark Custom....Distribution: Negative Binomial....Link Function: Log....Response: Dependent Variable: torsade....Predictors: Factors: treat....click Model....click Main Effect: enter treat....click Estimation: mark Robust Tests....click OK.


The above tables are in the output sheets. The Akaike information criterion is larger than it is in the above Poisson model. This would mean, that Poisson is a better fit model for the data than the negative binomial model is. However, this is not entirely true, because binary count data tend to suffer from overdispersion, and the negative binomial model is adjusted for overdispersion while Poisson is not.
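A hedged sketch of the same comparison in Python: a Poisson and a negative binomial GLM, both with log link and robust standard errors, are fitted to the binary torsade outcome and their AICs are compared. The CSV path and the fixed negative binomial dispersion parameter alpha are assumptions; the column names treat and torsade mirror the commands above.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("chap7poissonbinary.csv")   # placeholder path; columns treat, torsade

poisson_fit = smf.glm("torsade ~ treat", data=df,
                      family=sm.families.Poisson()).fit(cov_type="HC0")
negbin_fit = smf.glm("torsade ~ treat", data=df,
                     family=sm.families.NegativeBinomial(alpha=1.0)).fit(cov_type="HC0")

print("Poisson           AIC:", poisson_fit.aic, " p(treat):", poisson_fit.pvalues["treat"])
print("Negative binomial AIC:", negbin_fit.aic, " p(treat):", negbin_fit.pvalues["treat"])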

7.4 Quantile Regressions

As alternative to binary Poisson, quantile regression was applied for testing the effect of treatment on torsades de pointes. Start with entering the data file entitled "chap7poissonbinary" in your computer mounted with SPSS statistical software version 26.
Command: Analyze....Regression....Quantile Regression....Target Variable: torsade....Covariate(s): treatment....click Criteria....Quantile value(s): 0,3 Add, 0,5 Add....Continue....click OK.
The underneath tables are in the SPSS output.

We chose the quantiles 0,3 and 0,5, because they had the largest pseudo R squared values, and, thus, the best model quality.
Q 0,3


Q = 0,5

Better statistics were, obviously, provided by the quantile models than by the Poisson and negative binomial models, with p-values of 0,001 and 0,004 versus 0,039 and 0,039, respectively.

7.5 Conclusion

Better statistics were, obviously, provided by the quantile models than by the binary Poisson and negative binomial models. The treatment predicted torsades de pointes at p-values of 0,001 and 0,000, respectively, versus 0,039 and 0,039. Quantile regression was, thus, a method of statistical testing superior to binary Poisson/negative binomial regression. With quantile analysis not only better precision was obtained, at p-values of 0.001 and 0.000 rather than 0.039, but also better insight into the relationship between the predictor and outcome variable.

7.6 References

All of the chapters of the current edition start with a brief review of the traditional analytic method of the different regression methods prior to the review of the relevant quantile regression method. For the purpose, generally, data examples are used from the recent edition "Regression Analyses in Clinical Research for Starters and 2nd Levelers 2nd Edition, Springer Heidelberg Germany 2021", by the same authors. For a better understanding of differences between traditional and quantile regressions, readers may benefit from the study of this edition first. To readers requesting still more background, theoretical and mathematical information of computations given, several textbooks complementary to the current production and written by the same authors are available: Statistics applied to clinical studies 5th edition, 2012, Machine learning in medicine a complete overview 2nd edition, 2020, SPSS for starters and 2nd levelers 2nd edition, 2015, Clinical data analysis on a pocket calculator 2nd edition, 2016, Understanding clinical data analysis from published research, 2016, all of them edited by Springer Heidelberg Germany.

Chapter 8

Robust Standard Errors Regressions Versus Quantile

8.1 Summary

Robust standard errors is a methodology for handling data subsets with unusually large spread. Rate analysis using a robust Poisson regression is different from logistic regression, because it uses a log transformed dependent variable. For the analysis with robust standard errors Poisson regression is very sensitive, and, thus, more sensitive than standard logistic regression. In the example of this chapter the p-value fell from 0,051 to 0,039. As an alternative, quantile regression was performed. The predictive p-value in the quantiles 0,25 and 0,50 further fell to 0.004 and