Analysis of Quantal Response Data 9781351466677, 1351466674, 0-412-31750-8

This book takes the standard methods as the starting point, and then describes a wide range of relatively new approaches


English Pages 532 Year 2018


Table of contents :
Content: Cover
Title Page
Copyright Page
Dedication
Table of Contents
Preface
Glossary and notation
Index of data sets
1: Data, preliminary analyses and mechanistic models
1.1 Introduction
1.2 Aspects of toxicology
1.3 Examples
1.4 Preliminary graphical representations
1.5 Mechanistic models
1.6 Interpolation and extrapolation
1.7 Discussion
1.8 Exercises and complements
2: Maximum-likelihood fitting of simple models
2.1 The likelihood surface and non-linear optimization
2.2 The method-of-scoring for the logit model
2.3 The connection with iterated weighted regression; generalized linear models
2.4 Hand calculation and using tables
2.5 The chi-square goodness-of-fit test; heterogeneity
2.6 Minimum chi-square estimation
2.7 Estimating the dose for a given mortality
2.7.1 Using the delta method
2.7.2 Using Fieller's Theorem
2.7.3 The likelihood-ratio interval
2.7.4 Comparing the likelihood-ratio and Fieller intervals
2.8 Maximum-likelihood estimation for logistic regression
2.9 Making comparisons
2.10 Testing for trend in proportions
2.11 Discussion
2.12 Exercises and complements
3: Extensions and alternatives
3.1 Introduction
3.2 Natural or control mortality; EM algorithm and mixture models
3.3 Wadley's problem; use of controls
3.4 Influence and diagnostics
3.5 Trichotomous responses
3.6 Bayesian analysis
3.7 Synergy and antagonism
3.8 Multivariate bioassays
3.9 Errors in dose measurement
3.10 Discussion
3.11 Exercises and complements
4: Extended models for quantal assay data
4.1 Introduction
4.2 The Aranda-Ordaz asymmetric model
4.3 Extended symmetric models
4.4 Transforming the dose scale
4.5 Additional models and procedures; goodness of link
4.5.1 Other models
4.5.2 Goodness of link


MONOGRAPHS ON STATISTICS AND APPLIED PROBABILITY

General Editors: D.R. Cox, D.V. Hinkley, D.B. Rubin and B.W. Silverman

1 Stochastic Population Models in Ecology and Epidemiology M.S. Bartlett (1960)
2 Queues D.R. Cox and W.L. Smith (1961)
3 Monte Carlo Methods J.M. Hammersley and D.C. Handscomb (1964)
4 The Statistical Analysis of Series of Events D.R. Cox and P.A.W. Lewis (1966)
5 Population Genetics W.J. Ewens (1969)
6 Probability, Statistics and Time M.S. Bartlett (1975)
7 Statistical Inference S.D. Silvey (1975)
8 The Analysis of Contingency Tables B.S. Everitt (1977)
9 Multivariate Analysis in Behavioural Research A.E. Maxwell (1977)
10 Stochastic Abundance Models S. Engen (1978)
11 Some Basic Theory for Statistical Inference E.J.G. Pitman (1979)
12 Point Processes D.R. Cox and V. Isham (1980)
13 Identification of Outliers D.M. Hawkins (1980)
14 Optimal Design S.D. Silvey (1980)
15 Finite Mixture Distributions B.S. Everitt and D.J. Hand (1981)
16 Classification A.D. Gordon (1981)
17 Distribution-free Statistical Methods J.S. Maritz (1981)
18 Residuals and Influence in Regression R.D. Cook and S. Weisberg (1982)
19 Applications of Queueing Theory G.F. Newell (1982)
20 Risk Theory, 3rd edition R.E. Beard, T. Pentikainen and E. Pesonen (1984)
21 Analysis of Survival Data D.R. Cox and D. Oakes (1984)
22 An Introduction to Latent Variable Models B.S. Everitt (1984)
23 Bandit Problems D.A. Berry and B. Fristedt (1985)
24 Stochastic Modelling and Control M.H.A. Davis and R. Vinter (1985)
25 The Statistical Analysis of Compositional Data J. Aitchison (1986)
26 Density Estimation for Statistics and Data Analysis B.W. Silverman (1986)
27 Regression Analysis with Applications G.B. Wetherill (1986)
28 Sequential Methods in Statistics, 3rd edition G.B. Wetherill (1986)
29 Tensor Methods in Statistics P. McCullagh (1987)
30 Transformation and Weighting in Regression R.J. Carroll and D. Ruppert (1988)
31 Asymptotic Techniques for Use in Statistics O.E. Barndorff-Nielsen and D.R. Cox (1989)
32 Analysis of Binary Data, 2nd edition D.R. Cox and E.J. Snell (1989)
33 Analysis of Infectious Disease Data N.G. Becker (1989)
34 Design and Analysis of Cross-Over Trials B. Jones and M.G. Kenward (1989)
35 Empirical Bayes Methods, 2nd edition J.S. Maritz and T. Lwin (1989)
36 Symmetric Multivariate and Related Distributions K.-T. Fang, S. Kotz and K. Ng (1989)
37 Generalized Linear Models, 2nd edition P. McCullagh and J.A. Nelder (1989)
38 Cyclic Designs J.A. John (1987)
39 Analog Estimation Methods in Econometrics C.F. Manski (1988)
40 Subset Selection in Regression A.J. Miller (1990)
41 Analysis of Repeated Measures M. Crowder and D.J. Hand (1990)
42 Statistical Reasoning with Imprecise Probabilities P. Walley (1990)
43 Generalized Additive Models T.J. Hastie and R.J. Tibshirani (1990)
44 Inspection Errors for Attributes in Quality Control N.L. Johnson, S. Kotz and X. Wu (1991)
45 The Analysis of Contingency Tables, 2nd edition B.S. Everitt (1992)
46 Analysis of Quantal Response Data B.J.T. Morgan (1992)

(Full details concerning this series are available from the Publishers.)

Analysis of Quantal Response Data

B. J. T. MORGAN
Professor of Applied Statistics
Institute of Mathematics and Statistics
University of Kent
Canterbury

CHAPMAN & HALL
London • Glasgow • New York • Tokyo • Melbourne • Madras

Published by Chapman & Hall, 2-6 Boundary Row, London SE1 8HN

Chapman & Hall, 2-6 Boundary Row, London SE1 8HN, UK
Blackie Academic & Professional, Wester Cleddens Road, Bishopbriggs, Glasgow G64 2NZ, UK
Chapman & Hall, 29 West 35th Street, New York NY10001, USA
Chapman & Hall Japan, Thomson Publishing Japan, Hirakawacho Nemoto Building, 6F, 1-7-11 Hirakawa-cho, Chiyoda-ku, Tokyo 102, Japan
Chapman & Hall Australia, Thomas Nelson Australia, 102 Dodds Street, South Melbourne, Victoria 3205, Australia
Chapman & Hall India, R. Seshadri, 32 Second Main Road, CIT East, Madras 600035, India

First edition 1992

© 1992 B. J. T. Morgan

Typeset in 10/12 Times by Thomson Press (India) Ltd, New Delhi
Printed in Great Britain by St. Edmundsbury Press, Bury St. Edmunds, Suffolk

ISBN 0 412 31750 8

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the UK Copyright Designs and Patents Act, 1988, this publication may not be reproduced, stored, or transmitted, in any form or by any means, without the prior permission in writing of the publishers, or in the case of reprographic reproduction only in accordance with the terms of the licences issued by the Copyright Licensing Agency in the UK, or in accordance with the terms of licences issued by the appropriate Reproduction Rights Organization outside the UK. Enquiries concerning reproduction outside the terms stated here should be sent to the publishers at the London address printed on this page.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication data available
Morgan, Byron J. T., 1946-
Analysis of quantal response data / B. J. T. Morgan. - 1st ed.
p. cm. - (Monographs on statistics and applied probability; 46)
Includes bibliographical references and indexes.
ISBN 0-412-31750-8
1. Biometry. 2. Probits. 3. Medical statistics. 4. Quantum statistics. I. Title. II. Series.
QH323.5.M67 1992 92-19056
574'.01'5195-dc20 CIP

‘All substances are poisons; there is none which is not a poison. The right dose differentiates a poison and a remedy.’

Paracelsus (1493-1541)

Contents

Preface
Glossary and notation
Index of data sets
1: Data, preliminary analyses and mechanistic models
1.1 Introduction
1.2 Aspects of toxicology
1.3 Examples
1.4 Preliminary graphical representations
1.5 Mechanistic models
1.6 Interpolation and extrapolation
1.7 Discussion
1.8 Exercises and complements
2: Maximum-likelihood fitting of simple models
2.1 The likelihood surface and non-linear optimization
2.2 The method-of-scoring for the logit model
2.3 The connection with iterated weighted regression; generalized linear models
2.4 Hand calculation and using tables
2.5 The chi-square goodness-of-fit test; heterogeneity
2.6 Minimum chi-square estimation
2.7 Estimating the dose for a given mortality
2.7.1 Using the delta method
2.7.2 Using Fieller's Theorem
2.7.3 The likelihood-ratio interval
2.7.4 Comparing the likelihood-ratio and Fieller intervals
2.8 Maximum-likelihood estimation for logistic regression
2.9 Making comparisons
2.10 Testing for trend in proportions
2.11 Discussion
2.12 Exercises and complements
3: Extensions and alternatives
3.1 Introduction
3.2 Natural or control mortality; EM algorithm and mixture models
3.3 Wadley's problem; use of controls
3.4 Influence and diagnostics
3.5 Trichotomous responses
3.6 Bayesian analysis
3.7 Synergy and antagonism
3.8 Multivariate bioassays
3.9 Errors in dose measurement
3.10 Discussion
3.11 Exercises and complements
4: Extended models for quantal assay data
4.1 Introduction
4.2 The Aranda-Ordaz asymmetric model
4.3 Extended symmetric models
4.4 Transforming the dose scale
4.5 Additional models and procedures; goodness of link
4.5.1 Other models
4.5.2 Goodness of link
4.6 Safe dose evaluation
4.7 Discussion
4.8 Exercises and complements
5: Describing time to response
5.1 Introduction and data
5.2 Descriptive methods
5.2.1 The MICE index
5.2.2 Polynomial growth curves
5.3 The stochastic model of Puri and Senturia
5.3.1 A mechanistic model
5.3.2 Fitting end-point mortality data
5.3.3 The Diggle-Gratton extension
5.3.4 A method of conditional mean and zero-frequency
5.3.5 The case j M 0
5.4 Multi-event models
5.5 Survival analysis
5.5.1 A mixture model for the flour-beetle data
5.5.2 The case of no long-term survivors
5.6 Non-monotonic response
5.7 Discussion
5.8 Exercises and complements
6: Over-dispersion
6.1 Extra-binomial variation
6.1.1 The beta-binomial model
6.1.2 Fitting the beta-binomial model
6.1.3 Tarone's test
6.1.4 The possibility of bias
6.2 Making comparisons in the presence of extra-binomial variation
6.2.1 An example involving treatment versus control
6.2.2 Comparing alternative test procedures
6.3 Dose-response with extra-binomial variation
6.3.1 The basic beta-binomial model
6.3.2 Describing variation through the parameter $\theta$ (or $\rho$)
6.4 The quasi-likelihood approach
6.5 Overdispersion versus choice of link function
6.5.1 Binomial examples
6.5.2 Wadley's problem with over-dispersion
6.6 Additional models
6.6.1 The correlated-binomial model
6.6.2 Mixtures of binomials; outliers and influence
6.6.3 Comparison of models
6.6.4 Logistic-normal-binomial and probit-normal-binomial models
6.6.5 Modelling the effect of litter-size
6.7 Additional applications
6.7.1 Urn-model representations; ant-lions
6.7.2 The beta-geometric distribution; fecundability
6.7.3 Analysis of variance
6.7.4 Incorporating historical control information
6.8 Discussion
6.9 Exercises and complements
7: Non-parametric and robust methods
7.1 Introduction
7.2 The pool-adjacent-violators algorithm; ABERS estimate
7.3 The Spearman-Karber estimate of the ED50
7.4 Trimming
7.4.1 Trimmed Spearman-Karber
7.4.2 Trimmed logit
7.5 Robustness and efficiency
7.5.1 Asymptotic variance and the Spearman-Karber estimate
7.5.2 L, M and R estimators
7.5.3 Influence curve robustness
7.5.4 Efficiency comparisons
7.6 Alternative distribution-free procedures
7.6.1 Sigmoidal constraint
7.6.2 Density estimation
7.7 Discussion
7.8 Exercises and complements
8: Design and sequential methods
8.1 Introduction
8.2 Optimal design
8.3 The up-and-down experiment
8.4 The Robbins-Monro procedure
8.4.1 Introduction
8.4.2 Wu's logit-MLE method
8.5 Sequential optimization
8.6 Comparison of methods for ED100p estimation
8.7 Discussion
8.8 Exercises and complements
Appendices
A Approximation procedures
B GLMs and GLIM
C Bordering Hessians
D Asymptotically equivalent tests of hypotheses
E Computing
F Useful addresses
G Solutions and comments for selected exercises
References
Author index
Subject index

Preface

This book has grown out of a lecture course on biometry given to M.Sc. students in statistics at the University of Kent. The standard reference for the course was the book Probit Analysis by Professor D. J. Finney. It is now 20 years since the appearance of the 3rd edition of Probit Analysis and there have been many developments in statistics of relevance for the analysis of quantal response data during this time, in design, sequential methods, non-parametric procedures, over-dispersion, robust methods, Bayesian approaches, extended models, influence and diagnostics, synergy and many other areas. The single most important development is probably the introduction of generalized linear models, allied to specialist computer packages for fitting these models. Most computer packages now provide a menu of relevant procedures for quantal assay data. Additionally the whole computing scene has changed dramatically, with the move towards powerful personal computers and workstations. The aim of this book has been to describe the new developments for the analysis of quantal response data, and to emphasize the links between the various different areas.

Several extra-mural courses have been given on the text material. The first of these was at the Royal Melbourne Institute of Technology, given jointly with Professor R. G. Jarrett. The last was at Duphar, in Weesp in The Netherlands, and in between two residential courses were given at the University of Kent. The interest shown by the course participants was one motivation for writing this book.

Quantal response data are quite often used to illustrate statistical techniques, and readers of the book will find that they will encounter many different areas of statistics. The book may be read by people with a range of different backgrounds. It is designed to be read as a coherent text or as a source of reference. Numerate scientists should be able to follow many of the arguments. However, for a full understanding a mathematics background is necessary. Much of the material should be accessible to third year mathematics and statistics undergraduates in British universities who have had foundation and second-level courses in statistics in their first two years of study. The book should be ideal for study at the postgraduate level by students of statistics and biometry.

There are 267 Exercises to help with the use of the book as a course text. The first four Appendices help to make the book complete, and the fifth summarizes useful computing facilities. For illustration, a number of GLIM macros appear in the text, and a small number of examples are given in BASIC and MINITAB. However, prior knowledge of these packages/languages is not a prerequisite for understanding the material of the book.

Well over 50 data sets are presented. Several of these now have classic status, in that they are, sometimes uncritically, regularly used to illustrate new procedures. Some of the examples have arisen from consulting experience with the Division of Animal Health at CSIRO, Melbourne, with Pfizer Central Research, Sandwich, Kent, and with Shell Research, Sittingbourne, Kent.

I am grateful to many individuals for their help and comments while this book has been written. At the Biometry Division of Pfizer, Kent, a range of problems were raised and discussed by P. Colman, R. Hews, T. Lewis, H. Ross-Parker and R. White. I have been particularly fortunate in supervising two postgraduate students working in relevant areas. The material of Chapters 5 and 6 owes a clear debt to the Ph.D. thesis of Simon Pack, who worked on a CASE award with Wellcome Research Laboratories, Beckenham, Kent, while Chapter 4 has likewise benefited from the M.Sc. dissertation of Paul Goedhart.

Deborah Ashby read the entire book as referee, and both Martin Ridout and David Smith read particular chapters. To these three I am most grateful for many helpful corrections and comments. Prominent amongst the others whom I should thank are: Beverley Balkau, John Fenlon, Janneke Hoestra, Hans Jansen and Richard Jarrett. Michael Bremner provided computing advice and help with the troff system. Encouragement was provided by Professors B. M. Bennett and A. A. Rayner, and the late David Williams, and useful advice by Sir David Cox. Parts of the book were typed by Lilian Bond and Arija Crux but the lion’s share of the labour was carried out by Mavis Swain, who surpassed even her legendary typing skills with great humour and patience.

Finally I thank my wife, Janet, and children, Chloe and Leo, for their tolerant acceptance of my regular weekend absences over the four-year period when the book was written.

Byron J. T. Morgan
Canterbury

Glossary and notation

Except where noted below, a standard notation is used throughout the book. A small number of notational clashes have been adopted between different chapters if that improved comprehension or corresponded to standard usage.

corpus luteum: glandular tissue in the ovary, which forms after rupture of the follicle at ovulation. It secretes progesterone.
dominant lethal test: experiment in which experimental units are male animals, and each male is mated to one or more females.
fecundability: probability of conception per menstrual cycle.
implant: used here to denote egg implanted in womb following fertilization.
isolate: a pure culture of an organism.
micromelia: abnormally small size of the arms or legs.
minimum inhibitory concentration: lowest concentration (of an anti-infective agent) at which a particular organism’s growth is inhibited.
phocomelia: congenital absence of the upper arm and/or upper leg (e.g. as side-effect of thalidomide).

$E[\ ]$: expectation.
$V(\ )$: variance.
$\Pr(\ )$: probability.
$L$: likelihood.
$l$: log-likelihood.
$D$: deviance.
$X^2$: Pearson goodness-of-fit statistic.
$N(\mu, \sigma^2)$: normal distribution, mean $\mu$, variance $\sigma^2$.
$\Phi(x)$: standard normal c.d.f.
$\phi(x)$: standard normal p.d.f.


(x, y): bivariate standard normal c.d.f.
$ED_{100p}$ ($LD_{100p}$, $EC_{100p}$) $= \theta_p$ (for $p \neq 0.5$); $ED_{50} = \theta$ (but see also beta-distributions, below).
$\theta_R$: Reed-Muench estimator of $\theta$; $\theta_D$: Dragstedt-Behrens estimator of $\theta$.
$E_M$, $E_B$, $E_{DB}$: estimators of $\theta$ from the up-and-down experiment.
$k$: number of doses, cases or treatments.
$m$: number of signal presentations (Chapter 3)/number of sampling times (Chapter 5)/number of litters in a group (Chapter 6).
$\{d_i, 1 \le i \le k\}$: doses. [More generally $\{x_i\}$, or $\{z_i\}$, when doses are not involved, or a transformation is used.]
$\Delta_i = (d_{i+1} - d_i)$.
$n_i$ individuals are treated at dose $d_i$ and $r_i$ respond.
In the time interval $(0, t_j)$, $r_{ij}$ respond to dose $d_i$; $n_{ij} = r_{ij} - r_{i,j-1}$ (Chapter 5).
Of $n_{ij}$ insects exposed to $a_i$ units of $A$ and $b_j$ units of $B$, $r_{ij}$ die (Chapter 3).
$P_i = P(d_i)$ = probability of response to dose $d_i$/probability that $x_j$ respond out of $n_j$ in the mixture model of equation (6.12).
$P_j$: transition matrix for movement between states between times $t_{j-1}$ and $t_j$ (Chapter 5).
$P_i = r_i/n_i$ (Chapter 2).
$P(t \mid d)$ = probability of response to dose $d$ by time $t$.
$p_{ij}$ = probability of response to dose $d_i$ in time-interval $(t_{j-1}, t_j)$.
$s_i = r_i - n_i P_i$; $\dfrac{r_i - n_i P_i}{\sqrt{n_i P_i (1 - P_i)}}$.
$t$: used to denote time/iterate number, as in $C(t)$.
$\mathrm{Bin}(n, p)$: binomial distribution, index $n$ and probability $p$.
$B(\alpha, \beta)$: beta function: $\Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha + \beta)$.
$I_k(x)$: incomplete gamma integral.
$\mu$: mean of a random variable, especially (Chapter 6) for a beta random variable.
$\hat{\mu}$: Spearman-Karber estimate of $\theta$ (estimate of mean of tolerance distribution).
$\hat{\mu}_\alpha$: $\alpha$% trimmed Spearman-Karber estimate.
$\mu'_j$: $j$th non-central moment of tolerance distribution.


$\mu_{(j)}$: $j$th central moment of tolerance distribution.
$h$: number of hits in multi-hit model.
$h(p)$, $h(p; \lambda)$: link functions.
$h_\nu(q)$: quantits.
$R(a)$: Mills ratio.
$F(x)$: cumulative distribution function (c.d.f.).
$F(x)$: empirical c.d.f.
$F(x)$: estimate of c.d.f. by linear interpolation from ABERS estimate.
$\mathrm{tr}(A)$: trace of matrix $A$.
$I(\nu, \theta)$, $J(\nu, \theta)$: Fisher information/expected information matrices.
$\psi(u)$: kernel function (Chapter 7).
$\phi_{T,F}(x)$: influence curve.
$SC_{T}(x)$: Tukey’s sensitivity curve.
$IC_{T,F,D}(d, y)$: influence curve based on dose mesh $D$.
$IC_{T,F}(d, y)$: influence curve.
$(\alpha, \beta)$: location-scale pair of parameters/standard parameterization for beta distribution.
$(\mu, \theta)$: alternative parameterization for beta distribution (but note also use of $\theta$ for $ED_{50}$).
$\rho = \theta/(1 + \theta)$, and also more generally as correlation between litter-mates/also $\rho$ is used to denote relative potency.
$m_i$: number of litters in $i$th treatment group.
$n_{ij}$: size of $j$th litter in $i$th treatment group.
$x_{ij}$: number responding out of $n_{ij}$ (Chapter 6).
$p_i$: probability of response for all litters in $i$th treatment group (Chapter 6). [NB: subscript $i$ is sometimes dropped for simplification.]
$X = \{x_{ij}\}$, the design matrix (except Chapter 6).
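
As a rough illustration of how some of this notation fits together, the following minimal Python sketch computes the observed proportions $r_i/n_i$, the standardized quantity $(r_i - n_i P_i)/\sqrt{n_i P_i(1 - P_i)}$, the Pearson statistic $X^2$ taken as the sum of the squares of those quantities, and the beta function $B(\alpha, \beta)$. The function names and the numerical values are hypothetical, chosen only for illustration; they are not taken from the book.

```python
import math


def observed_proportions(r, n):
    """Observed response proportions r_i / n_i at each dose."""
    return [ri / ni for ri, ni in zip(r, n)]


def standardized_residuals(r, n, P):
    """(r_i - n_i P_i) / sqrt(n_i P_i (1 - P_i)) for each dose."""
    return [
        (ri - ni * Pi) / math.sqrt(ni * Pi * (1.0 - Pi))
        for ri, ni, Pi in zip(r, n, P)
    ]


def pearson_x2(r, n, P):
    """Pearson goodness-of-fit statistic: sum of squared standardized residuals."""
    return sum(e * e for e in standardized_residuals(r, n, P))


def beta_function(a, b):
    """B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


if __name__ == "__main__":
    # Hypothetical counts, made up for illustration (not a data set from the book).
    n = [10, 10, 10]      # n_i: individuals treated at dose d_i
    r = [3, 5, 8]         # r_i: individuals responding at dose d_i
    P = [0.2, 0.5, 0.8]   # assumed response probabilities P(d_i)

    print(observed_proportions(r, n))
    print(pearson_x2(r, n, P))
    print(beta_function(2, 3))
```

In practice the probabilities $P_i$ would come from a fitted model rather than being assumed, and the reference distribution for $X^2$ depends on how many parameters were estimated.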

Index of data sets

Irwin (1937) Effect of anti-pneumococcus serum on mice.
Milicer and Szczotka (1966) Age of menarche in Polish girls.
Hewlett (1974) Mortality of flour-beetles following spraying with Pyrethrins B.
Pack (1986a) Knockdown of houseflies following aerosol fly spray.
Jarrett, Morgan and Liow (1981) Effect of arboviruses on chicken embryos.
Time course of effect of an arbovirus on chicken embryos.
Hypersensitivity reactions to a drug.
Haseman and Soares (1976) Foetal death of mice in control populations.
Grey and Morgan (1972) Signal detection data.
Trajstman (1989) Colony count data.
Marsden (1987) Serological data.
Giltinan, Capizzi and Malani (1988) Mortality of tobacco budworm treated with mixtures of insecticides.
Data sets 2-22 from Copenhaver and Mielke (1977).
Bliss (1935) Mortality of adult flour beetles after exposure to gaseous carbon disulphide.
Stanford School of Medicine Toxicity of guthion on mice.
Weil (1970) Effect on number of viable pups born to pregnant rats fed a treated diet.
Kooijman (1981) Mortality of Daphnia magna in water containing cadmium chloride.
Carter and Hubert (1981b) Mortality of trout fry exposed to copper sulphate.
Pearson and Hartley (1970) Perforation of cardboard disks.
Simpson and Margolin (1986) Colony count data.
Ashford and Sowden (1970) Breathlessness and wheeze in working miners.


Healy (1988) Smoking and the menopause.
Marriott and Richardson (1987) Anti-fungal drugs in vivo and in vitro.
Busvine (1938) Toxicity of ethylene oxide, applied to grain beetles.
Finney (1974) Vaso-constriction of skin of the digits.
Tarone (1982a) Lung tumor incidence in F344 rats exposed to nitrilotriacetic acid, and control incidence.
Milicer (1968) Age of menarche of rural Polish girls.
Silvapulle (1981) Psychiatric patient responses to a General Health Questionnaire.
Graubard and Korn (1987) Congenital sex organ malformation and alcohol consumption.
Voting intentions.
Finney (1952) Effect of rotenone, deguelin, and a mixture of these.
Rai and Van Ryzin (1981) Effect of DDT in diet of mice.
Hoekstra (1987) Mortality of aphids exposed to nicotine.
Larsen et al. (1979) Mortality of mice exposed to NO2 before dosing with Streptococcus pyogenes.
Lwin and Martin (1989) Use of anthelmintic for clearing worms from infected sheep.
Wadley (1949) Fictitious data on effect of low temperature on fruit flies.
Baker, Pierce and Pierce (1980) Environmental impact of chemicals. (Also artificial augmentation)
Ridout and Fenlon (1991) Mortality of moths fed viruses.
Pierce et al. (1979) Mortality of fish exposed to zinc.
Griffiths (1977) Distribution of wasp eggs.
Morgan (1982) Polyspermy in sea-urchin eggs.
Racine et al. (1986) Mortality from an inhalation acute toxicity test.
Carter and Hubert (1984) Toxic effect of copper substance on fish.
Fenlon (1988) Mortality of moth larvae given different doses of viruses.
Segreti and Munson (1981) Neonatal acute toxicity to trichloromethane.
Paul (1982) Abnormalities in rabbit foetuses.
Morisita (1971) Distribution of ant-lions in fine sand.
Weinberg and Gladen (1986) Menstrual cycles to pregnancy.
Crowder (1978) Germinating seeds.


Aeschbacher, Vuataz, Sotek and Stalder (1977) Foetal mortality in a control population.
Williams (1988b) Chromosomal aberrations from an in vivo cytogenetic assay.
Williams (1988a) Deaths per litter in a mouse teratology experiment.
Sanathanan et al. (1987) Six sets of antibiotic assay data.

CHAPTER 1

Data, preliminary analyses and mechanistic models

1.1 Introduction

The names Hiroshima, Chernobyl and Thalidomide are synonymous with twentieth century tragedies. Many years after the exploding of atomic bombs over Hiroshima and Nagasaki, the effects of exposure to nuclear radiation on the survivors are quantifiable in terms of increased incidence of leukaemia, as described in the paper by Armitage and Doll (1962). An example involving chromosome aberration in survivors of Hiroshima is considered in Chapter 6. The drug Thalidomide had been prescribed as a safe hypnotic drug, but the winter of 1961 saw the horrifying reports of its use resulting in babies born with deformities of phocomelia, or micromelia. As Beedie and Davies (1981) ominously wrote, ‘It had not been tested in animals for teratogenicity, but thousands of babies born to mothers who had taken the drug during pregnancy provided the missing data.’

Common to these two illustrations is the exposure of human beings to substances that are either unnatural, or provided at unnaturally high levels. The response of individuals, adults or embryos, is binary: they are either affected by the time they are inspected, or they are not. In more general terms, discrete responses may take a variety of forms, such as reduction of pain, alleviation of breathing problems, improvement in acne, remission from leukaemia, and so on. Data which quantify the effect of exposure of individuals to substances such as new drugs, or to radiation, are often described as discrete, or quantal. Responses need not just be binary, and later we shall see examples of quantal data which may result in three or more possible outcomes.

This book is concerned with the analysis of quantal response data, sometimes called dose-response data, or quantal assay data. Such data may arise in a wide variety of different areas as we shall see, and may be collected from a properly designed scientific experiment, or result from observational studies. Thus, for example, girls of different ages may be classified by whether or not they have started menstruation; patches of woollen fabrics may be assessed for the degree of ‘prickle’ they elicit in human subjects; viruses ingested by insects may or may not kill them; widely used food additives may be tested for their undesirable side-effects. An example of this was cited in an article in The Independent newspaper of 23 September 1987: ‘Several preservatives may cause asthmatic reactions in susceptible people. And one, methyl paraben (E218), is the main volatile component in the vaginal secretions of beagles - it may cause socially embarrassing behaviour in dogs. E218 is used in beer and coffee and many other foods.’

The main emphasis in the examples which illustrate the methodology of the book will be on the evaluation and testing of substances, mainly drugs, for use in humans. Frequently the effect investigated is whether or not there is a positive outcome from using the drugs, resulting in efficacy studies; however, it is also of vital importance to consider the possible harmful side-effects of otherwise potentially beneficial treatments. Thus, for example, patients suffering from the spine-fusing disease, Ankylosing Spondylitis, may be treated by radiation therapy, but leukaemia may result as an undesired side-effect (Court Brown and Doll, 1957). The radiation used in mammography has been estimated as likely to cause just one excess cancer per 106 million women screened (Whitehouse, 1985; see also Breslow and Day, 1980, p. 62).
The Babylonian Code of Hammurabi, of 2200 BC, ordained that if a patient died, the treating physician should lose his hands, and this is regarded as the first example, indeed a somewhat extreme one, of the need for regulation of procedures for treating human beings. In modern times, pharmacopoeias have been devised throughout the world, presenting standards for drug purity. The first statute to control drug quality in America was passed in 1848, while as recently as 1968 the Medicines Act of Great Britain produced new safeguards for the development, production and use of new drugs. Because a number of the examples in this book are drawn from toxicology, it is worthwhile outlining important aspects of toxicology before we start, and this is done in the next section. An excellent introduction to the statistical aspects of the full range of drug development and testing is given by Salsburg (1990).

1.2 Aspects of toxicology

The activity of chemical substances can sometimes be gauged from their physico-chemical properties, and the Quantitative Structure Activity Relationship (QSAR) procedures described by Bergman and Gittins (1985) are designed to search for new active substances using physical structure and electrochemical property correlates with established substances of known performance. New chemicals may also be tested in vitro. Thus for example the Ames test for mutagenicity positively identified 157 out of a series of 175 known carcinogens (McCann et al., 1975). Ultimately, however, tests in vivo are necessary. The revolutionary oral anti-fungal drug fluconazole was not found to be especially effective in vitro: the ‘modest in vitro profile understates the excellent in vivo activity of fluconazole demonstrated in animal models of fungal infections and in clinical trials’ (Marriott and Richardson, 1987). See also Exercise 1.26.

In using non-human animals as models for humans the basic assumption is always that the model is appropriate. With the possible exception of arsenic, all known chemical carcinogens in humans are carcinogenic in some, but not all species of animals used in laboratories, so the model has to be chosen with care (Klaassen, 1986). For further discussion on the extrapolation from animals to humans, see Mantel and Bryan (1961), Cornfield (1977) and Park and Snee (1983). Carageenan, which is a seaweed extract, is used in products such as ice cream and biscuits, yet it causes changes resembling ulcerative colitis in the bowels of guinea pigs, rabbits and mice. Inevitably effects such as these are the result of doses given at far higher levels than those commonly encountered in foods, and to relatively small groups of animals. This is a standard toxicological procedure, and is necessary in order to reduce cost and unnecessary suffering in experimental animals.

The difficult problem is then to extrapolate from a known dangerous dose in animals to a virtually safe one for human consumption, and we discuss this fundamental problem in sections 1.6 and 4.6. Different toxins may be administered in different ways, for example

DATA, PRELIMINARY ANALYSES AND MECHANISTIC MODELS

Table 1.1

Species       Weight    Dosage     Dose           Surface area    Dosage
              (g)       (mg/kg)    (mg/animal)    (cm²)           (mg/cm²)
Mouse             20    100            2               46         0.043
Rat              200    100           20              325         0.061
Guinea pig       400    100           40              565         0.071
Rabbit          1500    100          150             1270         0.118
Cat             2000    100          200             1380         0.145
Monkey          4000    100          400             2980         0.134
Dog            12000    100         1200             5770         0.207
Human          70000    100         7000            18000         0.388

through ingestion, by contact with the skin, or by intravenous injection, and their effect can be radically affected by the size of the animal tested. Thus it is quite usual for dosages to be given in mg/kg of body weight, for example, or mg/cm² of body area. Table 1.1, taken from Klaassen (1986), shows how a constant dosage measured in mg/kg translates into different overall doses per animal, for a variety of species, and different dosages in terms of mg/cm². It is difficult to appreciate what a dosage measured in mg/kg actually becomes when scaled up to life-size, and Table 1.2, also taken from Klaassen (1986), provides the required interpretation, together with a crude toxicity rating to describe the different lethal doses. The distinction between dose and dosage that is drawn here will be maintained throughout the book.

Table 1.2 Probable lethal oral dose for humans

Toxicity rating or class    Dosage          For average adults
1. Practically nontoxic     > 15 g/kg       More than 1 quart
2. Slightly toxic           5-15 g/kg       Between pint and quart
3. Moderately toxic         0.5-5 g/kg      Between ounce and pint
4. Very toxic               50-500 mg/kg    Between teaspoonful and ounce
5. Extremely toxic          5-50 mg/kg      Between 7 drops and teaspoonful
6. Supertoxic               < 5 mg/kg       A taste (fewer than 7 drops)

Before new drugs can be tested in the standard progression of clinical trials on human subjects, they may be screened on a variety of animals, with tests designed for a corresponding range of different effects. These include acute and chronic toxicity, with experiments in the latter case possibly running for a number of years, and usually performed on rats. Rabbits are the preferred animal for tests for eye and skin irritation, while the guinea pig is usually used for tests for skin sensitization, when this seems appropriate. Tests for possible teratological effects usually involve rats and rabbits, and substances are administered to males and/or females, before mating, and, for females, during gestation, and during lactation. Observations include the pregnancy rate and the viability of progeny, and study may continue for several generations. Mutation effects can be sought through a number of in vivo and in vitro procedures. The dominant lethal test, which we encounter again in Chapter 6, involves giving a male animal (usually a rodent) a single dose of the compound prior to mating with one or two females. The females are then killed before term, and numbers of live embryos and corpora lutea recorded for analysis.

The extrapolation from animals to humans takes us through what has been referred to as the ‘species barrier’. We see that substances may be administered in a variety of ways, and by single or repeated doses. Substances which are toxic by one route of application may not prove toxic by another: the skin may prove to be an effective barrier to poisons; the liver may detoxify a substance given orally, which may be far more toxic if inhaled, for example. While a compound itself may not be toxic, a metabolite of it might be. Clearly tests must try to reflect the intended use of substances. If they are likely to find their way into water, they need to be assessed for possible effects on fish, Crustacea and so forth. Aquatic experiments may differ from those on mammals in that exposure to the toxic agent may be continuous. Many of the features described in this section will be encountered in the examples which now follow.
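The conversion between dosage and dose illustrated by Table 1.1, and the rating scale of Table 1.2, can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the treatment of values lying exactly on a class boundary is an assumption, since Table 1.2 does not say to which class a boundary value belongs.

```python
# Illustrative sketch; function names are hypothetical, not from the book.
# Weights (g) and surface areas (cm^2) are those of Table 1.1.
TABLE_1_1 = {
    "Mouse": (20, 46),
    "Rat": (200, 325),
    "Guinea pig": (400, 565),
    "Rabbit": (1500, 1270),
    "Cat": (2000, 1380),
    "Monkey": (4000, 2980),
    "Dog": (12000, 5770),
    "Human": (70000, 18000),
}


def dose_per_animal(dosage_mg_per_kg, weight_g):
    """Total dose (mg/animal) implied by a dosage in mg/kg of body weight."""
    return dosage_mg_per_kg * weight_g / 1000.0


def dosage_per_area(dose_mg, area_cm2):
    """Dosage in mg/cm^2 of body surface for a given total dose."""
    return dose_mg / area_cm2


def toxicity_class(lethal_dosage_mg_per_kg):
    """Toxicity rating (1 to 6) of Table 1.2 for a probable lethal oral dosage.

    Assumption: a value exactly on a boundary is placed in the less toxic class.
    """
    d = lethal_dosage_mg_per_kg
    if d > 15000:
        return 1  # practically nontoxic
    if d >= 5000:
        return 2  # slightly toxic
    if d >= 500:
        return 3  # moderately toxic
    if d >= 50:
        return 4  # very toxic
    if d >= 5:
        return 5  # extremely toxic
    return 6      # supertoxic


if __name__ == "__main__":
    for species, (weight, area) in TABLE_1_1.items():
        dose = dose_per_animal(100, weight)
        print(species, dose, round(dosage_per_area(dose, area), 3))
```

Running the loop recomputes the last three columns of Table 1.1 from the first two; small differences in the third decimal place from the printed values are possible because the published figures are rounded.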

1.3 Examples

We shall now present a number of examples to illustrate the wide range of problems to be considered, and to provide instances of the different types of experiment described in the last section. In all cases response is quantal, and in most cases there is a single covariate, such as mean age group or dose level, which is deemed likely to affect the response. In some cases there are several covariates, which may, singly or in conjunction, influence the response. We shall see also the kinds of questions that arise and require answers in an appropriate statistical setting. The remainder of this chapter also serves as an introduction to the rest of the book.

Example 1.1 An experiment to assay an anti-pneumococcus serum (dose measured in cc)

Irwin (1937) analysed the data of Table 1.3. Groups of mice were given a serum inoculation, at various doses, prior to being infected with pneumococci. We see that as the dose of serum is increased, the proportion of mice protected increases. The relationship between dose and resulting proportion is frequently simplified by transformation in each case. Here we have logarithms of doses, and the commonly used transformation, $\operatorname{logit}(p) = \log_e\{p/(1 - p)\}$, of proportions. Of interest here is the serum level to set for routine anti-pneumococcus inoculation.

Table 1.3 Effect of anti-pneumococcus serum on mice

10.158 + log2     No. of mice    No. of mice      Proportion       Logit(p)
(serum dose)      protected      in experiment    protected (p)
1                  0             40               0.000            -
2                  2             40               0.050            -2.944
3                 14             40               0.350            -0.619
4                 19             40               0.475            -0.100
5                 30             40               0.750             1.099

Example 1.2 Age of menarche in 3918 Warsaw girls

This example differs from most of the others presented in this chapter in that the data arise from an observational study rather than an experimental one. However, we can see the qualitative similarity between the data of Tables 1.3 and 1.4, and we shall see later how they may be analysed by the same methods. Nevertheless there remains an important distinction between the two different types of study, and we shall at times find it necessary to emphasize this distinction.


Table 1.4 Age of menarche in Polish girls

Mean age of      No. having     No. of    Proportion having    Logit(p)
group (years)    menstruated    girls     menstruated (p)
 9.21               0            376      0.000                 -
10.21               0            200      0.000                 -
10.58               0             93      0.000                 -
10.83               2            120      0.017                -4.076
11.08               2             90      0.022                -3.784
11.33               5             88      0.057                -2.809
11.58              10            105      0.095                -2.251
11.83              17            111      0.153                -1.710
12.08              16            100      0.160                -1.658
12.33              29             93      0.312                -0.792
12.58              39            100      0.390                -0.447
12.83              51            108      0.472                -0.111
13.08              47             99      0.475                -0.101
13.33              67            106      0.632                 0.541
13.58              81            105      0.771                 1.216
13.83              88            117      0.752                 1.110
14.08              79             98      0.806                 1.425
14.33              90             97      0.928                 2.554
14.58             113            120      0.942                 2.781
14.83              95            102      0.931                 2.608
15.08             117            122      0.959                 3.153
15.33             107            111      0.964                 3.283
15.58              92             94      0.979                 3.829
15.83             112            114      0.982                 4.025
17.58            1049           1049      1.000                 -

These data were presented by Milicer and Szczotka (1966) and record, for a sample of 3918 Warsaw girls taken in 1963, whether or not they had reached menarche (onset of menstruation). This is probably the best known of a number of studies of age of menarche. Other studies include those by Burrell et al. (1961) and Milicer (1968). Data resulting from the second of these papers are presented in Exercise 2.23. Interestingly, differences are detectable between individuals of different race and of different socio-economic status. From a purely statistical point of view, in the experimental context, data sets as large as these are less frequently encountered than much smaller sets, such as that of Table 1.3, and may allow discrimination between competing simple probability models which usually are indistinguishable.
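The logit columns of Tables 1.3 and 1.4 follow directly from the definition logit(p) = log{p/(1 - p)}. A minimal Python sketch, using the counts of Table 1.3, is given here; the function name is hypothetical, and returning None for proportions of 0 or 1 is simply one way of marking the entries shown as '-' in the tables.

```python
import math


def empirical_logit(responders, group_size):
    """Return log(p / (1 - p)) with p = responders / group_size.

    Returns None when p is 0 or 1, for which no finite logit exists.
    """
    if responders == 0 or responders == group_size:
        return None
    # p / (1 - p) equals responders / (group_size - responders)
    return math.log(responders / (group_size - responders))


# Numbers of mice protected out of 40 at each serum dose (Table 1.3)
protected = [0, 2, 14, 19, 30]
logits = [empirical_logit(y, 40) for y in protected]

if __name__ == "__main__":
    for y, value in zip(protected, logits):
        print(y, "-" if value is None else round(value, 3))
```

Applying the same function to the counts of Table 1.4 gives that table's logit column, with the first three groups and the last group left without a finite value.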


Example 1.3 The effect of insecticide on flour-beetles

Hewlett (1974) observed the effect of insecticide sprayed onto flour-beetles at four different concentrations. The data given in Table 1.5 differ from those of Table 1.3 in that insects are used, application is topical, by spraying, different sexes are distinguished and also the observations are made at a number of times, rather than just one. The data of the last two rows present the responses for the entire length of the experiment, or endpoint mortalities as they are called, and so are qualitatively similar to the data of Table 1.3. When presented with such data we might look for sex differences, both in terms of overall response and speed of response. When summarizing overall response rates, or when comparing these between sexes, we might question the extent to which precision and power have been increased by collecting data over time. We consider this point in detail in Chapter 5.

The beetles involved here are Tribolium castaneum, the rust-red flour-beetle. They are small insects, 3-4 mm long, infesting flour, and eating this or broken grain (Hewlett, P. S., personal communication). The fact that the insects were sprayed means that different beetles receive different doses, for a given concentration. The analysis in Chapter 5 ignores this feature, but it is discussed in section 3.9.

Table 1.5 Numbers of male (M) and female (F) flour-beetles (Tribolium castaneum) dying in successive time intervals following spraying with insecticide (Pyrethrins B) in Risella 17 oil. The beetles were fed during the experiment in an attempt to eliminate natural mortality. Data from Hewlett (1974)

                    Concentration (mg/cm² deposit)
Time interval         0.20          0.32          0.50          0.80
(days)               M     F       M     F       M     F       M     F
0-1                  3     0       7     1       5     0       4     2
1-2                 11     2      10     5       8     4      10     7
2-3                 10     4      11    11      11     6       8    15
3-4                  7     8      16    10      15     6      14     9
4-5                  4     9       3     5       4     3       8     3
5-6                  3     3       2     1       2     1       2     4
6-7                  2     0       1     0       1     1       1     1
7-8                  1     0       0     1       1     4       0     1
8-9                  0     0       0     0       0     0       0     0
9-10                 0     0       0     0       0     0       1     1
10-11                0     0       0     0       0     0       0     0
11-12                1     0       0     0       0     1       0     0
12-13                1     0       0     0       0     1       0     0
No. survivors      101   126      19    47       7    17       2     4
No. treated        144   152      69    81      54    44      50    47

Example 1.4 Recovery of insects

An important feature of aerosol fly sprays is whether they knock flies down, and not necessarily whether the flies are actually killed in the process - sometimes flies recover from ‘knock-down’, as the data of Table 1.6 show. How might we compare the results of the two experiments? We discuss a mechanistic model for such data in Chapter 5.

Table 1.6 For two experiments, A and B, the data below give the numbers of houseflies (Musca domestica) airborne at several times after the initial dose of spray was administered: a fixed amount of spray was released into a wind tunnel in which the flies were allowed to fly freely. Data from Pack (1986a)

                  Experiment A                Experiment B
                  Concentration (μg/l)        Concentration (μg/l)
Time (minutes)    0.3     1.0     2.0         0.3     1.0     2.0
1                 18      12       9          19      19      10
5                 15       0       0          10       0       0
10                12       0       0          12       0       0
20                15       2       0          13       0       0
60                18       4       0          18      13       0
180               18      16      17          20      22      10
group size        18      16      22          20      22      20

Example 1.5 Experiments to investigate the effect of arboviruses on chicken eggs

Jarrett et al. (1981) analysed experiments carried out to investigate the effects of arboviruses injected into chicken embryos. The aim was to quantify the potency of arboviruses, with a view ultimately to assessing how these might affect lamb foetuses. Two examples of the resulting data are given in Table 1.7. In this example there are three possible responses, and, as was implicit also in the last two examples, we are interested in comparisons between sets of data. Data of this kind frequently result from making observations over time, as in the last two examples, but the time information is suppressed in this case. Thus in Table 1.7 eggs were classified 18 days after injection of the virus; non-specific deaths in the first few days were excluded, each group of eggs having been originally of size 20. An illustration of time-dependent data for this kind of experiment is given in Table 1.8. Eggs were candled, i.e. held up to the light, each day to see whether the embryo was dead or alive.

Table 1.7 The effect of two arboviruses on chicken embryos

                    Inoculum titre    No. of              Alive
Virus               (PFU/egg)         eggs      Dead      Deformed    Not deformed
Facey’s Paddock         3             17         3         1          13
                       18             19         4         1          14
                       30             19         8         2           9
                       90             20        17         1           2
Tinaroo                 3             19         1         0          18
                       20             19         2         0          17
                     2400             15         4         9           2
                    88000             19         9        10           0
Control                               18         1         0          17

Table 1.8 Time course of an experiment to investigate the effect of an arbovirus on chicken embryos. The data give the cumulative number dead out of 20, except for log dose 0.65, when an egg was dropped on day 8

            Day
Log dose     1   2   3   4   5   6   7   8   9  10  11  12  13  14
0.65         0   0   1   1   3   3   3   3   4   4   4   4   4   4
2.50         2   2   2   2   2   2   2   3   3   3   3   3   4   4
4.32         2   2   2   2   2   2   2   4   4   6   7   9   9  11
6.23         0   1   1   1   2   2   3   6   7  10  11  12  12  14
Control      0   0   1   1   1   1   1   1   1   1   1   1   1   1

In many investigations responses may be due to different causes. Presented with pairs of different woollen fabrics, with only one of each pair being ‘prickly’, subjects who cannot discriminate between the fabrics by touch may correctly identify the prickly item by chance. In other cases the correct response can result from a clear perception of prickle on the part of the subjects. Death may result from a cause other than the application of a poison. Even onset of menstruation may, in some cases, be incorrectly ascribed to bleeding due to pathological causes. In Example 1.3, beetles were fed in order to minimize natural mortality. In cases where natural response is possible, it is advisable for control groups to be employed, as in Table 1.8. Further illustrations are given in Example 1.7. Ways of dealing with natural response as in a control group are considered in section 3.2.

Example 1.6 Hypersensitivity reactions to a drug

The data in Table 1.9 are taken from a much larger study into the possible side-effects of a drug. Differing experimental protocols at different sites resulted in experiments of appreciably variable lengths being run before the studies were terminated. Here, as in the last two examples, times are recorded in addition to whether a response took place. Of primary importance to the pharmaceutical company involved was whether there was evidence of hypersensitivity reactions being related to the dose level used.

Table 1.9 Hypersensitivity reactions to a drug, administered at four sites, A, B, C or D

Site               Time on drug    Presence of a reaction    Sex             Dose
A    B    C    D   (days)          (1 = reaction)            (2 = female)    (mg)
1    0    0    0    11             1                         2               250
1    0    0    0    22             0                         2               250
1    0    0    0    20             0                         1               250
1    0    0    0     7             1                         1               100
0    0    0    1    78             0                         2               250
0    0    0    1    27             0                         2                50
0    1    0    0   399             0                         1               150
0    0    1    0    55             0                         1               125

Example 1.7 Foetal death in a control population

New drugs need to be tested carefully for any possible effects on pregnant animals. The data in Table 1.10 are taken from Haseman and Soares (1976) and just describe control groups from dominant lethal assays, mentioned in section 1.2. In this experiment a drug’s ability to cause damage to reproductive genetic material, sufficient to kill the fertilized egg or developing embryo, is tested by dosing a male mouse and mating it to one or more females. A significant

Table 1.10a Sample No. 1 of Haseman and Soares (1976). Observed frequency distribution of foetal death in mice
Litter size 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

0 2 2 3 5 2 2 2 6 2 2 19 33 39 34 38 13 8 2

1

1 2 2 2 1 3 4 11 24 27 30 22 16 4 4 1

2

Number o f dead foetuses 3 4 5 6 7 8

9

10

11

12

13

1

2 1 2 3 11 12 14 18 14 3 2

1 1

1 2

3 5 6 6 4 4 3 1

4 5 6 2 3 2

4 2

1 1 1

1 1 1

1

1


Table 1.10b Sample No. 3 of Haseman and Soares (1976). Observed frequency distribution of foetal death in mice
Litter size 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

0 7 7 6 5 8 8 4 7 8 22 30 54 46 43 22 6 3

1

Number o f dead foetuses 2 3 4 5

2 2

1 1

4 7 9 17 18 27 30 21 22 6

2 1 7 2 9 12 8 13 5 3 2

1

6

7

8

1

1 1

9

1

1 1 1 2 4 3 2

1 1 2 1 1 1 1 1

1 2 1

1 1

1

1

increase in foetal deaths is then indicative of a mutagenic effect. We need to consider how we might describe such data sets in a relatively simple manner, and how we might make comparisons with similar data sets corresponding to treated animals. This is the topic of Chapter 6, where the basic assumption of a binomial distribution for responses is relaxed to accommodate extrabinomial variation, which usually arises when different litters of animals are involved.

Example 1.8 Signal detection experiments

A common experiment in psychology involves presenting subjects with stimuli which may either just be noise (N), or may involve a signal superimposed upon noise (SN). The subjects indicate whether or not they thought the signal was present, sometimes qualifying their responses with a measure of confidence. The performance of subjects may be monitored under a variety of adverse environmental conditions, and it is then of interest to measure the extent to which performance may change as conditions change. Such experiments may model behaviour such as the vigilance of radar screen monitors in submarines. The data of Table 1.11, taken from Grey and Morgan (1972), provide an illustration. Again we want to summarize the data, and make comparisons between subjects. Relevant analysis is provided in section 3.5.

An alternative form of signal detection experiment arises when subjects are informed that a stimulus is present on just one of m occasions, for m > 2, and the subjects have to select the occasion they think corresponds to the signal presentation. This is called an m-alternative forced-choice experiment, and will also be discussed again in Chapter 3.

Table 1.11 Data resulting from a signal detection experiment on three different subjects. Each subject responds ‘Yes’ if the stimulus was thought to be present, ‘No’ if it was not, etc.

                       Responses
Subject    Stimulus    No    Not sure    Yes
1          N           30    10          15
           SN           9     7          35
2          N           25    17           5
           SN           2    18          30
3          N           18     7           3
           SN           2    10          16
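As a first, purely descriptive summary of Table 1.11, one can compare for each subject the proportion of ‘Yes’ responses under noise alone with the proportion under signal plus noise. The Python sketch below does this; the dictionary keys and function name are hypothetical, and this crude summary is not the model-based analysis of section 3.5, since it ignores the ordering of the three response categories.

```python
# Counts from Table 1.11: subject -> stimulus -> (No, Not sure, Yes)
TABLE_1_11 = {
    1: {"N": (30, 10, 15), "SN": (9, 7, 35)},
    2: {"N": (25, 17, 5), "SN": (2, 18, 30)},
    3: {"N": (18, 7, 3), "SN": (2, 10, 16)},
}


def yes_proportion(counts):
    """Proportion of 'Yes' responses among all responses to one stimulus."""
    no, not_sure, yes = counts
    return yes / (no + not_sure + yes)


# One descriptive summary per subject (key names are illustrative only)
summary = {
    subject: {
        "yes_given_noise": yes_proportion(rows["N"]),
        "yes_given_signal": yes_proportion(rows["SN"]),
    }
    for subject, rows in TABLE_1_11.items()
}

if __name__ == "__main__":
    for subject, s in summary.items():
        print(subject,
              round(s["yes_given_noise"], 3),
              round(s["yes_given_signal"], 3))
```

For every subject the ‘Yes’ proportion is higher when the signal is present, but the ‘Not sure’ counts differ markedly between subjects, which is one reason a model for ordered categories is preferable to this two-number summary.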

Example 1.9 The Australian bovine tuberculosis eradication campaign

In work aimed at the eradication of bovine tuberculosis in Australia, suitably treated bovine tissue is placed on culture plates and examined for the growth of colonies of Mycobacterium bovis. Material for culture is decontaminated prior to inoculation onto culture media and the data in Table 1.12 describe colony counts when two different decontaminants (HPC and oxalic acid) are applied, in varying concentrations. While there are obvious similarities between this experiment and, say, that of Example 1.1, there is no universal upper limit to a colony count, and the data of Table 1.12 will require different probability models, to be described in sections 3.3 and 6.5.2. See also Exercise 1.22.

Table 1.12 Colony count data, taken from Trajstman (1989). The value marked by a * was omitted from all analyses

Control experiment where no decontaminant is used

No. M. bovis colonies at stationarity              Sample mean    Sample variance
52  80  55  50  58  50  43  50  53  54             51.8           110.8
44  51  34  37  46  56  64  51  67  40

Decontaminant: HPC

[HPC] %
weight/volume    No. M. bovis colonies at stationarity      Sample mean    Sample variance
0.75              4   8  10   1   0   9   2   5  14   7      6.0            19.6
0.375            12  13  11  13  17  12  11  16  21   2     12.8            24.4
0.1875            6  20  23  39  18  23  16  23  33  21     22.2            80.6
0.09375          46  42  35  20  19  18  33  29  41  36     31.9           102.3
0.075            30  27  51  39  31  53  30  36  38  22     35.7           100.0
0.0075           62  38  54  38  46  54  53  58  54  57     51.4            66.5
0.00075          42  45  32  39  40  49  3*  34  45  51     41.9            40.6

Decontaminant: oxalic acid

[Oxalic acid] %
weight/volume    No. M. bovis colonies at stationarity      Sample mean    Sample variance
5                15   6  13   4   1   9  14   6  12  13      9.3            23.1
0.5              33  31  30  26  41  33  27  40  31  20     31.2            39.1
0.05             26  32  24  30  52  28  33  28  26  22     30.1            70.8
0.005            54  31  37  50  73  44  36  50  37   -     45.8           164.4

Example 1.10 Serological data

The results of a serological survey carried out in Zaire into the extent of malarial infection in individuals aged greater than 6 months are given in Table 1.13. In this example the percentage sero-positive is bounded above by a factor reflecting the overall incidence of malaria. We consider modelling these data in Chapter 3.

Table 1.13 Data from Bongono (Zaire) showing the proportions of individuals in different age groups with antibodies present, as assessed by a particular serological test. Data from Marsden (1987)

Mean age group    No. of individuals    No.              Percentage
(years)           examined              sero-positive    sero-positive
 1.0               60                    2                3.3
 2.0               63                    3                4.8
 3.0               53                    3                5.7
 4.0               48                    3                6.3
 5.0               31                    1                3.3
 7.3              182                   18                9.9
11.9              140                   14               10.0
17.1              138                   20               14.5
22.0               84                   20               23.8
27.5               77                   19               24.7
32.0               58                   19               32.8
36.8               75                   24               32.0
41.6               30                    7               23.3
49.7               62                   25               40.3
60.8               74                   44               59.5

Example 1.11 Mixtures of drugs

The data of Table 1.14 result from an experiment designed to investigate how two insecticides (A and B) may act in combination. Of interest here is whether insecticides interact to produce enhanced performance (synergy), or a reduction in performance (antagonism). An analysis of these data is provided by Giltinan et al. (1988) and we discuss their findings in section 3.7.


Table 1.14 The results of a study to investigate the contact insecticidal activity of mixtures of two insecticides, A and B. The target insect was the tobacco budworm, Heliothis virescens. Treatment was administered by means of direct application of one microlitre for each dosage to the body of each insect. Mortality was measured 96 hours after treatment (Data from Giltinan et al., 1988)

Mixture     Amount of A    Amount of B    Number of       Number of
            (ppm)          (ppm)          dead insects    insects tested
B            0             30.00          26              30
B            0             15.00          19              30
B            0              7.50           7              30
B            0              3.75           5              30
A25:B75      6.50          19.50          23              30
A25:B75      3.25           9.75          11              30
A25:B75      1.625          4.875          3              30
A25:B75      0.812          2.438          0              30
A50:B50     13.00          13.00          15              30
A50:B50      6.50           6.50           5              30
A50:B50      3.25           3.25           4              29
A50:B50      1.625          1.625          0              29
A75:B25     19.50           6.50          20              30
A75:B25      9.75           3.25          13              30
A75:B25      4.875          1.625          6              29
A75:B25      2.438          0.813          0              30
A           30.00           0             23              30
A           15.00           0             21              30
A            7.50           0             13              30
A            3.75           0              5              30

1.4 Preliminary graphical representations

An obvious first approach to the kind of data illustrated so far is to plot proportions affected against dose, or log dose, or time, or whatever appears appropriate. This is done in Figures 1.1-1.3 for the data in Tables 1.3-1.5, respectively. The value of doing this is illustrated in Figure 1.3, for example: we can appreciate that males appear to be more susceptible than females and, furthermore, that when they respond they appear to do so more quickly than females. We shall quantify these differences by using mixture models, from the area of survival analysis, in Chapter 5.

Figure 1.1 A plot of the proportions protected versus log2(dose) for the data of Table 1.3. The reason for connecting the two proportions shown is given in section 1.6.

Figure 1.2 A plot of the proportions of Table 1.4 versus mean age of groups.

Figure 1.3a A plot of the proportions of female beetles responding versus time, from Table 1.5, reproduced from Pack (1986a). Key: A, 0.2 mg/cm²; B, 0.32 mg/cm²; C, 0.50 mg/cm²; D, 0.80 mg/cm².

Figure 1.3b A plot of the proportions of male beetles responding versus time, from Table 1.5, reproduced from Pack (1986a). Key: A, 0.2 mg/cm²; B, 0.32 mg/cm²; C, 0.50 mg/cm²; D, 0.80 mg/cm².

One may well consider fitting a straight line to points such as those of Figure 1.1. However, it is preferable to transform the proportions first. In many cases the plot corresponding to that of Figure 1.1 has a more sigmoid appearance, as is true of the points of Figure 1.2. If we plot the logits of the proportions versus age for the data of Table 1.4, we obtain the more linear plot of Figure 1.4. Finite logits do not exist for proportions of 0 or 1. Corresponding doses are indicated by arrows on the graph. Special graph paper may be used if the plotting is to be done by hand. For Figure 1.4 the least squares linear regression line is,

$$\operatorname{logit}\{p(x)\} = -20.8 + 1.58x \qquad (1.1)$$

where p(x) denotes the proportion that have reached menarche by age x. The product-moment correlation between logit{p(x)} and x has value 0.992, and so one might feel that the data are well described by equation (1.1). However, the proportions of Figure 1.2 result from binomial distributions, and will have unequal variances. A weighted regression, weighting inversely with respect to the variance of logit(p) (Exercise 1.2) gives the regression line:

$$\operatorname{logit}\{p(x)\} = -20.0 + 1.54x \qquad (1.2)$$

revealing little difference from equation (1.1) - but see also Exercise 1.3. The fitted line of equation (1.1) is called a minimum chi-square line, while that of equation (1.2) results from the technique of minimum logit chi-square. These methods are compared in section 2.6, after a full discussion of the maximum-likelihood estimation procedure for standard quantal response data.

Figure 1.4 A plot of the logits of the proportions of Table 1.4 versus age, excluding points for which the logit is not finite.

Before fitting lines to data we can also consider whether a preliminary transformation of the explanatory variate, x, might improve the fit. A logarithmic transformation is often used routinely for this purpose, though in some cases it has impaired the fit, rather than improved it; we shall discuss this further in detail in Chapter 4. In many exploratory investigations of new substances, their potency may be uncertain before the experiment. In such a case a wide range of dose levels is therefore sensible, and a natural device is to space the doses equally on a log scale, and then also for convenience to present results, plots and analyses in terms of that scale.

If we set x = 0 in equation (1.2) we see that birth equates to onset of menstruation in a very small, but non-zero, fraction of newborn female children. This should not worry us unduly, since it involves extrapolation well outside the age range over which data were collected. Over the age range of the collected data the model may provide a succinct description of the data. However, we can see that if the model had been formulated in terms of log(age), this problem would not have arisen. In fact, as we shall see in Chapter 4, the logarithmic transformation improves the fit of the model for these data. Extrapolation is the subject of section 1.6. The use of logarithms can also appear naturally from various mechanistic models which we shall now describe.
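The two straight-line fits discussed in this section can be sketched in Python as follows, using the counts of Table 1.4. This is an illustration under stated assumptions rather than the book's own computation: the helper name `fit_line` is hypothetical, logits are computed from the raw counts rather than from the rounded proportions, and the weights n p (1 - p) assume the usual large-sample approximation 1/{n p (1 - p)} for the variance of logit(p).

```python
import math

# Table 1.4: mean age of group, number having menstruated, number of girls
ages = [9.21, 10.21, 10.58, 10.83, 11.08, 11.33, 11.58, 11.83, 12.08,
        12.33, 12.58, 12.83, 13.08, 13.33, 13.58, 13.83, 14.08, 14.33,
        14.58, 14.83, 15.08, 15.33, 15.58, 15.83, 17.58]
menstruated = [0, 0, 0, 2, 2, 5, 10, 17, 16, 29, 39, 51, 47, 67, 81, 88,
               79, 90, 113, 95, 117, 107, 92, 112, 1049]
girls = [376, 200, 93, 120, 90, 88, 105, 111, 100, 93, 100, 108, 99, 106,
         105, 117, 98, 97, 120, 102, 122, 111, 94, 114, 1049]


def fit_line(x, y, w=None):
    """Weighted least squares fit of y = a + b * x (equal weights if w is None)."""
    if w is None:
        w = [1.0] * len(x)
    total = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / total
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Keep only the groups with a finite logit (proportion strictly between 0 and 1)
rows = [(a, r, n) for a, r, n in zip(ages, menstruated, girls) if 0 < r < n]
x = [a for a, _, _ in rows]
logit = [math.log(r / (n - r)) for _, r, n in rows]
# Weight = n * p * (1 - p) = r * (n - r) / n, the assumed inverse variance
weights = [r * (n - r) / n for _, r, n in rows]

a1, b1 = fit_line(x, logit)           # unweighted, compare equation (1.1)
a2, b2 = fit_line(x, logit, weights)  # weighted, compare equation (1.2)

# Extrapolating the weighted line to x = 0 (birth), as discussed in the text
p_birth = 1.0 / (1.0 + math.exp(-a2))

if __name__ == "__main__":
    print("unweighted:", round(a1, 2), round(b1, 2))
    print("weighted:  ", round(a2, 2), round(b2, 2))
    print("fitted proportion at age 0:", p_birth)
```

The coefficients should come out close to those quoted in equations (1.1) and (1.2), though not necessarily identical in the last digit, since the published values may rest on slightly different rounding. The final line makes the extrapolation point concrete: the fitted proportion at age 0 is tiny but strictly positive.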

1.5 Mechanistic models

When we consider the data of Table 1.4, on age of onset of menstruation, it is natural to suppose that age of menarche, in a homogeneous population of girls, has some distribution, with cumulative distribution function $F(\alpha + \beta a)$, where $\alpha$, $\beta$ are a location and scale pair of parameters. In such a case, the probability, P, of, for example, 79 girls out of 98 with mean age of 14.08 years having reached menarche can be approximated by the binomial form

$$P = \binom{98}{79} F(\alpha + 14.08\beta)^{79} \{1 - F(\alpha + 14.08\beta)\}^{19}$$

The exact form for P takes into account the interval nature of the data - see Exercise 1.13. An obvious contender for the form of F( ) is the normal distribution. An example of this is cited by Biometrika tables and refers to the detonation of explosives at varying distances from cardboard discs. In each experiment the proportion of discs perforated is noted (Exercise 1.14).

This same model is also used for standard quantal assay data, as in Table 1.3, and it may be justified by the classical, or threshold model for quantal response data. In this model it is assumed that each individual in the relevant population has a dose tolerance, or threshold, T say, to a particular substance. If the dose administered, d, is greater than T then the individual responds. Otherwise it does not. If the tolerances are distributed throughout the population with distribution $F(\alpha + \beta a)$, say, as in the above illustration, then the probability of individual response to dose d is simply:

$$\Pr(T \le d) = F(\alpha + \beta d)$$

In practice this model may also be used when d is a dosage, rather than a dose (see Exercise 1.15 for further discussion). In this book we shall refer to tolerance/threshold models/distributions. A small fraction of individuals may have high tolerances, giving rise to a positively skewed tolerance distribution. A logarithmic dose transformation might then be advantageous if the model to be fitted assumes a symmetric tolerance distribution.

Historically the favoured form for F( ) has been normal, resulting in what is called probit analysis. The greater simplicity of the cumulative distribution function of the similar logistic distribution has resulted in emphasis now being placed on use of the logistic distribution and the resulting logit analysis, a particular example of logistic regression discussed in section 2.8. However, a computational advantage of the probit model arises if a dose d is observed with error, and we return to this errors-in-variables situation in Chapter 3. Probit or logit models may be adopted from the pragmatic viewpoint of simply requiring an appropriate description of the data, and this has already been done of course in the fitted model of equation (1.2), without any reference to threshold models. There are cases where the threshold model is not appropriate; in cancer formation, for example, tumours may result from a change to a single cell initially and similarly death may follow from infection by a single virus particle. However, threshold models usually provide a useful way of thinking about the data. We shall consider an extension of the threshold model in Chapter 5, when we model times to response.

We may note here that there are also other areas where ideas of thresholds have been found to be useful for analysing data; for example, Anderson and Aitkin (1985), and Exercise 1.24. In psychology, a similar model provides the justification for signal detection theory which is used to describe data such as those of Example 1.8 (Exercises 1.12 and 2.8). This theory has been given a general setting by McCullagh (1980), as a way of analysing contingency table data with ordered categories; here too the terminology of probit and logit (and other) models is used. The resulting models are discussed in section 3.5, for the analysis of multiple response data, as in Table 1.7.

In general terms we may describe models as mechanistic or descriptive (in the latter case, Ripley, 1987, prefers the term convenient). The latter type of model does not rely on a specification of a mechanism, and simply aims to summarize the data, and provide a framework for inference.
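The threshold model described earlier in this section, with either a logistic or a normal tolerance distribution, can be sketched in Python as below. The function names are hypothetical. Purely for illustration, the parameter values are borrowed from equation (1.2), so that F is logistic and the ‘dose’ is age; they are used here to evaluate the binomial probability of 79 girls out of 98 having reached menarche at mean age 14.08.

```python
import math


def logistic_cdf(z):
    """Cumulative distribution function of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def response_probability(d, alpha, beta, cdf=logistic_cdf):
    """Pr(T <= d) = F(alpha + beta * d) under the threshold model."""
    return cdf(alpha + beta * d)


def binomial_probability(r, n, p):
    """Probability of exactly r responses out of n, each with probability p."""
    return math.comb(n, r) * p ** r * (1.0 - p) ** (n - r)


# Illustrative values only: alpha and beta taken from equation (1.2)
alpha, beta = -20.0, 1.54
p_14 = response_probability(14.08, alpha, beta)
prob_79_of_98 = binomial_probability(79, 98, p_14)

if __name__ == "__main__":
    print("fitted proportion at age 14.08:", round(p_14, 3))
    print("binomial probability of 79 out of 98:", prob_79_of_98)
```

Passing `cdf=normal_cdf` gives the probit version of the same calculation, although the values of alpha and beta appropriate to a normal tolerance distribution would differ from those used here.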
Thus we may regard a simple linear regression model as descriptive. The end-product of a mechanistic modelling exercise may be a descriptive model, whose parameters play no role other than fitting the model to the data, and we shall encounter several examples of this. Puri and Senturia (1972) proposed an elaborate mechanistic model for the way in which insects might attempt to shed insecticide, through a random sequence of losses of random amounts. This model was then used to fit data by supposing that the individual insect hazard rate at any time was a function of the amount of insecticide remaining by that time. We consider this model in detail in


DATA, PRELIMINARY ANALYSES AND MECHANISTIC MODELS

Chapters 4 and 5. One-hit and multi-hit models are also described in Chapters 4 and 5. Originally devised as models of carcinogenicity, these models have quite recently been employed for describing quite general quantal response data - see Rai and Van Ryzin (1981). The basic premise is that individuals exposed to a substance can be likened to a target bombarded with arrows, at an intensity determined by the dose level adopted. In the one-hit model it is supposed that a single arrow on target is sufficient to elicit a response. The multi-hit model is more stringent in requiring several hits. The multi-stage model supposes that various stages have to be completed, either in series or in parallel before a response is obtained. For a comprehensive review, see Kalbfleisch et al. (1983). If a toxic response results from at least h hits from arrows arriving in a Poisson process at rate (d for some fixed (say unit) time, where d corresponds to the dose level, then (see Exercise 1.17) the probability of response is given by:
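The multi-hit premise can be checked numerically. The following is a minimal sketch, assuming that the number of hits in unit time is Poisson with mean equal to a rate constant times the dose; the function name and the argument `rate_per_unit_dose` are illustrative choices, not notation from the text. The response probability is then the upper tail of the Poisson distribution at h.

```python
import math


def multi_hit_probability(dose, h, rate_per_unit_dose=1.0):
    """P(at least h hits) when hits ~ Poisson(rate_per_unit_dose * dose).

    `rate_per_unit_dose` is a hypothetical name for the Poisson rate
    constant; the exposure time is taken to be one unit.
    """
    if dose < 0 or h < 1:
        raise ValueError("dose must be non-negative and h at least 1")
    mean_hits = rate_per_unit_dose * dose
    term = math.exp(-mean_hits)  # P(exactly 0 hits)
    below = 0.0                  # accumulates P(fewer than h hits)
    for j in range(h):
        below += term
        term *= mean_hits / (j + 1)  # P(j + 1 hits) from P(j hits)
    return 1.0 - below


# One-hit (h = 1) and two-hit (h = 2) response curves at a few doses.
for d in (0.5, 1.0, 2.0):
    print(d, multi_hit_probability(d, 1), multi_hit_probability(d, 2))
```

With h = 1 this reduces to the one-hit form 1 - exp(-rate * dose); larger h gives a flatter curve at low doses, which is the sense in which the multi-hit model is more stringent.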

for

0< d
0 for/c2 = 0 for k 2 < 0

Here, $(k_1, k_2)$ are additional shape parameters to be determined from the data. For the Bliss data of Example 4.2, for instance, estimates of $k_1 = 0.16$ and $k_2 = -0.53$ were obtained from fitting the model to the logarithms of the doses. Draw graphs of versus $\eta$ and $P(d)$ versus $\eta$ and comment on the role of $k_1$ and $k_2$. Discuss how this model may be fitted using GLIM.

4.18 The symmetric extended tolerance distributions proposed by Aranda-Ordaz (1981) are defined implicitly by:

$$x = \frac{2}{\lambda}\,\frac{F(x)^{\lambda} - \{1 - F(x)\}^{\lambda}}{F(x)^{\lambda} + \{1 - F(x)\}^{\lambda}}$$

Draw graphs to illustrate the distributions for a variety of values for $\lambda$.

4.19 (Cox and Snell, 1989, p. 199.) If the probability of response at


EXTENDED MODELS FOR QUANTAL ASSAY DATA

dose d, P(d), is given by P(d) = F(d), where F(d) is a fully-specified cumulative distribution function, show that it is possible to transform the dose scale to: w = z(d), for a suitable function z(d) so that the probability of response to the transformed dose value w can be given by any cumulative distribution function required. Discuss the implications of this for the work of section 4.4.

4.20 (van Montfort and Otten, 1976.) Derive the necessary algebra for fitting a model based on the $\lambda$-distribution to quantal assay data. From fitting the model to the age-at-menarche data of Table 1.4, van Montfort and Otten estimate $\lambda = 0.12$, with an estimated standard error of 0.06. Verify that the distribution with $\lambda = 0.14$ is approximately normal (cf. Exercise 4.5) and discuss how these results may be interpreted in the context of choosing between the probit and logit models for these data (cf. Exercise 4.6). Note that the standard Pearson goodness-of-fit statistics are here: $X^2 = 21.87$ for the logit model, and $X^2 = 21.90$ for the probit model (d.f. = 23 in each case).

4.21 Use the Pregibon goodness-of-link approximation for fitting the Aranda-Ordaz asymmetric model to the Bliss data set of Example 4.2, using GLIM, and compare the results with the exact fit using the GLIM program of Exercise 2.18.

4.22* Provide a way of iterating the Pregibon one-step procedure considered in Exercise 4.21. Program the method in GLIM and try it out on the data of Example 4.2, using a variety of different starting values. You may find it useful to consider the GLIM program provided by Baker et al. (1980) (Exercise 4.26).

4.23† Use the Pregibon goodness-of-link approximation to fit the extended $\lambda$-distribution of equation (4.4) to a variety of data sets.

4.24 (Brown, 1982.) Use the results of Appendix D to construct two score tests within the extended model proposed by Prentice (1976b) and given in section 1.4, of $m_1 = m_2$ (i.e. a test of symmetry) and of $m_1 = m_2 = 1$ (i.e. a test of the adequacy of the logit model). Note that Ciampi et al. (1982) used this model in the context of survival analysis.

4.8 EXERCISES AND COMPLEMENTS

4.25* (Williams, 1987a.) Use the Pregibon (1981) one-step approximation, described in Appendix A and used in Chapter 3, to obtain approximations to the influence of responses to individual doses in determining the significance of tests of fit of simple models within an extended family.

4.26* (Baker et al., 1980.) Use the Pregibon goodness-of-link approach to fit Wadley's problem with controls and a Poisson error structure, as discussed in Chapter 3. Provide a GLIM program.

4.27 (van Montfort and Otten, 1976.) If a logit model is appropriate for a set of data but instead an extended model is fitted which contains the logit model as a special case, one might expect estimates of features of interest, such as $ED_{100p}$ values, to be estimated less precisely due to the introduction of an additional shape parameter, $\lambda$, say. Such a loss of precision can be estimated, using ratios of variances of, say, an $ED_{100p}$ value, evaluated when each model is fitted to data. Carry out this calculation for the model based on the $\lambda$-distribution, for a variety of types of experiment (i.e. varying doses, the number of doses, and the distribution of individuals over doses).

4.28 (Lwin and Martin, 1989.) The assumption of homogeneous groups of individuals responding uniformly to a given dose level of a substance is not always tenable, and ways of relaxing this assumption will be described in Chapter 6. Consider how you would develop a model in which the tolerance distribution is a mixture of two component distributions. (This approach is also discussed in Example 3.3. Cf. also Good (1979), Boos and Brownie (1991) and the work of section 5.5.1.)

4.29 As a further possible model, consider a Box-Cox transformation of the odds ratio

rather than the dose-scale, where P(d) is the probability of response to dose d. See Guerrero and Johnson (1982) for further discussion.


4.30 (i) Conditional upon the parameter $\beta$, a random variable X has the Weibull distribution, with probability density function

$$f(x \mid \beta) = \alpha\beta x^{\alpha-1} e^{-\beta x^{\alpha}}, \qquad x \ge 0,$$

where $\alpha, \beta > 0$. If $\beta$ has a gamma distribution with probability density function

$$g(\beta) = \frac{\delta^{\nu}\beta^{\nu-1}e^{-\delta\beta}}{\Gamma(\nu)}, \qquad \beta > 0,$$

where $\nu, \delta > 0$,

m ake use of the fact th a t J® g(P)d/3 = 1, to show th a t the unconditional probability density function of X is given by .. av 0, I k^(x) is the incom plete gam m a integral, k .,k -

h i(x) :

1

exp( — Cy)dy,

for k ^ 1, and I 0^{x) = 1

m

F o r co m p u tatio n it is m ore efficient to generate the com ponents of S(t) by recursion form ulae, together with a suitable procedure for truncating the sum m ation. These are described in Exercise 5.10. The properties of the estim ators /?, fi and 0 as the probability of surviving a dose df. P ack (1986a) investigated taking H{tf, di) =

- \ (l + e mj)

where $a_i$ is just a function of the dose level, $d_i$, and $\eta_{ij}$ is a function of dose and time, and then fitting the model of equation (5.11) to the flour-beetle data.


DESCRIBING TIME TO RESPONSE

E xperim entation w ith fitting a variety of different m odels revealed th at it was only necessary for rjtj to be a function of tim e alone, and a m odel th a t was fitted separately to the m ale and female beetle d a ta h ad the form: H(tj; d y 1 = (1 + *-) _1,

for

0, an d p aram eters /?, >0

(5.12)

irrespective of the dose. This is the same qualitative conclusion that was reached by Hewlett (1974) in his original analysis of the flour-beetle data. We recognize the cumulative distribution function of equation (5.12) as being that of a log-logistic distribution, i.e. the distribution of a random variable W, whose natural logarithm, $Z = \log_e W$, has a logistic distribution. An assumption of a time-to-response distribution which does not involve the dose was made by McLeish and Tosh (1990), to illustrate methodology described in section 8.5. The Nelder-Mead simplex method was used to fit the model of equations (5.11) and (5.12) to the flour-beetle data, separately for the males and for the females. The maximum-likelihood parameter estimates, with estimated asymptotic standard errors (in parentheses) and correlation matrices, are shown below.

Males

Parameter    Estimate    Estimated errors (in parentheses) and correlations

0. This permitted separate $\beta_i$ values to be estimated for each dose level in the first instance. The extended form of equation (5.13) has been discussed by Farewell and Prentice (1977) and Pettitt (1984). A more general form, giving a generalized Pareto model, has been considered by Clayton and Cuzick (1985) and has also been investigated by Bennett (1986). If $\lambda = 1$ then the cumulative distribution function of equation (5.13) reduces to the log-logistic form considered already, while if $\lambda = 0$ it becomes the Weibull distribution. In this family we therefore have representatives of proportional odds ($\lambda = 1$) and proportional hazards ($\lambda = 0$) families of models. Analysis based on the extended model of equation (5.13) is given in Pack and Morgan (1990a) - see Exercise 5.15 for extended discussion. Pack and Morgan (1990a) also experimented with extending the logit form adopted for the $\{a_i\}$, but found this unnecessary (Exercise 5.16).

5.5.2 The case of no long-term survivors

In the flour-beetle example considered above, the assumption of long-term survivors at each dose seemed reasonable. The model used extends that proposed by Farewell (1982b), who employed a logistic mixing proportion and a Weibull survival time distribution. In many cases, and the data of Tables 5.1 and 5.2 provide suitable illustrations, it would be unrealistic to assume that there are long-term survivors. In such an event we would model the probability of response by time $t_j$ given dose $d_i$ simply and directly by

$$p_{ij} = F(t_j; d_i) - F(t_{j-1}; d_i)$$

in which $F(t_j; d_i)$ is a suitable cumulative distribution function. Two examples of this will now be provided.
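The differencing of a cumulative distribution function over successive inspection times can be sketched as follows. This is only an illustration: the log-logistic form, the `scale` and `shape` argument names and the inspection times are assumptions chosen for the example, not the fitted models of this chapter.

```python
def log_logistic_cdf(t, scale, shape):
    """Log-logistic CDF, F(t) = 1 / (1 + (t / scale) ** (-shape)) for t > 0.

    `scale` and `shape` are hypothetical parameter names.
    """
    if t <= 0:
        return 0.0
    return 1.0 / (1.0 + (t / scale) ** (-shape))


def interval_probabilities(times, cdf):
    """Return p_j = F(t_j) - F(t_{j-1}) for increasing inspection times,
    taking F(t_0) = 0, followed by the probability 1 - F(t_k) of no
    response by the last inspection."""
    probabilities = []
    previous = 0.0
    for t in times:
        current = cdf(t)
        probabilities.append(current - previous)
        previous = current
    probabilities.append(1.0 - previous)
    return probabilities


# Illustrative inspection times at a single dose.
cell_probs = interval_probabilities(
    [2.0, 4.0, 8.0], lambda t: log_logistic_cdf(t, scale=4.0, shape=2.0)
)
print(cell_probs)
```

The returned values are multinomial cell probabilities for one dose group, so they sum to one; a dose effect would enter through the parameters passed to the distribution function.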


Example 5.1 Kooijman's Daphnia data

Pack (1986a) used the model of equation (5.13) to describe the data set of Table 5.1. Using maximum-likelihood estimation and the Nelder-Mead (1965) simplex method, he obtained a residual deviance of 71.22 on 54 degrees of freedom, resulting in a fit which is just acceptable at the 5% level. When the $\{\beta_i\}$ were plotted against the doses $\{d_i\}$ a definite relationship was evident, and from the simpler model

$$\eta(t, d_i) = \psi \log t - \{\alpha_1 + \alpha_2 \exp(-\alpha_3 d_i)\} \qquad (5.14)$$

a residual deviance of 75.27 on 58 degrees of freedom was obtained.

Table 5.9 Details of fitting the model of equation (5.14) to the data of Table 5.1 (Pack, 1986a)

Parameter    Maximum likelihood estimate    Estimated asymptotic errors (in parentheses) and correlations

$\lambda$

(0.285) 0.904 (0.313) (0.675)

$$f(p_i) = \frac{p_i^{\alpha_i-1}(1-p_i)^{\beta_i-1}}{B(\alpha_i,\beta_i)}, \qquad 0 \le p_i \le 1, \quad \alpha_i, \beta_i > 0,$$

and where $B(\alpha_i, \beta_i)$ is the beta function. In general, we expect $\alpha_i$ and $\beta_i$ to vary between treatment groups. The mean and variance of this distribution are given respectively by:

$$E[p_i] = \frac{\alpha_i}{\alpha_i+\beta_i}, \qquad V(p_i) = \frac{\alpha_i\beta_i}{(\alpha_i+\beta_i)^2(\alpha_i+\beta_i+1)}, \qquad 1 \le i \le k.$$

If the random variable $X_{ij}$ denotes the number of individuals responding out of $n_{ij}$, then as shown above, conditional upon $p_i$, and litter

Table 6.1a Observed frequency distribution of foetal death and fitted frequencies under the binomial model. Data taken from Haseman and Soares (1976) - data set no. 1. Results from Pack (1986a)
Number of dead foetuses (x)
Litter size (n)

0

1

2

2

2

1

4 5 6

7 8

9

.

11

12

13 14 15 16 17

0.3

1.3

2

2

2

1

3.6

2.5

0.7

0 .1

6

1

4.3

3.3 3 2.3 4 3.8

8

18

-

1.3

8

9

10

11

12

13

1

-

0 .2

2.3

3.9 19 12.9 33 26.7 39 27.3 34 24.6 38 20.9 13 11.4

7

2 1.2

4.5

19

1

1.9

2

2

6

0 .1

1

2

2.6 10

0.7

2.5

2

5

0.3

1.7 3 2.3 5 4.8 2

4

0 .2

1.8

3

3

2

11

13.9 24 31.4 27 34.8 30 33.7 22

30.7 16 17.9 4 7.5 4 2.3

0.3

-

1.1

1

1

0 .2

1

0.9

0 .2

2

2

1.7 3

0.4 3

0 .1

6 .8

2.0

0.4 4

0 .1

1.2

0 .2

16.9

5 5.5

12

6

5

2

20.4 14 21.5 18

7.3

1.8

0.3

11

2 1.1

14 13.2 3 5.9

6

6

8.4 4 9.0 4

2.3

6 .0

3 2.9

4

-

0.4

2

1

2 .6

0 .6

3 1.9

1

0.5

2

1

1.0

0.3

2

1

1.9

1.0

0.4

0 .1

0 .1

2

1

0.5

0.9

0 .8

0.5

0 .2

0 .2

0.3

0.3

0 .2

0 .1

20

-

-

1

-

_

-

-

1

-

-

_

-

0 .1

0 .1

0 .1 -

1

1

Table 6.1b Observed frequency distribution of foetal death and fitted frequencies under the binomial model. Data taken from Haseman and Soares (1976) - data set no. 3. Results from Pack (1986a)
Number of dead foetuses (x)
Litter size (n)
1 7 L JQ 4 5 £

0 7 8 9 10 11 12 13 14 15 16

1i 7/ 18

0 1 6.5 7/ 6.0 f.0 4.8 5 5.9 8 9.0

1

2

0.5

-

0.9

-

1.1 2 1.8 2 3.5

0.1 1 0.2 1 0.5

5.1 4 6.5 7 8.3 8 13.3 22 20.9 30 27.3 54 40.1 46 34.5 43 29.2 22 17.0 6 5.2

2.4 4 3.5 7 5.1 9 9.3 17 16.2 18 23.2 27 37.2 30 34.7 21 31.7 22 19.7 6 6.4

0.5 2 0.8 1 1.4 7 2.9 2 5.6 9 9.0 12 15.8 8 16.1 13 15.9 5 10.7 3 3.7

3 1.6

2.2

2 1.4

3

4

5

6

7

-

-

-

-

-

8

9

-

-

-

-

-

-

-

-

-

_

_

_

_

_

_

_

-

-

-

-

-

-

-

-

-

-

-

o o

1 0.1 0.2 1 0.5

1

-

-

-

-

-

-

-

-

_

_

_

_

_

_

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

1

-

1

-

-

-

-

-

-

-

-

-

1.3

1 0.1 1 0.2 2 0.3 1 0.7 1 0.9 1 1.0 1 0.8 1 0.3

1 0.6

0.2

-

1.2 1 2.1 2 4.1 4 4.6 3 4.9 2 3.6 -

1 -

-

2 0.1 1 0.1

-

-

-

0.2

-

1

1

1

-

1

-

-

-

-

1

-

-

-

-

-

-

-

-

-

0.1 1 0.1

-

-

-

-

-

-

-

-

-

-

-

-

_

_

I

I

-

-

-

-

OVER-DISPERSION

and now after unconditioning with respect to the distribution of $p_i$ we have:

$$\Pr(X_{ij} = x_{ij} \mid n_{ij}) = \int_0^1 \Pr(X_{ij} = x_{ij} \mid n_{ij}, p_i)\, f(p_i)\, dp_i = \binom{n_{ij}}{x_{ij}} \frac{B(\alpha_i + x_{ij},\ \beta_i + n_{ij} - x_{ij})}{B(\alpha_i, \beta_i)} \qquad (6.1)$$

The distribution of equation (6.1) is termed the beta-binomial distribution. It has been employed in modelling consumer purchasing behaviour (Chatfield and Goodhart, 1970), in dental studies of caries in children (Weil, 1970), and in describing disease incidence in households (Griffiths, 1973). It was suggested for toxicological data by Williams (1975). Earlier references are given by McNally (1990). An alternative parameterization is in terms of $(\mu_i, \theta_i)$, where

$$\mu_i = \frac{\alpha_i}{\alpha_i+\beta_i}, \qquad \theta_i = \frac{1}{\alpha_i+\beta_i}.$$

We now have $0 < \mu_i < 1$ and $\theta_i > 0$. In this parameterization we have, from the above results,

$$E[p_i] = \mu_i, \qquad V(p_i) = \frac{\mu_i(1-\mu_i)\theta_i}{1+\theta_i}.$$

We see that as $\theta_i \to 0$, $V(p_i) \to 0$ and so the beta-binomial distribution reduces to the binomial form as the variance of $p_i$ decreases. This new parameterization is therefore preferable in terms of interpretation of the parameters, and it is also more stable (Ross, 1975, 1990, and the discussion of section 2.2), resulting in more circular contours of the likelihood surface, which might facilitate iterative maximum likelihood estimation. In the rest of section 6.1 it is convenient to drop the i-suffix, denoting treatment group. In terms of the $(\mu, \theta)$ parameterization we can rewrite the beta-binomial probabilities of equation (6.1) as follows:

6.1 EXTRA-BINOMIAL VARIATION

$$\Pr(X_j = x_j \mid n_j) = \binom{n_j}{x_j} \frac{\prod_{r=0}^{x_j-1}(\mu + r\theta)\ \prod_{r=0}^{n_j-x_j-1}(1-\mu+r\theta)}{\prod_{r=0}^{n_j-1}(1+r\theta)} \qquad (6.2)$$

where $\prod_{r=0}^{-1}$ is interpreted as unity (Exercise 6.1). We can now see clearly that the standard binomial distribution results when $\theta = 0$. With this parameterization, the mean and variance of $X_j$ are given by:

$$E[X_j] = n_j\mu, \qquad V(X_j) = n_j\mu(1-\mu)\left\{1 + \frac{(n_j-1)\theta}{1+\theta}\right\} \qquad (6.3)$$

Although we require the parameter $\theta > 0$ in the beta-binomial distribution, the $(n + 1)$ terms of equation (6.2) sum to unity for any real $\mu$ and $\theta$ (Exercise 6.2), although individual terms may lie outside the range [0, 1]. In fact, if

then as pointed out by Prentice (1986), the terms of equation (6.2) form a valid probability distribution, which we may call an extended beta-binomial distribution, though it no longer results from a beta-mixture of binomial distributions, as before. This is a useful result, since it means that the binomial distribution, when $\theta = 0$, does not occur at an endpoint to the allowable range for $\theta$, and consequently it is possible to perform a valid likelihood-ratio test of the null hypothesis that $\theta = 0$. We shall see examples of this later. When $\theta < 0$ then the extended beta-binomial model in fact is under-dispersed, in the sense that $V(X_j)$ is now less than the corresponding binomial variance. However, in practice, when we might expect to encounter 'large' values for $n_j$, there will be little potential for modelling under-dispersal with this model, due to the above bound on $\theta$.

An interesting feature of the beta-binomial model is that it induces a correlation, $\rho$, between the responses of individuals in the same litter. This feature was first appreciated by Altham (1978) and it is not difficult to show that $\rho = \theta/(1 + \theta)$ (Exercise 6.3).

6.1.2 Fitting the beta-binomial model

For simplicity we shall continue with the case of just one treatment group. Under the beta-binomial model we have the following expression for the log-likelihood:

$$l(\mu, \theta) = \sum_{j=1}^{m}\left\{\sum_{r=0}^{x_j-1}\log(\mu+r\theta) + \sum_{r=0}^{n_j-x_j-1}\log(1-\mu+r\theta) - \sum_{r=0}^{n_j-1}\log(1+r\theta)\right\} + \text{constant term} \qquad (6.4)$$
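The product form of the beta-binomial probabilities and the log-likelihood built from it can be sketched directly. The function names below are illustrative, and the sketch assumes $\theta \ge 0$ so that every logarithm is defined; it does not attempt the extended range of negative $\theta$.

```python
import math


def beta_binomial_pmf(x, n, mu, theta):
    """Probability of x responses out of n in the (mu, theta) product form.

    Empty products (x = 0 or x = n) contribute a factor of one because
    range(0) yields an empty sum of logarithms.
    """
    log_p = math.log(math.comb(n, x))
    log_p += sum(math.log(mu + r * theta) for r in range(x))
    log_p += sum(math.log(1.0 - mu + r * theta) for r in range(n - x))
    log_p -= sum(math.log(1.0 + r * theta) for r in range(n))
    return math.exp(log_p)


def log_likelihood(litters, mu, theta):
    """Log-likelihood for a list of (x_j, n_j) pairs, with the
    binomial-coefficient constant included."""
    return sum(math.log(beta_binomial_pmf(x, n, mu, theta)) for x, n in litters)


# The n + 1 probabilities sum to one, and theta = 0 recovers the binomial.
print(sum(beta_binomial_pmf(x, 10, 0.1, 0.07) for x in range(11)))
print(beta_binomial_pmf(3, 10, 0.2, 0.0))
```

Working on the log scale avoids overflow in the products for large litters, and setting `theta=0.0` gives a quick check against ordinary binomial probabilities.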

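First and second derivatives with respect to $\mu$ and $\theta$ are

$$\frac{\partial l}{\partial \mu} = \sum_{j=1}^{m}\left\{\sum_{r=0}^{x_j-1}\frac{1}{\mu+r\theta} - \sum_{r=0}^{n_j-x_j-1}\frac{1}{1-\mu+r\theta}\right\}$$

$$\frac{\partial l}{\partial \theta} = \sum_{j=1}^{m}\left\{\sum_{r=0}^{x_j-1}\frac{r}{\mu+r\theta} + \sum_{r=0}^{n_j-x_j-1}\frac{r}{1-\mu+r\theta} - \sum_{r=0}^{n_j-1}\frac{r}{1+r\theta}\right\}$$

$$\frac{\partial^2 l}{\partial \mu^2} = -\sum_{j=1}^{m}\left\{\sum_{r=0}^{x_j-1}\frac{1}{(\mu+r\theta)^2} + \sum_{r=0}^{n_j-x_j-1}\frac{1}{(1-\mu+r\theta)^2}\right\}$$

$$\frac{\partial^2 l}{\partial \theta^2} = -\sum_{j=1}^{m}\left\{\sum_{r=0}^{x_j-1}\frac{r^2}{(\mu+r\theta)^2} + \sum_{r=0}^{n_j-x_j-1}\frac{r^2}{(1-\mu+r\theta)^2} - \sum_{r=0}^{n_j-1}\frac{r^2}{(1+r\theta)^2}\right\}$$

$$\frac{\partial^2 l}{\partial \mu\,\partial \theta} = -\sum_{j=1}^{m}\left\{\sum_{r=0}^{x_j-1}\frac{r}{(\mu+r\theta)^2} - \sum_{r=0}^{n_j-x_j-1}\frac{r}{(1-\mu+r\theta)^2}\right\}$$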
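The derivatives of the beta-binomial log-likelihood lend themselves to a Newton-Raphson iteration. The sketch below is one possible arrangement, not any of the published algorithms: the function names are hypothetical, the data used in the usage lines are invented for illustration, and a simple step-halving safeguard (an added assumption) keeps $0 < \mu < 1$ and $\theta > 0$ and accepts a step only if the log-likelihood increases.

```python
import math


def log_lik(data, mu, theta):
    """Kernel of the log-likelihood for (x_j, n_j) pairs, constant omitted."""
    total = 0.0
    for x, n in data:
        total += sum(math.log(mu + r * theta) for r in range(x))
        total += sum(math.log(1 - mu + r * theta) for r in range(n - x))
        total -= sum(math.log(1 + r * theta) for r in range(n))
    return total


def score_and_hessian(data, mu, theta):
    """First and second derivatives of log_lik with respect to (mu, theta)."""
    g_mu = g_th = h_mm = h_mt = h_tt = 0.0
    for x, n in data:
        for r in range(x):
            d = mu + r * theta
            g_mu += 1 / d
            g_th += r / d
            h_mm -= 1 / d**2
            h_mt -= r / d**2
            h_tt -= r**2 / d**2
        for r in range(n - x):
            d = 1 - mu + r * theta
            g_mu -= 1 / d
            g_th += r / d
            h_mm -= 1 / d**2
            h_mt += r / d**2
            h_tt -= r**2 / d**2
        for r in range(n):
            d = 1 + r * theta
            g_th -= r / d
            h_tt += r**2 / d**2
    return (g_mu, g_th), (h_mm, h_mt, h_tt)


def newton_raphson(data, mu, theta, iterations=50, tol=1e-10):
    """Newton-Raphson with step-halving; returns (mu, theta, log-likelihood)."""
    current = log_lik(data, mu, theta)
    for _ in range(iterations):
        (g_mu, g_th), (h_mm, h_mt, h_tt) = score_and_hessian(data, mu, theta)
        det = h_mm * h_tt - h_mt**2
        if det == 0:
            break
        # Newton step -H^{-1} g for the 2 x 2 Hessian.
        step_mu = -(h_tt * g_mu - h_mt * g_th) / det
        step_th = -(h_mm * g_th - h_mt * g_mu) / det
        scale, improved = 1.0, False
        for _ in range(30):
            new_mu, new_th = mu + scale * step_mu, theta + scale * step_th
            if 0 < new_mu < 1 and new_th > 0:
                candidate = log_lik(data, new_mu, new_th)
                if candidate > current:
                    improved = True
                    break
            scale /= 2
        if not improved:
            break
        converged = candidate - current < tol
        mu, theta, current = new_mu, new_th, candidate
        if converged:
            break
    return mu, theta, current


# Invented litters (x_j dead out of n_j), for illustration only.
litters = [(0, 10), (1, 12), (5, 9), (0, 8), (2, 11), (7, 10), (0, 13), (1, 9)]
print(newton_raphson(litters, 0.2, 0.1))
```

Because the safeguard rejects any step that lowers the log-likelihood, the returned value is never worse than the starting point; if the maximum lies at or beyond $\theta = 0$ the iteration simply stops near the boundary, which is where a routine that handles constraints explicitly would be preferable.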

Maximum-likelihood estimates therefore require numerical iteration, as provided, for example, by the Newton-Raphson method. An efficient FORTRAN algorithm is given by Smith (1983), while Pack (1986a) used the NAG (1982) library routine E04LAF, which employs a modified Newton-Raphson algorithm but also copes with constraints on parameters; this is relevant here in view of the bounds on $\theta$. Paul (1982, 1985) used the Nelder-Mead simplex method, as did Segreti and Munson (1981), though in the latter case they were modelling dose-response data, and set $\mu$ to be a logistic function of dose. We shall consider their model in detail later. The method-of-moments estimates are given below (Kleinman, 1973; Exercises 6.4 and 6.15)

$$\hat{\mu} = \frac{x_\cdot}{n_\cdot}$$

e=

m

J=l

^

m

I

/i(l-/i)lK-l) j=i

(6.5)

Yl-

1-^

n.

V

where $x_\cdot = \sum_{j=1}^{m} x_j$ and $n_\cdot = \sum_{j=1}^{m} n_j$. The need to use specialist programs is avoided if the approximate procedure proposed by Brooks (1984) is adopted, since this is based on GLIM. For further discussion, see Exercise 6.6 and section 6.4. Chatfield and Goodhart (1970) fitted the beta-binomial model by the method of mean and zero-frequency, which was discussed also in section 5.3.4 (Exercise 6.5).

As an illustration we give, in Table 6.2, the point estimates from fitting binomial and beta-binomial models to the data of Tables 1.10a and 1.10b, using maximum-likelihood. It is clear that for these data it is far too restrictive to set $\theta = 0$, and, as suggested earlier, extra-binomial variation needs to be incorporated in the model. The fitted values from the beta-binomial

Table 6.2 A summary of the maximum-likelihood binomial and beta-binomial fits to the data of Tables 1.10a and 1.10b, taken from Haseman and Soares (1976). These results were obtained by Pack (1986a)

         Binomial fit                          Beta-binomial fit
Table    $\hat{\mu}$        -Max. log. lik.    $\hat{\mu}$                $\hat{\theta}$             -Max. log. lik.
1.10a    0.089 (0.003)a     842.61             0.090 (0.005) [0.005]b     0.073 (0.011) [0.015]      777.79
1.10b    0.072 (0.003)      765.06             0.074 (0.004) [0.005]      0.081 (0.012) [0.019]      701.33

a Estimates in parentheses: ( ) provide asymptotic estimates of standard error, from the inverse Hessian evaluated at the maximum likelihood estimate.
b Estimates in brackets: [ ] provide estimates of standard error based on 1000 simulations: simulated data were obtained using the maximum likelihood estimates of the parameters and by matching the observed litter sizes.

Table 6.3a Observed frequency distribution of foetal death and fitted frequencies under the beta-binomial model. Data taken from Haseman and Soares (1976) - data set no. 1. Results from Pack (1986a)
Number of dead foetuses (x)
Litter size (n)    0 1 2 3 4 5 6 7 8 9 10 11 12 13

1

2

2

2

3 4

1.8

1.7 3 2.3 5 5.0

-

0.2 0.3 0 .6

1

1.6

0.4

1.0

0.3

0 .1

0 .1

5

2 2.6

1.0

6

2

2

7 8

9

2

2.5

1.0

0.4

2

2

2

1

4.1

1.9

0.7

0 .2

6

1

4.9

2.5 3 1.7 4

2

3.1 10

11

12 13 14 15 16 17

2

4.9 19 16.6 33 35.9 39 38.3 34 36.2 38 32.3 13 18.5 8

7.7 18 19

0 .1

1

-

2.8 11

10.8 24 22.6 27 25.1 30 24.5 22 22.6 16 13.3 4 5.6 4

-

1.1

0 .1

1

1

0.4

0 .1

0.3

0 .1

1 0 .8 2

1.4 3 5.2 11 12.4 12 14.4 14 14.6 18 13.9 14 8.4 3 3.7

-

2 0 .2

0 .1

0.4

7.6

1.1 4 2.9 5 3.7

6

6

8.0 4 7.9 4 5.0 3 2.2

0 .6

3 2.5 5 6.2 6

-

-

-

-

-

-

-

1.7

0.2 _ _ _ _ _ _ 0.5 0.2 0.1 - 1 - 0.7 0.3 0.1 -

4.2 2 4.3 3 2.8 2 1.3

2.0

0.9 0.4

0.7

0.8 0.4 1 0.4 0.2

4

1.2 2

1

-

2.2

1.1 0.5

1

-

1.5 1

2

1

1.2

0.7

0.4

0.3

0 .1

0 .1

0 .1

2.3

1.8

2

1

1.0

0.7

0.5

0.3

0 .2

0.3

0 .2

0 .2

0 .1

0 .1

20

-

-

0.2 0.1 -

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

0.2 0.1 0.1

-

-

0.2 0.1

i _ - -

-

-

-

Table 6.3b Observed frequency distribution of foetal death and fitted frequencies under the beta-binomial model. Data taken from Haseman and Soares (1976) - data set no. 3. Results from Pack (1986a)
Number of dead foetuses (x)
Litter size (n)
1 r> L J 4 5 0 7 8 9 10 11 12 13 14 15 16 11 7/

18

0 7 6.5 7/ 6.0 0c

1

2

3

4

5

6

7

8

9

0.5

-

-

-

-

-

-

-

-

0.9

0.1

-

-

-

-

-

-

-

4.8 5 6.1 8 9.3

1.0 2 1.5 2 2.8

0.2 1 0.3 1 0.7

_

_

_

_

_

_

_

-

0.1

-

-

1

1

-

-

-

-

0.2

-

-

-

-

-

-

5.4 4 7.1 7 9.2 8 15.3 22 24.8 30 33.5 54 50.9 46 45.5 43 40.0 22 24.2 6 7.6

1.8 4 2.6 7 3.7 9 6.5 17 11.0 18 15.7 27 24.8 30 23.0 21 20.9 22 13.0 6 4.2

0.6 2 0.9 1 1.4 7 2.7 2 4.9 9 7.3 12 12.2 8 11.7 13 11.1 5 7.1 3 2.4

_

_

_

_

_

-

-

-

-

1

1

-

3 2.5

1.5

oo

_

2 0.9

0.2 1 0.3

0.1 -

0.5 1 1.0

1.3

0.2 1 0.4 1 0.8 2 1.4 1 2.6 1 2.8 1 2.9 1 2.0 1 0.7

1 0.5

0.3

-

2.1 1 3.3 2 5.8 4 5.9 3 5.7 2 3.8 -

0.1 -

0.3 0.6 -

0.1 1 0.2 2 0.4

-

-

1 0.1

-

0.1

-

-

1.4

0.6

0.2 1 0.2 1 0.3

0.5

0.2

0.1

-

-

-

-

0.2

0.1

-

-

1.1 1 1.3

-

0.5

-

1.0 1 0.4

-

0.1 -

0.1

-

1

_

_

_

_

_

0.2

0.1

-

-

-


fits are given in Tables 6.3a and 6.3b. The results of Pack's (1986a) Monte Carlo tests of goodness-of-fit of the beta-binomial model are given below:

Values of maximum log-likelihood for beta-binomial model

Table    Actual data    Range of values from 99 matched simulations    Rank of max. log-likelihood from actual data in the range of 100 values available
1.10a    -777.79        (-815.25, -730.07)                              33
1.10b    -701.33        (-746.55, -634.42)                              46

So, on the basis of these results alone, we may conclude that the beta-binomial model is providing a reasonable description of the data. We do note, however, the possibility of a number of extreme, or outlying litters exhibiting high-mortality, and this is a point to which we shall return in section 6.6.2. There is further discussion of the beta-binomial fit, of Monte Carlo tests and additional examples, in Exercise 6.7 and its solution.

Haseman and Kupper (1979) took random samples of pairs of 20 litters from the data of Tables 1.10a and 1.10b and compared the pairs for differences using likelihood ratio tests based on the beta-binomial distribution. They concluded that the Type I errors were inflated. Pack (1986a) repeated their comparison, but for forty litters in each case, and also found inflated Type I errors. He did not feel that this finding was sufficient to reject the beta-binomial as a suitable model for these data.
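The ranking step of a Monte Carlo goodness-of-fit test of the kind reported above can be sketched as follows. The function names are illustrative, and the convention that rank 1 denotes the smallest value is an assumption; the text does not state the direction of ranking.

```python
def monte_carlo_rank(observed, simulated):
    """Rank of the observed statistic among observed plus simulated values,
    counting from 1 for the smallest (an assumed convention)."""
    return 1 + sum(1 for value in simulated if value < observed)


def monte_carlo_p_value(observed, simulated):
    """Proportion of the pooled values that rank at or below the observed
    one; small values suggest the observed statistic is unusually low."""
    return monte_carlo_rank(observed, simulated) / (len(simulated) + 1)


# Toy illustration with five simulated values (not the values of the text).
print(monte_carlo_rank(5.0, [1.0, 2.0, 3.0, 7.0, 9.0]))
print(monte_carlo_p_value(5.0, [1.0, 2.0, 3.0, 7.0, 9.0]))
```

With 99 simulations the pooled set has 100 values, so a rank of 33 or 46 corresponds to a proportion of 0.33 or 0.46, neither of which is in the tails.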

6.1.3 Tarone's test

An alternative test to the likelihood-ratio test of the hypothesis that $\theta = 0$ was proposed by Tarone (1979). This was a $C(\alpha)$ test (Neyman, 1959), which does not require the fitting of the beta-binomial model. It is the locally most powerful test of $\theta = 0$, and is asymptotically optimal against beta-binomial alternatives. It is in fact a score test (Pack 1986; Prentice, 1986; Exercise 6.10 and Appendix D) and the test statistic, which is asymptotically normally distributed, is given by:

$$Z = \frac{\displaystyle\sum_{j=1}^{m} \frac{(x_j - n_j\hat{p})^2}{\hat{p}(1-\hat{p})} - \sum_{j=1}^{m} n_j}{\left\{2\displaystyle\sum_{j=1}^{m} n_j(n_j-1)\right\}^{1/2}}$$

where $\hat{p} = x_\cdot/n_\cdot$. Basing his approach on the ten litter treatment group in Kupper and Haseman (1978), Tarone conducted a small Monte Carlo study of the size of the test, and found it to be conservative.

6.1.4 The possibility of bias

In a paper prompted by the simulation work of Kupper et al. (1986), Williams (1988c) presented the results of Table 6.4. We can see that unless $\rho = 0$, corresponding to ignoring over-dispersion, or $\rho$ is set approximately equal to its true value, appreciable bias may result in the estimate of $\mu$. This result is discussed further in Exercise 6.11.

Table 6.4 (Williams, 1988c) The effect of estimating $\mu$ in the beta-binomial model when the correlation parameter $\rho = \theta/(1 + \theta)$ is fixed at an assumed value. Results of 1000 simulations: sample mean values for $\hat{\mu}$, with sample standard errors in parentheses

                          True parameter values
Assumed value of $\rho$   $\mu = 0.125$, $\rho = 0.200$   $\mu = 0.250$, $\rho = 0.333$
0.00                      0.126 (0.001)                   0.253 (0.002)
0.05                      0.112 (0.001)                   0.231 (0.002)
0.10                      0.113 (0.001)                   0.227 (0.002)
0.20                      0.127 (0.001)                   0.237 (0.002)
0.30                      0.146 (0.001)                   0.249 (0.002)
0.40                      0.167 (0.001)                   0.269 (0.002)
0.50                      0.191 (0.001)                   0.290 (0.002)

While this finding may appear slightly strange in comparison with linear modelling, when estimates of mean value remain unbiased if variances are mis-specified, what is its relevance here? We would not normally contemplate fixing $\rho$ in order to estimate $\mu$. The answer lies in the hidden dangers this effect may reveal when the beta-binomial distribution is used for making comparisons between treatments. In such cases it may seem attractive to model $\rho$ or $\theta$ as a function of dose level, say, or of mean response (e.g. Moore, 1987). Mis-specifications in such cases might then result in biased comparisons. Fears of problems of this nature lead naturally to the consideration of more robust procedures as the basis of inference, and in particular to the technique of quasi-likelihood. These are matters to which we shall return later in the chapter, and which lead naturally into the following section.

Williams (1988a) has reported serious biases resulting in the estimation of $\theta$ when the beta-binomial model is fitted to data simulated from the logistic-normal-binomial distribution, and mixtures of binomial distributions (see sections 6.6.3 and 6.6.4 for relevant discussion of these two cases). There is more to be said on the subject of modelling extra-binomial variation and we shall continue the discussion in sections 6.5 and 6.6.

6.2 Making comparisons in the presence of extra-binomial variation

6.2.1 An example involving treatment versus control

We shall now return to the example of Exercise 1.6, involving teratology data. It appears from the control group that a binomial distribution might suffice to describe the variation in the data. However, the proportions surviving in the treated group range from 0% to 100%, and it seems unlikely that a binomial distribution will suffice to describe these data. The data were analysed by Williams (1975), assuming a beta-binomial model. The maximum-likelihood results are given below. Here subscripts of C and T refer to control and treated groups respectively, and estimated asymptotic standard errors are given in parentheses:

$\hat{\mu}_C = 0.898\ (0.026)$, $\hat{\theta}_C = 0.021\ (0.048)$; max $l_C = -51.69$
$\hat{\mu}_T = 0.740\ (0.069)$, $\hat{\theta}_T = 0.465\ (0.234)$; max $l_T = -64.99$

6.2 MAKING COMPARISONS: EXTRA-BINOMIAL VARIATION

When the binomial model is fitted separately to each group the values of the maximum log-likelihood are:

max $l_C = -51.80$; max $l_T = -77.77$

As anticipated, therefore, for the control group there is virtually no improvement in fit from including the between-litter variation, but there is a significant improvement for the treated group. When a single beta-binomial distribution is fitted to both the control and treated groups simultaneously we obtain:

$\hat{\mu} = 0.818\ (0.038)$, $\hat{\theta} = 0.271\ (0.117)$; max $l = -120.54$

Consequently a likelihood ratio test of the hypothesis:

$$H_1: \mu_C = \mu_T, \quad \theta_C = \theta_T$$

results in the likelihood-ratio (LR) statistic: 2(120.54 - 51.69 - 64.99) = 7.72, which is significant at the 2.5% level when referred to $\chi^2_2$ tables. We shall refer to this test as H1. Several other tests are possible, in each case with LR statistics referred to $\chi^2_1$ tables:

LR statistic: 2.07

LR statistic: 1.95

LR statistic: 5.65

LR statistic: 5.77

The H1-H5 notation is due to Aeschbacher et al. (1977). We see from H4 that there is significant evidence that $\theta_C \ne \theta_T$, and then from H5 that there is significant evidence that $\mu_C \ne \mu_T$. We conclude that the treatment has affected both the variation and the mean response.
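The arithmetic behind these likelihood-ratio comparisons can be reproduced from the reported maximum log-likelihoods alone. The sketch below uses hypothetical function names and the closed-form upper tails of the chi-square distribution on 1 and 2 degrees of freedom; the within-group binomial versus beta-binomial comparisons are added for illustration and rest on the asymptotic chi-square reference being adequate.

```python
import math


def lr_statistic(max_loglik_null, max_loglik_alternative):
    """Twice the gain in maximized log-likelihood from the larger model."""
    return 2.0 * (max_loglik_alternative - max_loglik_null)


def chi2_upper_tail(x, df):
    """Upper-tail probability of chi-square for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only 1 or 2 degrees of freedom are handled here")


# H1: one common beta-binomial versus separate control and treated fits.
h1 = lr_statistic(-120.54, -51.69 + -64.99)
print(h1, chi2_upper_tail(h1, 2))

# Binomial versus beta-binomial within each group (1 degree of freedom).
control = lr_statistic(-51.80, -51.69)
treated = lr_statistic(-77.77, -64.99)
print(control, chi2_upper_tail(control, 1))
print(treated, chi2_upper_tail(treated, 1))
```

The H1 statistic of 7.72 has a tail probability a little above 0.02, in line with significance at the 2.5% level, while the control-group gain of 0.22 is negligible and the treated-group gain of 25.56 is very large.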


There is more discussion of this example in Exercises 6.12-6.14. Note also the comparisons which may be drawn with the approach detailed in the solution to Exercise 1.6. In that case a non-significant result was obtained. See also Exercise 6.16. The comparison of this section is readily extended to the case of comparing k treatments. If the ith treatment corresponds to beta-binomial parameters, $(\mu_i, \theta_i)$, $1 \le i \le k$, then the log-likelihood is given by:

$$l = \sum_{i=1}^{k}\sum_{j}\left\{\sum_{r=0}^{x_{ij}-1}\log(\mu_i+r\theta_i) + \sum_{r=0}^{n_{ij}-x_{ij}-1}\log(1-\mu_i+r\theta_i) - \sum_{r=0}^{n_{ij}-1}\log(1+r\theta_i)\right\} + \text{constant term} \qquad (6.6)$$

It is possible that treatment differences may correspond to differences between the $\{\mu_i\}$, but not between the $\{\theta_i\}$, and we shall encounter examples of this in sections 6.2.2 and 6.6.5. However, incorrect specification of the $\{\theta_i\}$ (e.g. $\theta_i = 0$, for all i) may give rise to errors in estimation of treatment differences, due to the possibility of bias mentioned in section 6.1.4. As pointed out by Williams (1988c), if one had assumed $\theta_C = \theta_T$ for the example of this section, then the estimate of the difference: $(\mu_T - \mu_C)$ would be reduced from 0.158 to 0.098.

6.2.2 Comparing alternative test procedures

In this section we continue the one treatment versus control comparison of the last section. There are many ad hoc ways of comparing control and treatment groups. For instance, as discussed already in the solution to Exercise 1.6, in each group we may form proportions responding in each litter, and then compare groups by means of a Mann-Whitney U test applied to these proportions. If we prefer to use a t-test, rather than a non-parametric test, then we might well first of all transform the proportions by means of the variance-stabilizing arcsine square-root transformation; see also Brooks (1983) and Chanter (1975). A popular modification of this transformation which has seen much use in teratology is the 'Freeman-Tukey Binomial' transformation, given by:

$$z = \arcsin\sqrt{x/(n+1)} + \arcsin\sqrt{(x+1)/(n+1)}$$

corresponding to x animals responding out of n (Bishop et al., 1975, p. 367).

It is interesting to consider which approach to use in practice. If data do in fact conform to the beta-binomial model, we might expect likelihood-ratio procedures based on the beta-binomial model to be more powerful than simpler approaches. However, this expectation would inevitably also be tempered by the realization that the likelihood-ratio tests are asymptotic, and so may not perform particularly well for small samples. A number of authors have been interested in the small-sample properties of likelihood-ratio tests and in comparing alternative test procedures (Haseman and Kupper, 1979; Vuataz and Sotek, 1978; Gladen, 1979; Haseman and Soares, 1976; Shirley and Hickling, 1981; Pack, 1986b). Comparisons for small samples need to be based on simulation studies, and the picture is complicated by the range of possible tests, H1-H5 above, to be considered. While some of the early work in this area did not find likelihood-ratio methods to be more powerful than tests using simple transformations of proportions, Pack (1986b) has argued that in various cases this work did not match sufficiently well the kind of configurations that are most likely to be encountered in practice. His conclusions are, fortunately, quite simple to state, though inevitably they are qualified by the particular configurations of his simulations. Being compared were likelihood-ratio tests of the three comparisons H1, H2 and H5, the t-test following a Freeman-Tukey Binomial transformation, and a t-test based on a weighted estimator due to Kleinman (1973). This last test has been shown by Pack (1986a) to be equivalent to a test resulting from a quasi-likelihood approach (Exercise 6.15). Pack (1986b) concluded that if treatment effects are small (for example, $\mu_T - \mu_C \le 0.2$) then no test has a clear advantage.
Otherwise, however, under the assumed beta-binomial model, a likelihood-ratio test of H5 (which places no restrictions on the $\theta$ parameters) was recommended, as it had acceptable error-rates, and was the most powerful of the likelihood-ratio tests in a wide range of instances. By contrast the two t-tests were found to be appreciably less powerful in many cases. From a computational point of view, the Wald test (Appendix D) is preferable to the likelihood-ratio test for H5, and has equivalent operating characteristics. As has been pointed out by Williams (1988c), the greater power of the likelihood-ratio test of H5, when compared with that of H2 (which is when the constraint of $\theta_1 = \theta_2$ is imposed), can be partially explained in terms of the bias in estimation of $\mu$ which may arise when $\theta$ is mis-specified (section 6.1.4).

The conclusion that a parametric procedure is best if the assumed model holds should come as no great surprise. Also unsurprising is the greater sensitivity of this procedure to departures from assumptions. The likelihood-ratio tests have been found to have inflated error rates when applied to real data, and they may then lose power superiority in comparison with the more robust and simpler t-tests (Pack, 1986a). A study to investigate this comparison fully is long overdue.

An important comparison between treatments occurs when different treatments correspond to different doses of some substance, and this is the topic of the next section. We now therefore return to the main dose-response topic of this book, having prepared the ground, in sections 6.1 and 6.2, for dealing with extra-binomial variation.

6.3 Dose-response with extra-binomial variation

The data of Table 6.5 result from an investigation into neonatal acute toxicity to trichloromethane, a common contaminant of drinking water. The doses were administered (by oral gavage) to mice seven days after birth, and toxicity was recorded by the number dead within 14 days of treatment. The variation in response between litters within doses is clearly greater than could be explained by simple binomial response, and we shall now consider how to incorporate extra-binomial variation into the dose-response framework, with particular reference to the results of Table 6.5.

Table 6.5 Neonatal acute toxicity to trichloromethane. Data taken from Segreti and Munson (1981). There is no explanation of the common litter size (8) in all cases, but this is likely to result from using litters of size $\ge$ 8 and then, as a consequence of experimental constraints, taking a sample of 8 mice when necessary. Five litters are exposed to each treatment and they are arranged below in order, from left to right, of increasing response

Dosage (mg/kg)    Number dead per litter    Total dead
Control           0   0   0   2   2         4
250               0   0   1   3   6         10
300               0   0   0   1   8         9
350               0   2   2   5   8         17
400               1   2   4   6   7         20
450               1   4   5   6   8         24
500               1   7   8   8   8         32

6.3.1 The basic beta-binomial model

As in section 6.1.1, we assume that for the ith treatment, $p_i$ has a beta distribution with parameters $\mu_i$ and $\theta_i$. There is control mortality present in Table 6.5, and we shall use Abbott's formula to model the control mortality (section 3.2). We also take, as the simplest possibility, a logistic dose-response relationship to describe the dose-dependence of $\mu_i$. Thus, following Segreti and Munson (1981), we set:

$$\mu_i = \lambda + (1 - \lambda)/(1 + e^{-(\alpha + \beta z_i)})$$

where $z_i$ is log(dosage). The regression of $\mu_i$ on continuous variables was first suggested by Crowder (1978) in a context to be described in section 6.6.5. We know from the work of Chapter 4 that many other possibilities may be considered as alternatives to, and possibly in preference to, the logistic link function (Exercise 6.17), and we return to this point in section 6.5. When this model is fitted to the data of Table 6.5, using the Nelder-Mead simplex method, we obtain the maximum-likelihood estimates and estimated asymptotic correlation matrix given below. Estimated asymptotic standard errors are given on the diagonal.

Estimated errors (in parentheses) and correlations

Parameter    Estimate    $\lambda$    $\alpha$    $\beta$    $\theta$
$\lambda$    0.161       (0.087)      -           -          -
$\alpha$     36.810      -0.534       (18.222)    -          -
$\beta$      -14.030     -0.524       -0.999      (6.891)    -
$\theta$     0.681       0.150        -0.095      0.098      (0.230)

It was found necessary to centre the data (apart from the control case) by subtracting the mean log dosage from each log dosage (see here the discussion of section 2.1). In all, five models were fitted, and the details are shown below.

Model                                 -Max. log likelihood    Number of parameters in the model
$\{\mu_i\}$, $\theta_i = 0$           158.76                  7
$\{\mu_i, \theta_i\}$                 122.43                  14
$\{\mu_i\}$, $\theta_i = \theta$      125.76                  8
$\mu_i = \mu$, $\theta_i = \theta$    133.84                  2
$\alpha, \beta, \theta, \lambda$      126.43                  4
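The nested comparisons that can be read off this table are easy to tabulate. The following sketch is illustrative rather than part of the original analysis: the dictionary keys are hypothetical labels for four of the five fitted models, the degrees of freedom are taken as differences in the numbers of parameters, and the 5% points of the chi-square distribution are standard tabulated values.

```python
# (-max log-likelihood, number of parameters), as tabulated above.
# The model labels are hypothetical shorthand, not the book's notation.
models = {
    "binomial": (158.76, 7),                 # {mu_i}, theta_i = 0
    "separate mu and theta": (122.43, 14),   # {mu_i, theta_i}
    "common theta": (125.76, 8),             # {mu_i}, theta_i = theta
    "logit, common theta": (126.43, 4),      # alpha, beta, theta, lambda
}

# Standard 5% critical values of chi-square for the degrees of freedom needed.
CHI2_5PCT = {4: 9.488, 6: 12.592, 7: 14.067}

def lr_test(restricted, general):
    """Return (LR statistic, degrees of freedom, significant at 5%?)."""
    nll_r, k_r = models[restricted]
    nll_g, k_g = models[general]
    stat = 2.0 * (nll_r - nll_g)
    df = k_g - k_r
    return stat, df, stat > CHI2_5PCT[df]

for restricted, general in [
    ("binomial", "separate mu and theta"),
    ("common theta", "separate mu and theta"),
    ("logit, common theta", "common theta"),
]:
    stat, df, significant = lr_test(restricted, general)
    print(f"{restricted} vs {general}: LR = {stat:.2f} on {df} df, "
          f"significant at 5%: {significant}")
```

Only the first comparison, binomial against beta-binomial, exceeds its 5% point.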

We may conclude that there is significant over-dispersion (2 x (158.76 - 122.43) = 72.66, referred to $\chi^2_7$), that we may take $\theta_i = \theta$ for all i, since 2 x (125.76 - 122.43) = 6.66, referred to $\chi^2_6$, and that the logit model provides a satisfactory fit (2 x (126.43 - 125.76) = 1.34, referred to $\chi^2_4$). The LD50 is estimated by 10

$$\Pr(X_j = x_j \mid n_j) = \binom{n_j}{x_j} p^{x_j}(1-p)^{n_j - x_j}\left[1 + \frac{\rho}{2p(1-p)}\left\{(x_j - n_j p)^2 + x_j(2p - 1) - n_j p^2\right\}\right]$$

for $0 \le p \le 1$, $0 \le x_j \le n_j$, $1 \le j \le m$, where $\rho$ is the correlation between the responses of any two littermates. The mean and variance are given by:

$$E[X_j \mid n_j] = n_j p$$

$$V(X_j \mid n_j) = n_j p(1-p)\{1 + \rho(n_j - 1)\}$$


Hence we have the same mean/variance relationship as with the beta-binomial model, and the same fit would result from a quasi-likelihood approach. We see immediately from the expression for the variance that we must have

$$\rho \ge -1/(n_j - 1), \quad \text{for all } j \text{ for which } n_j > 1$$

and so the largest litter provides the most restrictive bound on $\rho$. In fact, as Kupper and Haseman (1978) have shown, there are additional bounds on $\rho$, given by:

$$-\frac{2}{n_j(n_j - 1)}\min\left(\frac{p}{1-p}, \frac{1-p}{p}\right) \le \rho \le \frac{2p(1-p)}{(n_j - 1)p(1-p) + 0.25 - \gamma_0}$$

for all j, where $\gamma_0 = \min_{x_j}\left[\{x_j - (n_j - 1)p - 0.5\}^2\right]$.

We reproduce below a table from Kupper and Haseman (1978) which quantifies these bounds for a range of values for $n_j$ and p.

         p
$n_j$    0.10               0.30               0.50
2        (-0.111, 1.000)    (-0.429, 1.000)    (-1.000, 1.000)
3        (-0.037, 0.529)    (-0.143, 0.636)    (-0.333, 1.000)
5        (-0.011, 0.300)    (-0.043, 0.420)    (-0.100, 0.500)
7        (-0.005, 0.231)    (-0.020, 0.296)    (-0.048, 0.333)
10       (-0.002, 0.200)    (-0.010, 0.200)    (-0.022, 0.200)
15       (-0.001, 0.120)    (-0.004, 0.135)    (-0.010, 0.143)
20       (-0.001, 0.100)    (-0.002, 0.100)    (-0.005, 0.100)
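The tabulated bounds can be regenerated directly from the Kupper and Haseman (1978) inequalities. The function below is a minimal sketch, with `rho_bounds` a hypothetical name; it assumes the lower bound is $-2\min\{p/(1-p), (1-p)/p\}/\{n_j(n_j-1)\}$ and the upper bound is $2p(1-p)/\{(n_j-1)p(1-p) + 0.25 - \gamma_0\}$, a reading that reproduces the entries of the table when rounded to three decimal places.

```python
def rho_bounds(n, p):
    """Admissible range of the correlation rho for litter size n >= 2 and 0 < p < 1."""
    lower = -2.0 / (n * (n - 1)) * min(p / (1.0 - p), (1.0 - p) / p)
    # gamma_0 is the smallest value of {x - (n - 1)p - 0.5}^2 over x = 0, ..., n.
    gamma0 = min((x - (n - 1) * p - 0.5) ** 2 for x in range(n + 1))
    upper = 2.0 * p * (1.0 - p) / ((n - 1) * p * (1.0 - p) + 0.25 - gamma0)
    return lower, upper

# Regenerate the table: rows are litter sizes, columns are p = 0.10, 0.30, 0.50.
for n in (2, 3, 5, 7, 10, 15, 20):
    row = "  ".join(
        "({:.3f}, {:.3f})".format(*rho_bounds(n, p)) for p in (0.10, 0.30, 0.50)
    )
    print(f"{n:2d}  {row}")
```

The rapid shrinking of the lower bound towards zero as the litter size grows is what limits the scope for modelling under-dispersion with this distribution.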

In many practical examples we might anticipate a combination of 'large' $\{n_j\}$ and 'small' p, in which case the correlation $\rho$ is prevented from taking negative values appreciably different from 0. Consequently the correlated binomial model does not offer a real alternative to the beta-binomial, especially since model-fitting by maximum-likelihood is more complex due to the data-dependent bounds on $\rho$. Pack (1986a) used the simplex method for model-fitting by maximum-likelihood, setting the likelihood to a very small value whenever the simplex strayed into an inadmissible region. The results in Table 6.6 provide the parameter estimates from the maximum likelihood fit to the data of Tables 1.10a and 1.10b; cf. Table 6.2.

Table 6.6 Parameter estimates for the maximum-likelihood correlated-binomial fit to the data of Tables 1.10a and 1.10b, taken from Haseman and Soares (1976). Results are taken from Pack (1986a)

         Correlated-binomial fit
Table    $\hat{p}$                          $\hat{\rho}$                Max. log likelihood
1.10a    0.0930 (0.0044)(a) [0.0057](b)     0.0436 (0.0058) [0.0064]    801.68
1.10b    0.0761 (0.0042) [0.0063]           0.0389 (0.0062) [0.0080]    732.19

(a) Estimates in parentheses: ( ) provide asymptotic estimates of standard error, from the inverse Hessian evaluated at the maximum likelihood estimate.
(b) Estimates in brackets: [ ] provide estimates of standard error based on 1000 simulations: simulated data were obtained based on the observed litter sizes and the maximum likelihood estimates of the parameters.

Paul (1985) and Pack (1986a) also investigate a beta-correlated-binomial model which includes both the beta-binomial and correlated binomial models as special cases (Exercise 6.24). We continue discussion of the correlated-binomial model in section 6.6.3.
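The notion of an inadmissible region can be made concrete by evaluating the correlated-binomial probability function itself. The sketch below assumes the Kupper and Haseman (1978) form written in terms of the correlation $\rho$, namely the binomial probability multiplied by $1 + \rho\{(x - np)^2 + x(2p-1) - np^2\}/\{2p(1-p)\}$; the function name and the numerical values of n, p and $\rho$ are illustrative choices, not values from the text.

```python
from math import comb

def correlated_binomial_pmf(x, n, p, rho):
    """Pr(X = x | n) under the correlated-binomial model (assumed form, see lead-in)."""
    binom = comb(n, x) * p**x * (1.0 - p) ** (n - x)
    correction = (x - n * p) ** 2 + x * (2.0 * p - 1.0) - n * p**2
    return binom * (1.0 + rho * correction / (2.0 * p * (1.0 - p)))

n, p, rho = 5, 0.3, 0.1   # illustrative values; rho lies inside the bounds for n = 5
probs = [correlated_binomial_pmf(x, n, p, rho) for x in range(n + 1)]

total = sum(probs)
mean = sum(x * q for x, q in enumerate(probs))
var = sum(x * x * q for x, q in enumerate(probs)) - mean**2

print(f"total = {total:.6f}")
print(f"mean  = {mean:.6f}  (n p = {n * p:.6f})")
print(f"var   = {var:.6f}  (n p (1-p) {{1 + rho (n-1)}} = "
      f"{n * p * (1 - p) * (1 + rho * (n - 1)):.6f})")

# Outside the admissible range some "probabilities" become negative,
# which is why a simplex search has to be kept away from such values.
print(correlated_binomial_pmf(2, n, p, rho=0.6))
```

With $\rho = 0.6$, beyond the upper bound of 0.420 tabulated for $n_j = 5$ and $p = 0.30$, the value returned for x = 2 is negative.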

6.6.2 Mixtures of binomials; outliers and influence

The data of Tables 1.10a and 1.10b exhibit a small number of litters with high mortality. It is natural to wonder to what extent the presence of such litters might influence the fit of models such as the beta-binomial and correlated-binomial. It is unusual to encounter studies as large as these, which are the result of pooling information from a number of experiments. One may also, therefore, question whether there may be present a small number of litters with abnormally high response rates, which may even be responsible for a beta-binomial model (say) being used in preference to an inadequate binomial model. A mixture of two binomials was included in the study by Smith and James (1984), which involved fitting the mixture, the beta-binomial and the correlated-binomial models to 48 sets of data from dominant-lethal assays. Two- and three-component mixture models have been simulated by Williams (1988a) in order to investigate the performance of the beta-binomial model when fitted to such data. A general c-component mixture has probability function given by:

$$\Pr(X_j = x_j \mid n_j) = \sum_{r=1}^{c} \gamma_r \binom{n_j}{x_j} p_r^{x_j}(1 - p_r)^{n_j - x_j}, \quad 0 \le x_j \le n_j, \quad 1 \le j \le m$$

where

$$0 \le p_r \le 1, \quad \sum_{r=1}^{c} \gamma_r = 1,$$

Ur 0.5, and the Wilcoxon test, for which $S(u) = (u - \tfrac{1}{2})$. An estimator of the centre of symmetry, $\theta$, of F(x) can be obtained by choosing $\theta$ so that the samples, $(X_1 - \theta, \ldots, X_m - \theta)$ and $(-(X_1 - \theta), \ldots, -(X_m - \theta))$, when subjected to the above test, result in a test-statistic of zero. Thus we seek a root in $\theta$ to the equation,

$$0 = \frac{1}{m}\sum_{i=1}^{m} S\left(\frac{m\{F_m(X_i) + 1 - F_m(2\theta - X_i)\}}{2m + 1}\right) = \int S\left(\frac{m\{F_m(x) + 1 - F_m(2\theta - x)\}}{2m + 1}\right) dF_m(x) \qquad (7.13)$$

Three robust estimators which result are the sample median, from using the sign test score function, the Hodges-Lehmann estimator, from using the Wilcoxon test score function, and the normal scores estimator, which results when $S(u) = \Phi^{-1}(u)$. Once equation (7.13) is written in terms of an empirical distribution function then, once again, the way is open for use of the same ideas in the quantal response case. James et al. (1984) show that under certain regularity conditions the resulting estimator is asymptotically normally distributed, with variance,

$$\frac{\Delta}{n}\,\frac{\int \{S'(F(x))\}^2 f^2(x) F(x)\{1 - F(x)\}\,dx}{\left(\int S'(F(x)) f^2(x)\,dx\right)^2} \qquad (7.14)$$

and consider in detail two cases, the Hodges-Lehmann case, and what they call the logistic scores estimator, when

$$S(u) = \log\{u/(1 - u)\}$$

It is shown that the expression of equation (7.14) is minimized when S(u) is of this form (Exercise 7.15), but unfortunately this S(u) violates the required regularity conditions. However, simulation results indicate that it enjoys good properties (Exercise 7.16). Additionally, it is possible to approximate closely to S(u) by functions which satisfy the regularity conditions. Computationally, it is necessary to solve the analogue of equation (7.13) for the quantal assay case, and the numerical procedure for doing this is described by James et al. (1984).
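For a direct random sample the two simplest estimators mentioned above have well-known closed forms, which may help to fix ideas before the quantal-response analogue is tackled. The sketch below is illustrative only and does not reproduce the numerical procedure of James et al. (1984): it uses the standard representation of the one-sample Hodges-Lehmann estimator as the median of the pairwise (Walsh) averages, and the sample values are invented to show the effect of a single outlier.

```python
from itertools import combinations_with_replacement
from statistics import mean, median

def hodges_lehmann(sample):
    """One-sample Hodges-Lehmann estimate: median of all Walsh averages (X_i + X_j)/2, i <= j."""
    walsh = [(a + b) / 2.0 for a, b in combinations_with_replacement(sample, 2)]
    return median(walsh)

sample = [1.0, 2.0, 3.0, 4.0, 100.0]   # invented data with one gross outlier

print("mean           :", mean(sample))
print("median         :", median(sample))
print("Hodges-Lehmann :", hodges_lehmann(sample))
```

The sample mean is dragged towards the outlier, whereas the median and the Hodges-Lehmann estimate are not.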

7.5.3 Influence curve robustness

The robustness of estimates may also be discussed through consideration of their influence curves. In the case of a direct random sample, $\{X_1, \ldots, X_m\}$, from a distribution with cumulative distribution function F(x), the influence curve of an estimator denoted by T(F), where T is a functional on the space of distribution functions, is defined by Hampel (1974) as

$$\phi_{T,F}(x) = \lim_{\epsilon \to 0}\frac{T\{(1 - \epsilon)F + \epsilon\delta_x\} - T(F)}{\epsilon} \qquad (7.15)$$

where $\delta_x$ is the distribution of unit mass at x, if the above limit exists for all real x. Thus in equation (7.15) we consider the limit of the normalized effect on the estimator of an additional observation added at x. Subject to suitable regularity conditions, $\phi_{T,F}(x)$ is the limit, as $m \to \infty$, of Tukey's (1970) sensitivity curve defined by:

$$SC_m(x) = \frac{T\left(\frac{m}{m+1}F_m + \frac{1}{m+1}\delta_x\right) - T(F_m)}{1/(m+1)} \qquad (7.16)$$

(See also Andrews et al., 1972, p. 96.) For an illustration, see Exercise 7.18.

The influence curve for the quantal response case is defined by James and James (1983), by analogy with equation (7.16), but based on $\hat{F}(x)$ defined in the last section. We shall assume a dose mesh D, with spacing $\Delta$ between doses, and n individuals treated at each dose. An important difference between the work of this section and that of section 3.4 on influence is that here we concentrate on the effect of adding a single binary response at any dose d. In section 3.4 we considered the effect of deleting both a dose level, and all of the information associated with that dose. In the quantal response case, we focus on the ED50, and investigate the change in value of the estimator of the ED50 due to an additional response y (y = 0 or y = 1) at dose level d. The definition of James and James (1983) is in two parts:

1. First of all, suppose the dose mesh, D, is fixed. Denote by $\hat{F}_{j1}$ ($\hat{F}_{j0}$) the empirical distribution functions when we have one additional positive (negative) response at $d_j$. The influence curve of the estimator of the ED50 based on D, the function T and underlying tolerance distribution cumulative distribution function F is given by

$$IC_{T,F,D}(d_j, y) = \lim_{n \to \infty}\frac{n}{\Delta}\{T(\hat{F}_{jy}) - T(\hat{F})\}, \quad \text{almost surely,} \qquad (7.17)$$

if the limit exists, for y = 0, 1, and $1 \le j \le k$. By analogy with the sensitivity curve for a direct random sample, $IC_{T,F,D}(d_j, y)$ is a function of the observed sample, and is a random variable for fixed y and $d_j$. However, it is only defined for a fixed dose mesh. The transition to a function defined for all values of d is accomplished by letting the dose mesh become dense, as $\Delta \to 0$, $d_k \to \infty$, and $d_1 \to -\infty$. We can then define:

2. The influence curve of T, under F, is given by:

$$IC_{T,F}(d, y) = \lim_{\Delta \to 0} IC_{T,F,D}(d, y)$$

if the limit exists, for y = 0, 1, and real d, with the convention that the limit is taken over those dose meshes D that contain d as a point.

As noted in section 7.5.1, the Fisher information in a fine dose mesh is proportional to $n/\Delta$, which accounts for the presence of this term in equation (7.17). In the direct sampling case, an estimator may be regarded as robust if its influence curve is bounded. In the quantal response application, James and James (1983) define T as influence-curve robust at F if both

For L-estimators, the influence curve is given by:

$$IC_{T,F}(d, y) = \{F(d) - y\}J(F(d))$$

where J(u) is the weight function of section 7.5.2, subject to suitable regularity conditions (James and James, 1983). We can then demonstrate that the Spearman-Kärber estimator is not robust, but the $\alpha$-trimmed version is (Exercise 7.19). For M-estimators, the influence curve is:

$$IC_{T,F}(d, y) = \{F(d) - y\}\frac{\psi'(d - \theta)}{\int \psi'(x - \theta)\,dF(x)}$$

where $\theta$ denotes the ED50. For R-estimators, the influence curve is:

$$IC_{T,F}(d, y) = \{F(d) - y\}\frac{J'(F(d))f(d)}{\int_{-\infty}^{\infty} J'(F(x))f(x)\,dF(x)}$$

where $f(x) = F'(x)$, and J(u) is now the score function of section 7.5.2, subject to further regularity conditions. It appears that the Tukey biweight and Hodges-Lehmann estimators (for standard threshold distributions) are influence-curve robust, but that the logistic scores estimator is generally not. James and James (1983) provide a modified form of the logistic scores estimator which is influence-curve robust. See Exercise 7.20 for further discussion. The maximum-likelihood estimator of $\theta$ under the logit model is also not robust (Exercise 7.21).

7.5.4 Efficiency comparisons

In Table 7.4 we present a range of asymptotic efficiencies, derived from the papers of Miller and Halpern (1980) and James et al. (1984). The contaminated distributions were 95%, 5% mixtures of the original distributions with distributions of the same form and mean, but a standard deviation 10 times larger. The slash distribution is the distribution of a unit normal random variable divided by an independent random variable uniformly distributed over the [0, 1] range. Like the Cauchy distribution, the slash is heavy tailed. By contrast, the angular distribution is short-tailed, with

Normal Contaminated normal Logistic Contaminated logistic Cauchy Slash Angular

Tolerance

88

84 88

58

75 0 0

81

100

75 85 80

68

95 90 96 74 80

86

Trimmed Spearman-Kärber ($\alpha$ = 0.10) ($\alpha$ = 0.05)

98 75

Spearman-Kärber (equivalent to assuming a logit model) 12 76 70 73 61 65 63

70 74 74

88

90

88

89

Tukey biweight (k = 6) (k = 9)

94 67

88

80 87 83 89

Hodges-Lehmann

Table 7.4 Asymptotic efficiencies of seven estimators of the ED50, for seven tolerance distributions. Presented are 100 x ratio of the optimal variance of equation (7.5) to the variance of the estimator


cumulative distribution function,

$$F(x) = \sin^2(x + \pi/4), \quad \text{for } -\pi/4 \le x \le \pi/4$$
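Tolerances from the two less familiar distributions can be simulated in a few lines, which is convenient for anyone wishing to repeat comparisons of this kind on a small scale. The sketch below is illustrative and makes its own choices: the function names are hypothetical, slash variates are formed as a unit normal divided by an independent uniform, as described above, and angular variates are obtained by inverting $F(x) = \sin^2(x + \pi/4)$, which gives $x = \arcsin(\sqrt{u}) - \pi/4$.

```python
import math
import random

def angular_cdf(x):
    """Cumulative distribution function of the angular distribution."""
    if x <= -math.pi / 4:
        return 0.0
    if x >= math.pi / 4:
        return 1.0
    return math.sin(x + math.pi / 4) ** 2

def angular_quantile(u):
    """Inverse of angular_cdf for 0 <= u <= 1."""
    return math.asin(math.sqrt(u)) - math.pi / 4

def sample_angular(rng):
    """Draw one angular variate by inversion."""
    return angular_quantile(rng.random())

def sample_slash(rng):
    """Draw one slash variate: unit normal over an independent uniform on (0, 1]."""
    u = 1.0 - rng.random()   # rng.random() lies in [0, 1), so u lies in (0, 1]
    return rng.gauss(0.0, 1.0) / u

rng = random.Random(1)       # fixed seed, chosen arbitrarily for repeatability
angular = [sample_angular(rng) for _ in range(2000)]
slash = [sample_slash(rng) for _ in range(2000)]

print("angular range:", min(angular), max(angular))
print("largest |slash| value:", max(abs(v) for v in slash))
```

The angular sample is confined to $[-\pi/4, \pi/4]$, while the slash sample will typically contain a few very large values, reflecting its heavy tails.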

We can see that trimming the Spearman-Kärber approach avoids the poor performance for contaminated, slash and Cauchy distributions, at the cost of a certain loss in efficiency for the lighter tailed distributions. For the Tukey biweight approach the value k = 9 dominates the value k = 6. However, overall there is no reason to choose a Hodges-Lehmann or Tukey biweight estimator in favour of say a 5% trimmed procedure. As mentioned already, the logistic scores estimator is likely to dominate the others considered here, at a cost of computational complexity relative to the trimmed Spearman-Kärber procedure.

The asymptotic results of Table 7.4 can be misleading as a guide for small-sample behaviour. In Table 7.5 we have abstracted simulation results from James et al. (1984), for comparison. We can see that in this case the Spearman-Kärber estimate performs relatively well for the heavy tailed distributions. Additionally, the SK10 estimate MSE was consistently higher than that for the Spearman-Kärber measure without any trimming (Exercise 7.16).

Table 7.5 (James et al., 1984) Sample mean-square error (MSE) for the Spearman-Kärber estimator of the ED50, and relative efficiencies for the other estimators, so that the table values are in that case the inverse ratios of the sample MSEs. Methods compared are the 10% trimmed (SK10) and 5% trimmed (SK5) Spearman-Kärber procedures, the Hodges-Lehmann estimator (HL) and the logistic scores estimator (LS), all of the ED50. In all cases 1800 simulations were carried out, with n = 10 individuals per dose and 11 doses, the middle dose being the ED50. Scale parameters were chosen so that 98% of the tolerance distributions fell between the third smallest and third largest doses. The contaminated distributions were contaminated with a 5% mixture of the same distributional form, but with a variance 100 times larger than the variance in the main component of the mixture

                  SK        SK10     SK5      HL       LS
Normal            0.0710    0.841    0.918    0.777    0.883
Cont. normal      0.0357    0.939    1.045    0.918    1.094
Logistic          0.0624    0.871    0.938    0.791    0.851
Cont. logistic    0.0368    0.954    1.057    0.923    1.069
Cauchy            0.0379    0.972    1.074    0.936    1.075
Slash             0.0379    0.972    1.074    0.936    1.075
Angular           0.0863    0.783    0.888    0.756    0.978

Investigations by Finney (1950, 1953) showed how the precision of the ED50 can vary with regard to the location of doses relative to the ED50. This point is emphasized by Hoekstra (1990), who also pointed out discrepancies between the simulation results of Hamilton (1979) and those of James et al. (1984). Hamilton (1979) simulated data from an experiment with 10 equally spaced doses, equal numbers, n, of individuals at each dose, for n = 5, 10, 20, and in all compared ten different methods of estimating the ED50. In all cases the doses were symmetrically arranged about the known value for the ED50; the doses used have since been employed by Sanathanan et al. (1987) and Kappenman (1987). They are primarily designed to mimic routine assay experiments with a wide dose range. Overall, Hamilton's main conclusion was that for the simulation experiments considered, moderately trimmed Spearman-Kärber estimators were to be recommended. A further feature which obscures comparisons in simulation experiments is that not all estimators are calculable for all sets of simulated data.

Extensive though these simulation results are, the conclusions are not generally applicable. For example, simulation was always from a symmetric model, with symmetrically placed doses, and the emphasis is very much on experiments with only a relatively small number of doses corresponding to intermediate probabilities of response. There is clearly a need for more research here.

7.6 Alternative distribution-free procedures

7.6.1 Sigmoidal constraint

Suppose we want to estimate an ED50. On the one hand we may be prepared to make strong assumptions about the tolerance distribution, and use logit or probit analysis, though more flexible approaches have been described in Chapter 4. On the other hand we may only assume that the tolerance distribution exists, and estimate the ED50 by linear interpolation from the ABERS estimate. Intermediate methods between these two extremes have been described by Glasbey (1987). Set

$$\Delta P_i = P_i - P_{i-1}, \quad 2 \le i \le k$$


(di — d i - 2)(di — j

+ Tj

T7T

j

( d i - 1 ~ d i - 2) ( d i -

(di —1 —

-

2~

3 )(d ; - 2 —


3) D j \

J

;

d i - 3)

i— 2

- 3)

where the dose levels of an experiment are $\{d_i, 1 \le i \le k\}$. For the ABERS estimate the set of constraints on the $\{P_i\}$ is:

$$P_1 \ge 0; \quad \Delta P_i \ge 0 \text{ for } 2 \le i \le k; \quad P_k \le 1$$

Now, except when we might anticipate non-homogeneity, it is natural to suppose that the tolerance distribution is unimodal. This imposes a stronger set of constraints on the $\{P_i\}$, and underlines how little is assumed when the ABERS estimate is formed. The unimodality assumption was considered by Schmoyer (1984). The cumulative distribution function is sigmoidal, so that there is one point of inflexion, to the left of which the function is convex, and to the right of which it is concave. Depending on the location of the point of inflexion, (k - 1) separate sets of constraints result, one for each value of s, lying in the range: $2 \le s \le k$. We have (Exercise 7.22):

$$P_1 \ge 0$$
$$\Delta P_2 \ge 0$$
$$\Delta^2 P_i \ge 0 \quad \text{for } 3 \le i \le s$$
$$\Delta^2 P_i \le 0 \quad \text{for } s + 1 \le i \le k$$
$$\Delta P_k \ge 0$$
$$P_k \le 1$$

In this case we obtain the maximum likelihood under each of the (k - 1) sets of constraints, and select the $\{P_i\}$ which produce the overall maximum. Glasbey (1987) extended this idea, providing sets of constraints for symmetry, bell-shaped tolerance distributions, and for combinations of these assumptions (Exercise 7.23). Schmoyer (1984) found non-linear programming methods sensitive to starting values, but Glasbey (1987) simply used the NAG (1984) Fortran sub-routine, E04VDF, to perform the optimizations.

Example 7.5 (Glasbey, 1987)

Rather than obtain an ED100p value by linear interpolation from a fitted model, Glasbey (1987) defined an ED100p to be the range of doses at which the proportion of responding individuals can equal p, corresponding to the likelihood taking its maximum value, subject to the parameters being constrained. For the rotenone data set of Exercise 2.38, the results of Table 7.6 were obtained.

Table 7.6 ED50 values (on a natural logarithmic scale) for the rotenone data of Exercise 2.38, for a range of assumptions about the tolerance distribution

Assumptions               ED50         95% confidence limits(a)    Width of confidence intervals
None                      1.34-1.63    1.34-2.04                   0.70
Symmetric                 1.37-1.63    1.34-2.04                   0.70
Unimodal                  1.59-1.63    1.34-1.88                   0.54
Unimodal and symmetric    1.57         1.45-1.68                   0.23
Normal                    1.58         1.48-1.67                   0.19

(a) Bootstrap confidence intervals, based on 399 simulations.



It is interesting to see from Example 7.5 how precision depends upon the assumptions made. Fitting extended models to the data set of this example did not affect the estimated precision of the ED50. This was found to be true for all of the data sets in Exercise 1.3 (Morgan, 1985), but substantial differences can result for the ED99, for example, when the discrepancies of Table 7.6 can also be expected to be greater. For further discussion, see Exercise 7.24.

7.6.2 Density estimation

The maximum-likelihood estimate of the probability, $P(d_j)$, of response to dose $d_j$ where $r_j$ individuals are observed to respond out of $n_j$ treated is simply,

$$\hat{P}(d_j) = r_j/n_j$$

Less simply we can define $\hat{P}(d)$ for any real d in the form:

$$\hat{P}(d) = \frac{\sum_{i=1}^{k} r_i\,\delta(d - d_i)}{\sum_{i=1}^{k} n_i\,\delta(d - d_i)} \qquad (7.18)$$

where $\delta(u) = 1$ if u = 0, and $\delta(u) = 0$ otherwise. For most of the data sets presented in this book there is sufficient replication that the values $n_i \gg 1$. In some cases, however, it is necessary to pool over class intervals for d. Examples of this are illustrated in the data of Table 1.4 and Exercise 2.23, in each case for observational studies on the age of onset of menarche. The choice of class boundaries is arbitrary and any resulting plot can be sensitive to the particular class boundaries adopted. Copas (1983) suggested that a better method is to smooth using a kernel function $\psi(u)$, such as $\psi(u) = \exp(-u^2/2)$, to give,

$$\hat{P}(d) = \frac{\sum_{i=1}^{k} r_i\,\psi\{(d - d_i)/h\}}{\sum_{i=1}^{k} n_i\,\psi\{(d - d_i)/h\}} \qquad (7.19)$$

where h > 0 is a smoothing parameter. The original intention was for equation (7.19) to be applied when most of the $\{n_i\}$ were unity, to enable a plot to be made of $\hat{P}(d)$ versus d, possibly with a view to assisting in model-selection. It was suggested that the plot should be made for a range of values of h. Copas (1983) also presents an estimate of $V(\hat{P}(d))$ and a method of bias correction. For a logistic regression application, see Kay and Little (1986). This approach may also be applied when $n_i \gg 1$, for all i, and it has been developed for this application by Kappenman (1987), who presents a cross-validation procedure for estimating h. Based on the normal distribution kernel given above, the method reduces to solving the equation in h:

$$\sum_{j \in A} r_j\left[p_{j1}(h) - p_{j1}(h)^2 - \{1 - p_{j1}(h)\}^2\right] + \sum_{j \in B}(n_j - r_j)\left[1 - p_{j2}(h) - p_{j2}(h)^2 - \{1 - p_{j2}(h)\}^2\right] = 0 \qquad (7.20)$$

where the first sum is over the set $A = \{j: r_j > 0\}$, the second is over the set $B = \{j: r_j < n_j\}$, and

$$p_{j1}(h) = \{r_j - 1 + s_j(h)\}/t_j(h) \quad \text{and} \quad p_{j2}(h) = \{r_j + s_j(h)\}/t_j(h)$$

where

$$s_j(h) = \sum_{i \ne j} r_i\,e^{-(d_j - d_i)^2/2h^2}$$

$$t_j(h) = n_j - 1 + \sum_{i \ne j} n_i\,e^{-(d_j - d_i)^2/2h^2}.$$
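The kernel estimate of equation (7.19) is straightforward to compute for a given h. The sketch below is illustrative only: the function name and the small data set are invented, the kernel is the normal kernel $\psi(u) = \exp(-u^2/2)$ mentioned in the text, and no attempt is made to reproduce the cross-validation choice of h due to Kappenman (1987).

```python
import math

def kernel_response(d, doses, responders, treated, h):
    """Kernel-smoothed estimate of the response probability at dose d, as in (7.19)."""
    weights = [math.exp(-(((d - di) / h) ** 2) / 2.0) for di in doses]
    numerator = sum(w * r for w, r in zip(weights, responders))
    denominator = sum(w * n for w, n in zip(weights, treated))
    return numerator / denominator

# Invented illustrative data: five doses with ten individuals at each.
doses = [1.0, 2.0, 3.0, 4.0, 5.0]
treated = [10, 10, 10, 10, 10]
responders = [1, 3, 5, 8, 10]

for h in (0.25, 0.5, 1.0, 2.0):
    fitted = [kernel_response(d, doses, responders, treated, h) for d in doses]
    print(h, [round(p, 3) for p in fitted])
```

As h tends to zero the estimate at an observed dose returns to the raw proportion $r_j/n_j$, and as h becomes large it approaches the pooled proportion, so plotting over a range of h, as Copas (1983) suggested, shows the whole spectrum between these extremes.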

Unfortunately equation (7.20) sometimes has more than one solution. An additional awkward feature is that the resulting estimate of F(x) is not always monotonic increasing. Ways around these obstacles are provided by Kappenman (1987), who demonstrates the good properties of the resulting method in a simulation study. The method performs well, in terms of mean squared error of the ED50, in comparison with the 10% trimmed Spearman-Kärber method, with the relative performance of the kernel estimator improving as the sample size increases. However, the comments of Hoekstra (1990) on the specific design of the simulation experiment are germane here also. A further application of the kernel approach is found in Staniswalis and Cooper (1988).

7.7 Discussion

Computer power has opened up a rich range of new non-parametric procedures for the analysis of quantal assay data. While some of these methods are restricted to ED50 estimation, others, such as those of section 7.6, are more widely applicable. As Glasbey (1987) points out, the methods of his paper have application also to estimating relative potency, and also in experimental design. Schmoyer's (1984) sigmoidal constraint approach was originally presented as potentially useful for low-dose extrapolation (section 4.6). It is relevant here to note the discussion in Gaylor and Kodell (1980) on the role of the sigmoid assumption in low-dose extrapolation. For further related work, see Schmoyer (1986) and Schell and Leysieffer (1989). An alternative method for fitting the standard logit model to quantal assay data has been provided by Cobb and Church (1983). Based on Spearman-Kärber type estimators, the method is shown to possess good small sample properties, subject to the equations for the parameter estimates having a solution.

Certain of the comparisons that have been made of the new methods mimic those made much earlier of old methods; see for example Finney (1950, 1953). For ED50 estimation, trimming the Spearman-Kärber estimate by about 5% seems to result in a substantial improvement, and a good general-purpose robust estimator. It is useful that suitable software is available on a floppy disk for evaluating the trimmed Spearman-Kärber estimate (Appendix E). For particular applications, methods such as those of sections 7.5.2 and 7.6, and the trimmed logit method will undoubtedly be of value, though their more general use may be hampered by computational complexity and the lack of off-the-peg computer software. However, the situation is bound to change in this regard.

The advantages of a parametric approach lie in the flexibility afforded, for instance, through the ready extension to wider models (as in Chapter 4), through the extension to the case of time-to-response data (as in Chapter 5), through the incorporation of natural mortality (as in Chapter 3), and through the addition of over-dispersion (as in Chapter 6). Particularly valuable, however, is the added perception, through the work reported in this chapter, of how non-robust parametric estimators may be, and, as shown in Table 7.6, the extent to which strong parametric assumptions can heighten estimates of precision.

7.8 Exercises and complements

The exercises vary in complexity. It is particularly desirable to attempt the five exercises marked with a †. Exercises marked with a * are generally more difficult or speculative.

7.1* Show that, when different orders of adjustment are possible to provide an ABERS estimate, they result in the same ABERS estimate.

7.2 Verify the mathematical expressions for $\{\tilde{P}_j\}$ given in section 7.2.


7.3 Provide a graphical illustration of the mechanics of the Spearman-Karber estimate. Derive a formula for the variance of the tolerance distribution, using the same Spearman-Karber approach, assuming a constant dose separation.

7.4† (Church and Cobb, 1973). In equal weight designs, $(d_{j+1} - d_{j-1})/n_j$ is constant, for $1 < j < k$. Show that under equal weight designs, if $P_1 = 0$ and $P_k = 1$, the Spearman-Karber estimate is unaffected by the ABERS averaging procedure.

7.5 The Moving Average method is due to Thompson (1947), and proceeds as follows. Decide on a suitable interval (span) for taking moving averages, calculate these for the $\{P_j\}$ and the $\{d_j\}$, and then use linear interpolation between the averages $d_j^*$ and $d_{j+1}^*$ for which the $P^*$s enclose 0.50. Thus we set:

$$\hat{\theta} = d_j^* + a(d_{j+1}^* - d_j^*),$$

where $a = (0.5 - P_j^*)/(P_{j+1}^* - P_j^*)$, and where, e.g., for a moving average of span s, $d_j^* = (d_j + d_{j+1} + \cdots + d_{j+s-1})/s$. How would you estimate $\mathrm{Var}(\hat{\theta})$? Experiment with this method for a range of different spans and different data sets. Improvements have been suggested by Bennett (1952, 1963). The method is popular amongst ecotoxicologists (Stephan, 1977). It was outperformed by other methods in the simulation study of Engeman et al. (1986), but is useful for small or awkward data sets (Hoekstra, 1990). How does the method operate if (i) the span = 1, or (ii) the span increases?

7.6 (Miller, 1973). Consider quantal assay data with fixed spacing of doses, $\Delta$, and n individuals per dose. If there is a value of s for which

$$\sum_{i=1}^{s} r_i = \sum_{i=s}^{k} (n - r_i)$$

then the Reed-Muench estimator of the ED50 is defined as

$$\hat{\theta}_R = d_k - \Delta(k - s).$$

Show that it is possible to write

$$\hat{\theta}_R = d_k + \Delta(1 - P_s) - \Delta \sum_{i=1}^{k} P_i$$

and discuss the similarity of this expression with the corresponding one for the Spearman-Karber estimate. How would you proceed if there is no such value of s?
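The identity in Exercise 7.6 can be checked numerically before it is proved. The following is a minimal sketch in Python, assuming equally spaced doses and a common group size n; the function names and the small data set are hypothetical, chosen so that a suitable s exists. The Spearman-Karber form used for comparison is the usual untrimmed one, which assumes $P_1 = 0$ and $P_k = 1$.

```python
def reed_muench(doses, r, n):
    """Return (estimate, s) for the Reed-Muench ED50, or None if no s exists.

    doses: equally spaced dose levels d_1..d_k; r: numbers responding;
    n: common group size (all illustrative assumptions).
    """
    k = len(doses)
    delta = doses[1] - doses[0]
    for s in range(1, k + 1):
        lower = sum(r[:s])                          # sum_{i=1}^{s} r_i
        upper = sum(n - ri for ri in r[s - 1:])     # sum_{i=s}^{k} (n - r_i)
        if lower == upper:
            return doses[-1] - delta * (k - s), s
    return None


def reed_muench_alternative(doses, r, n, s):
    """d_k + delta * (1 - P_s) - delta * sum_i P_i, the form in Exercise 7.6."""
    delta = doses[1] - doses[0]
    p = [ri / n for ri in r]
    return doses[-1] + delta * (1 - p[s - 1]) - delta * sum(p)


def spearman_karber(doses, r, n):
    """Untrimmed Spearman-Karber estimate; assumes P_1 = 0 and P_k = 1."""
    p = [ri / n for ri in r]
    return sum((p[i + 1] - p[i]) * (doses[i] + doses[i + 1]) / 2
               for i in range(len(doses) - 1))


# Hypothetical data: five doses, ten individuals per dose.
doses = [1.0, 2.0, 3.0, 4.0, 5.0]
r = [0, 2, 5, 8, 10]
estimate, s = reed_muench(doses, r, 10)
alternative = reed_muench_alternative(doses, r, 10, s)
sk = spearman_karber(doses, r, 10)
```

For these symmetric data the three quantities agree; with less regular data the search in `reed_muench` may return `None`, which is the situation raised at the end of the exercise.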


7.7 We use the same notation as for Exercise 7.6. Comment on the behaviour of the sequence

$$\sum_{i=1}^{s} r_i + \sum_{i=s}^{k} (n - r_i)$$

as s increases. If there is a value of s for which $P_s = 0.5$, then the Dragstedt-Behrens estimator of the ED50 is defined as

$$\hat{\theta}_D = d_k - \Delta(k - s).$$

Show that if there is a value of s for which $P_s = 0.5$, then the Reed-Muench and Dragstedt-Behrens estimators coincide. How would you proceed if there is no such s?

7.8 Egger (1979) found multiple roots to be a problem when solving the equation for the Box-Cox parameter $\lambda$ resulting from setting $\hat{\mu}_{(3)} = 0$, as in section 7.3. For data sets 2 and 3 of Exercise 1.3, single roots resulted, giving, respectively $\lambda = -0.97$, and $\lambda = -0.09$. In light of these results, discuss which of the many alternative models discussed in Chapter 4 you would expect to provide a good fit to the data.

7.9† Experiment with the trimmed Spearman-Karber method, applying it to a range of the data sets of Exercise 1.3, using different levels of trimming.

7.10 Apply the trimmed-logit procedure to the data of Table 1.4.

7.11 Suppose the probability of response to dose d is given by $P(d) = F(d - \mu)$, for cumulative distribution function F(x), of zero mean. Show that the asymptotic efficiency of the Spearman-Karber estimator of the ED50 is:


Show that e = 1 if and only if F(x) is logistic.

7.12 (Continuation.) Show that e = 0 if F(x) is exponential, with $F(x) = 1 - \exp(-x/\mu)$, $x \ge 0$.

7.13† (Continuation.) Show that e = 0.9814 if F(x) is normal.

7.14 Verify the expression of equation (7.9).

7.15* Prove that the variance expression of equation (7.14) is minimized when $S(u) = \log\{u/(1 - u)\}$.

7.16 Discuss the following results, taken from James et al. (1984), based on 1800 simulations for each distribution, and make comparisons with Tables 7.4 and 7.5. There were 11 dose levels, and n = 20 individuals per dose. The middle dose is the ED50. The scale parameters were chosen so that 98% of the tolerance distributions fell between the third smallest and third largest doses.

                  Mean square error      Sample efficiencies, relative to Spearman-Karber
                  for Spearman-Karber    SK10      SK5       HL        LS

Normal            0.0348                 0.835     0.923     0.758     0.882
Cont. normal      0.0181                 0.955     1.118     0.981     1.359
Logistic          0.0326                 0.886     0.951     0.796     0.889
Cont. logistic    0.0188                 0.978     1.126     0.985     1.295
Cauchy            0.0192                 0.994     1.144     0.996     1.302
Slash             0.0192                 0.993     1.143     0.995     1.299
Angular           0.0430                 0.794     0.896     0.755     0.981

7.17 Hamilton (1979) carried out a simulation study of 10 estimators of the ED50. His approach has been replicated in a number of subsequent studies. Five different models resulted in the probabilities of response shown below. Comment on the likely implications of this selection of models. Model I is logit. Compare the probabilities of response with those resulting from a logit model with ED50 = 5.5 and $\beta = 1$. This is one model used by Hoekstra (1990) in a simulation study of the comparative performance of the method of Exercise 7.5. Cf. also Engeman et al. (1986).

Values of the Probability of a Response at Dose d, for each of five models

                                  Model
Dose (d)    I          II         III        IV         V

 1          0.00026    0.00157    0.00187    0.00188    0.00020
 2          0.00160    0.00389    0.00440    0.00441    0.10128
 3          0.01000    0.01000    0.01000    0.01000    0.10800
 4          0.05969    0.03415    0.02232    0.02111    0.14775
 5          0.28516    0.21945    0.12904    0.05016    0.32813
 6          0.71484    0.78055    0.87096    0.94984    0.67187
 7          0.94031    0.96585    0.97768    0.97889    0.85225
 8          0.99000    0.99000    0.99000    0.99000    0.89200
 9          0.99840    0.99611    0.99560    0.99559    0.89872
10          0.99974    0.99843    0.99813    0.99812    0.99980
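The comparison asked for in Exercise 7.17 is easy to set up numerically. The sketch below is illustrative only: it takes the Model I column from the table above and evaluates a logit model centred at ED50 = 5.5. The slope `beta_model_i` is not stated in the text; it is an assumption, inferred here from the tabulated value P(3) = 0.01, and the function name is hypothetical.

```python
import math

# Model I column of the table in Exercise 7.17 (doses 1..10).
MODEL_I = [0.00026, 0.00160, 0.01000, 0.05969, 0.28516,
           0.71484, 0.94031, 0.99000, 0.99840, 0.99974]


def logit_probability(d, ed50, beta):
    """P(d) = 1 / (1 + exp{-beta * (d - ED50)})."""
    return 1.0 / (1.0 + math.exp(-beta * (d - ed50)))


# Assumption: slope chosen so that P(3) = 0.01 when ED50 = 5.5,
# i.e. beta * 2.5 = log(0.99 / 0.01).
beta_model_i = math.log(99.0) / 2.5

fitted = [logit_probability(d, 5.5, beta_model_i) for d in range(1, 11)]
unit_slope = [logit_probability(d, 5.5, 1.0) for d in range(1, 11)]

# Largest discrepancy between the inferred logit curve and the Model I column.
max_abs_diff = max(abs(f - t) for f, t in zip(fitted, MODEL_I))

for d, (tab, one) in enumerate(zip(MODEL_I, unit_slope), start=1):
    print(f"d={d:2d}  Model I={tab:.5f}  logit(beta=1)={one:.5f}")
```

Under this assumption the inferred slope is much steeper than $\beta = 1$, so the $\beta = 1$ model spreads its probabilities of response far more evenly over the ten doses than Model I does.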

7.18 Evaluate the influence curve of equation (7.15) and the sensitivity curve of equation (7.16) for the sample mean of a direct random sample of size n. Comment on the forms that result.

7.19 Show that for the Spearman-Karber estimator the influence curve is:

$$IC_{T,F}(d, 1) = -\{1 - F(d)\}; \qquad IC_{T,F}(d, 0) = F(d).$$

Show further that the $\alpha$-trimmed Spearman-Karber estimator has the influence curve:

$$IC_{T,F}(d, y) = \begin{cases} 0 & \text{if } d < F^{-1}(\alpha), \text{ or } d > F^{-1}(1 - \alpha), \\[4pt] \dfrac{F(d) - y}{1 - 2\alpha} & \text{if } F^{-1}(\alpha) < d < F^{-1}(1 - \alpha), \\[8pt] \dfrac{F(d) - y}{2(1 - 2\alpha)} & \text{if } d = F^{-1}(\alpha), \text{ or } d = F^{-1}(1 - \alpha). \end{cases}$$

Draw conclusions from these findings.


7.20 Under suitable regularity conditions documented in James and James (1983), the influence curve for M-estimators is given by:

— 0)dF(x)

Verify that the Tukey biweight estimator is influence-curve robust.

7.21 Let $\hat{\theta}_{Ly}(d)$ denote the maximum likelihood estimator of the ED50 when one additional response of value y is observed at dose d under the logit model. Let the maximum likelihood estimator of the ED50, under this model, without any additional response be $\hat{\theta}_L$. Discuss the influence curve adopted by James and James (1983): I C L F{d, y) = lim lim ^ ( 0 Ly( d) - §L), oo «->oo A

>’ = 0,1

These authors show further that $IC_{L,F}(d, 1) =$

I

g( S, we set, c• di + l = d i - ~ ( r i - p ) I where ct = m ax [ 2% of simulations, and usually far less frequent than that).

The general conclusion is that the three-point sequential design performs best in terms of global description of the dose-response curve, and is at least comparable in performance with other methods regarding ED50 estimation. It was found also that a standard Robbins-Monro procedure, with large step-size parameter c, can perform well if followed by maximum-likelihood estimation based on responses to all of the dose levels selected. It was suggested that increased performance might result from using a robust procedure, rather than one based simply on the logit model, possibly along the lines suggested by Sanathanan et al. (1987), described in Chapter 7.

8.7 Discussion

Work on sequential quantal analysis was initiated by Anderson et al. (1946). Since then research in the area has been extensive and continuous. A full discussion of design and sequential methods for quantal assay data would require far more space than we have devoted to the subject here. We have focused on designs for the estimation of single dose-response curves, or summaries of these. Design for comparisons is discussed in Brown (1966); design for mixtures is considered by Abdelbasit and Plackett (1982); design for the division of fixed resources between control and experimental animals is discussed in Thall and Simon (1990), with reference also to the weight to be placed on historical controls (Exercise 8.26). Response surface methodology is used by Carter et al. (1985) for describing responses to a mixture of three substances, and also for estimating toxic effects (Exercise 8.27). A fundamental assumption of sequential methods of the Robbins-Monro kind is that it is possible to produce doses of the required strengths without difficulty. The investigations of Kalish (1990) and of Rosenberger and Kalish (1981) into gauging the loss of efficiency resulting from sub-optimal assignments of doses are particularly valuable in the context of possible dose measuring errors (cf. section 3.9). The poor performance of asymptotic criteria in small samples has been observed by Kalish (1990), and matches corresponding observations in Chapter 7. Simulation studies are most valuable for small-sample evaluations, but because of the many factors to be considered, it can be difficult to distil the important features from

DESIGN AND SEQUENTIAL METHODS


the results. For both the up-and-down and stochastic approximation sequential methods, the basic approaches have been found to be improved by the use of initial delays. Improvements may also result from the adoption of a simple parametric model, to be fitted to the data by maximum-likelihood, as the data become available. There are difficult problems of inference from sequentially constructed designs, as it is difficult analytically to account for the sequential construction. This is discussed in Silvey (1980, section 7.4), and investigated in a particular simulation study by Ford and Silvey (1980). There is scope for more extensive investigations of this kind.

8.8 Exercises and complements

The exercises vary in complexity. It is particularly desirable to attempt the five exercises marked with a †. Exercises marked with a * are generally more difficult or speculative.

8.1† Note that Exercises 8.1-8.4 all derive from the paper by Rosenberger and Kalish (1981). In all cases we have a total of m individuals to allocate equally between two dose levels with respective probabilities of response $P(d_1)$, $P(d_2)$ where $P(d_1) = 1 - P(d_2)$. The assumed model is the logit model with P(d) = (1 + e ~ i )hh $0 \le n_1 < n_2 \le n$ are all integers, observations being taken in groups of size n, and $r_i$ potatoes split after n were dropped from height $h_i$. Compare this rule with that of equation (8.8), and discuss the choice of n, $n_1$ and $n_2$ for estimating the ED10.

8.12 (Continuation.) For the 10 species of potato in Example 8.1, the mean weights (in grams) are as follows: 98, 107, 167, 154, 150, 111, 137, 118, 113, 204. Consider how you might make use of this additional information. Jansen and Bowman (1988) investigated whether it was worth including size of bruising in a potato drop experiment.

8.13† The following delayed Robbins-Monro process was suggested by Cochran (Davis, 1971): replace equation (8.9) by

$$d_{i+1} = d_i - c(r_i - p)$$

until both responses and non-responses have been observed. Let t* denote the number of the first trial for which this occurs. Then revert to the sequence:

Why might this be a sensible modification? Will this modification affect asymptotic properties?


8.14 Discuss the modification of the Robbins-Monro process given below, and suggested by Kesten (1958): Replace c/i in equation (8.9) by $c/\psi(i)$, where $\psi(1) = 1$, $\psi(2) = 2$, and

$$\psi(i) = \psi(i - 1), \quad \text{if } (d_i - d_{i-1})(d_{i-1} - d_{i-2}) > 0,$$
$$\psi(i) = \psi(i - 1) + 1, \quad \text{if } (d_i - d_{i-1})(d_{i-1} - d_{i-2}) < 0, \quad \text{for } i > 2.$$
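The effect of Kesten's rule in Exercise 8.14 is easiest to see by tracing $\psi(i)$ along a dose sequence. The sketch below is a minimal illustration, assuming that equation (8.9) has the form $d_{i+1} = d_i - (c/i)(r_i - p)$, as the wording of the exercise suggests. The function name is hypothetical, and the responses are supplied in advance purely for illustration, whereas in a real experiment each response would be observed at the current dose.

```python
def kesten_sequence(responses, d1, c, p=0.5):
    """Dose sequence d_1, d_2, ... and step counters psi(1), psi(2), ...

    responses: r_1, r_2, ... (0/1 responses or proportions responding).
    The step size c / psi(i) only shrinks after a change of direction.
    """
    doses = [d1]
    psi = []
    for i, r in enumerate(responses, start=1):
        if i <= 2:
            psi_i = i                                # psi(1) = 1, psi(2) = 2
        else:
            # (d_i - d_{i-1}) * (d_{i-1} - d_{i-2}); doses[i - 1] holds d_i
            change = ((doses[i - 1] - doses[i - 2])
                      * (doses[i - 2] - doses[i - 3]))
            psi_i = psi[-1] + 1 if change < 0 else psi[-1]
        psi.append(psi_i)
        doses.append(doses[-1] - (c / psi_i) * (r - p))
    return doses, psi


# Responses all equal to 1: the dose keeps falling, so psi stays at 2.
same_doses, same_psi = kesten_sequence([1, 1, 1, 1], d1=0.0, c=1.0)

# Alternating responses: every step reverses direction, so psi grows like i.
alt_doses, alt_psi = kesten_sequence([1, 0, 1, 0], d1=0.0, c=1.0)
```

When successive moves are in the same direction the step length is held fixed, so the process can travel quickly towards the target dose; the usual shrinking of steps resumes only once the sequence starts to oscillate.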

8.15† Verify that the first six log-dose levels in Example 8.2 are, in order, 0, -2.6, -2.75, 2.6, -2.35, -2.45.

8.16* (McLeish and Tosh, 1990). Suppose that D denotes the extra cost of a response, compared with non-response. Show that a total cost constraint for the expected cost can produce the constraint:

$$\sum_{i=1}^{t} n_i[1 + D F\{\beta(d_i - \theta)\}] = C$$

Discuss how this introduction of a differential cost affects the operation of the sequence given by equation (8.13).

8.17* Consider how to apply the approach of McLeish and Tosh (1990) to the case of observations being taken over time, as discussed in Chapter 5.

8.18* Consider how to apply the McLeish and Tosh (1990) approach when natural mortality is present, and described by Abbott's formula (section 3.2).

8.19* (Kalish, 1990). For the sequential three-point design discussed in section 8.6, consider how you would estimate F(x) at any stage if either the maximum-likelihood estimate did not exist or if the slope estimate was negative.

8.20* Discuss factors which would need to be considered when start-up designs are devised for simulation experiments to compare alternative sequential methods.

8.21* Consider whether it might be feasible/advantageous to use the minimum logit chi-square rather than the method of maximum likelihood in Wu's approach of section 8.4.2.


8.22 Compare and contrast the sequential methods of Kalish (1990) and of McLeish and Tosh (1990).

8.23 Suppose an up-and-down experiment starts from dose level $d_0$ and that the dose levels for the experiment can be written as:

$$d_i = d_0 + i\Delta$$

The progress through the dose levels can be modelled by means of a first-order Markov chain, with transition probabilities of $F(d_i)$, of moving from $d_i$ to $d_{i-1}$ and $1 - F(d_i)$ of moving from $d_i$ to $d_{i+1}$, where F( ) denotes the tolerance distribution. Show that the Markov chain has an equilibrium distribution $\{\pi_i\}$, satisfying

$$\pi_i\{1 - F(d_i)\} = \pi_{i+1}F(d_{i+1})$$

8.24 (Continuation.) In order to obtain the asymptotic distribution of the measure $E_w$, given in section 8.3, it is first of all necessary to obtain the equilibrium distribution of a first-order Markov chain specified below. The required asymptotic distribution then follows from the application of theorems relating to functions of a Markov chain (Kershaw, 1985a; Chung, 1966). There are four states, defined as follows:

(i, 1) observation of the sequence: $d_i, d_{i-1}, d_i$
(i, 2) observation of the sequence: $d_{i-2}, d_{i-1}, d_i$
(i, 3) observation of the sequence: $d_i, d_{i+1}, d_i$
(i, 4) observation of the sequence: $d_{i+2}, d_{i+1}, d_i$

Verify that this defines a four-state Markov chain, with equilibrium probabilities given by:

$$\pi_{i1} = F(d_i)\{1 - F(d_{i-1})\}\pi_i$$
$$\pi_{i2} = \{1 - F(d_{i-2})\}\{1 - F(d_{i-1})\}\pi_{i-2}$$
$$\pi_{i3} = \{1 - F(d_i)\}F(d_{i+1})\pi_i$$
$$\pi_{i4} = F(d_{i+2})F(d_{i+1})\pi_{i+2}$$

8.25* Provide a Markov chain formulation of the alternation of heights in the experiment of Exercise 8.11.

8.26* (Thall and Simon, 1990.) Consider an experiment involving a single treatment, and in which the problem is how to allocate a


fixed number of individuals between treated and control groups. Obtain an expression for the variance of the treatment effect and hence derive an equation to be solved to provide the optimum allocation of individuals to treatment. Extend this approach to the case where historical control groups are also available and a decision has to be made concerning the relative weight to be placed on the historical information. (Cf. section 6.7.4.)

8.27 Exposed to the nerve agent soman, guinea pigs may die. Carter et al. (1985) exposed guinea pigs to a range of doses of soman ($X_3$), administered sub-cutaneously. One minute following dosing, animals were treated with various treatments, formed from mixing atropine ($X_2$) and pralidoxime chloride ($X_1$). The probability of animals surviving, p, was described by the logistic model:

$$p = \frac{1}{1 + \exp\{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_{11} X_1^2 + \beta_{22} X_2^2 + \beta_{12} X_1 X_2 + \beta_{13} X_1 X_3 + \beta_{23} X_2 X_3 + \beta_{123} X_1 X_2 X_3)\}}$$

After using maximum-likelihood, and fitting the model by the simplex method of Nelder and Mead (1965), the following parameter estimates were obtained:

Parameter     m.l.e.       Estimated asymptotic standard error

β_0            3.0486      0.6084
β_1            0.0179      0.0097
β_2            0.0773      0.0114
β_3           -0.1095      0.0137
β_11          -0.0003      0.000032
β_22          -0.0001      0.000020
β_12          -0.0004      0.000122
β_13           0.0006      0.000170
β_23          -0.0005      0.000180
β_123          0.000006    0.0000021
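As a starting point for the discussion requested in Exercise 8.27, the tabulated estimates can be turned into approximate Wald ratios (estimate divided by standard error) and into fitted survival probabilities. The sketch below is illustrative only: the dictionary keys and function names are hypothetical, the dose units are not given in the text, and the heavily rounded estimates for the second-order terms limit the accuracy of anything computed from them.

```python
import math

# (estimate, estimated asymptotic standard error), as tabulated above.
FIT = {
    "b0":   (3.0486, 0.6084),
    "b1":   (0.0179, 0.0097),
    "b2":   (0.0773, 0.0114),
    "b3":   (-0.1095, 0.0137),
    "b11":  (-0.0003, 0.000032),
    "b22":  (-0.0001, 0.000020),
    "b12":  (-0.0004, 0.000122),
    "b13":  (0.0006, 0.000170),
    "b23":  (-0.0005, 0.000180),
    "b123": (0.000006, 0.0000021),
}

# Approximate Wald ratios; large absolute values suggest a term is needed.
wald_ratios = {name: est / se for name, (est, se) in FIT.items()}


def survival_probability(x1, x2, x3):
    """Fitted probability of survival under the logistic model of Exercise 8.27."""
    b = {name: est for name, (est, _) in FIT.items()}
    eta = (b["b0"] + b["b1"] * x1 + b["b2"] * x2 + b["b3"] * x3
           + b["b11"] * x1 ** 2 + b["b22"] * x2 ** 2
           + b["b12"] * x1 * x2 + b["b13"] * x1 * x3 + b["b23"] * x2 * x3
           + b["b123"] * x1 * x2 * x3)
    return 1.0 / (1.0 + math.exp(-eta))


baseline = survival_probability(0.0, 0.0, 0.0)   # no soman and no treatment
for name, ratio in wald_ratios.items():
    print(f"{name:5s} {ratio:8.2f}")
```

The negative squared terms are the place to look for toxic effects of the treatments themselves: for fixed $X_3$, the fitted survival probability first rises and then falls as $X_1$ or $X_2$ increases.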

Provide a full discussion of these results, with reference to the toxic, as well as the beneficial, effects of the two treatments.

8.28 (Robertson and Morgan, 1990). The signal-detection experiment has been described in Example 1.8 and Exercise 2.8. In the simple case of just two responses, Yes or No, a standard model is based on the probabilities:

Pr(Respond Yes | signal presented) = $1 - \Phi(c - d)$;
Pr(Respond Yes | noise presented) = $1 - \Phi(c)$.

The results of the experiment may be described by notation analogous to that of Exercise 2.8. Show that the variance of the signal/noise distribution separation parameter d is given, for the normal model, by:

with first-order partial derivatives, $\partial g/\partial \xi_i$, which are continuous and not all zero at $\xi$, then $g(\hat{\xi})$ is an estimator which is asymptotically normal, $N(g(\xi), \sigma^2)$, where

and

evaluated in practice at $\hat{\xi}$. Resulting confidence intervals are invariant under continuously differentiable 1-1 transformations of the parameters, since both $g(\hat{\xi})$ and $\sigma^2$ are invariant under such transformations. Simple forms are given in Lindley (1965, pp. 134, 135), who extends the Taylor series expansion for $E[g(\hat{\xi})]$. Extensions, proofs and examples are provided by Bishop et al. (1975, section 14.6). See also Cox (1984a) for further applications. GLIM macros are provided by Burn and Thompson (1981) and Vanderhoeft (1985).
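A small numerical illustration may help to fix the idea. The sketch below applies the usual first-order (delta-method) variance, the quadratic form of the gradient of g with the estimated dispersion matrix, to the ED50 of a logit model written as logit P(d) = α + βd, so that g(α, β) = -α/β. The parameter estimates and dispersion matrix are hypothetical values chosen for illustration, not taken from any data set in the book, and the function name is an assumption.

```python
def delta_method_variance(gradient, covariance):
    """First-order variance approximation: gradient' * covariance * gradient."""
    m = len(gradient)
    return sum(gradient[i] * covariance[i][j] * gradient[j]
               for i in range(m) for j in range(m))


# Hypothetical logit fit, logit P(d) = alpha + beta * d (illustrative values).
alpha_hat, beta_hat = -5.5, 1.0
covariance = [[0.25, -0.04],
              [-0.04, 0.01]]

ed50 = -alpha_hat / beta_hat                       # g(alpha, beta) = -alpha / beta
gradient = [-1.0 / beta_hat,                       # dg / d alpha
            alpha_hat / beta_hat ** 2]             # dg / d beta
variance = delta_method_variance(gradient, covariance)
standard_error = variance ** 0.5

print(f"ED50 = {ed50:.3f}, approximate s.e. = {standard_error:.3f}")
```

The partial derivatives are evaluated at the estimates, as described above. An interval of the form estimate ± 1.96 × s.e. built this way is the delta-method interval that section 2.7 compares with the Fieller and likelihood-ratio intervals.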

APPENDIX A

One-step procedures

Suppose an estimator $\hat{\xi}$ is based on n observations and is formed as the result of an iterative procedure. This procedure may require several iterations before the termination criterion for the iteration is satisfied. In a jack-knife context we need to form $\hat{\xi}_{(i)}$, resulting from omitting the ith observation, $1 \le i \le n$. This too, of course, requires iteration. Chambers (1973) suggested that $\hat{\xi}$ may be used as the starting value for the iteration for $\hat{\xi}_{(i)}$ and that only a single step of the iterative procedure for $\hat{\xi}_{(i)}$ might suffice, thereby greatly simplifying the numerical analysis. This approach has been taken up by Pregibon (1980, 1981), as described in sections 3.4 and 4.5.2, and was used also by Kleinman (1973) (Exercise 6.15). Detailed study, both asymptotic and by simulation, of the performance of one-step procedures for M-estimation (section 7.5.2) is provided in Bickel (1975). Pregibon (1982b) shows how score tests in GLIM can be accomplished by fitting a reduced model, followed by one step in the fitting of the corresponding full model, starting from the maximum-likelihood estimate of the parameters in the reduced model (Appendix D). One-step principal component estimators are developed by Marx and Smith (1990) for use in generalized linear regression. A detailed analysis of one-step methods has been given by Jorgensen (1990), who also generalizes Pregibon's (1981) diagnostic measures.

APPENDIX B

GLMs and GLIM

In logistic regression, we write the probability of response of the ith individual, corresponding to a covariate vector $x_i$, as:

$$P(x_i) = (1 + e^{-\beta' x_i})^{-1}$$

as in the expression of equation (2.14). This is an example of a generalized linear model (GLM). The classical reference is McCullagh and Nelder (1989). An introduction to the subject is provided by Dobson (1990). Like all GLMs, the above example has three basic components:

1. a response distribution of a random variable, Y, say, of mean $\mu$. The distribution is Bernoulli here (the random component);
2. a linear function of the covariates, $\eta_i = \beta' x_i$ (the systematic component);
3. a link function, linking the mean response $\mu_i$ of the ith individual to $\eta_i$: $\eta_i = h(\mu_i)$, where h is any monotonic differentiable function. In the logistic regression example, $\mu_i = P(x_i)$.

It may be simpler to think in terms of the inverse of h: $\mu_i = g(\eta_i)$, and this is the natural approach for discussion of composite link functions. The important paper by Nelder and Wedderburn (1972) showed that a wide variety of statistical procedures could be regarded as GLMs, and fitted to data by means of the same iterative algorithm. Equally important was the first release, two years later, of GLIM, a computer package for fitting GLMs. GLIM stands for generalized linear interactive modelling. The success of this package derives from its wide applicability and interactive nature. Additionally it provides data-handling facilities and the fitting of user-defined models. This was originally through the OWN directive and its associated macros. However, this directive has been abolished in GLIM4, where the required macros are provided as further parameters of the $LINK


and $ERROR directives (see below). In certain examples in the book the OWN directive is still in use. Similar features are now available in GENSTAT, and GLIM and GENSTAT Newsletters regularly publish macros for particular techniques. GLIM provides ingredients, rather than recipes. As a result it has proved remarkably flexible and of use in a wide range of areas. This is sometimes called GLIMNASTICS - for an illustration, see Aitkin and Clayton (1980), discussed in Appendix C.

The basic probability distribution (or probability function) for the random component of GLMs is of the form

$$f_Y(y) = \exp\left[\{y\xi - b(\xi)\}/a(\phi) + c(y, \phi)\right] \qquad \text{(A1)}$$

for some functions a, b and c, and parameters $\xi$ and $\phi$. $E[Y] = \mu = b'(\xi)$ and $V(Y) = a(\phi)b''(\xi)$. Because of the way it enters the expression for V(Y), $\phi$ is called the 'scale', or 'dispersion', parameter. For the normal distribution, $\phi = \sigma^2$, the normal variance, while for the Poisson and binomial distributions, $\phi = 1$. Over-dispersion for the binomial case may be simply accommodated by estimating $\phi$ - see the discussion of section 6.4, which, however, relates to the separate approach of quasi-likelihood. A similar procedure in the Poisson case can be used to approximate to a negative-binomial mean-variance relationship (Payne, 1986, p. 112). However, an exact procedure is given in section 6.5.2. For discussion, see Aitkin et al. (1989, pp. 214, 224). When $\phi$ is assumed known, $f_Y(y)$ belongs to the exponential family. Key examples are normal, Poisson, binomial, gamma and inverse Gaussian distributions. The three main link functions, apart from the identity, are: logit: $\eta = \log\{\mu/(1 - \mu)\}$, probit: rj = (Q + b(Q}

(A2)

0 i=1

where $w_i = \phi/a_i(\phi)$. This is the scaled deviance. Without the divisor $\phi$ it is called the deviance. If the null hypothesis is appropriate, the scaled deviance has, asymptotically, an appropriate chi-square distribution. In small examples the corresponding approximation for the difference of two scaled deviances, for comparing nested models, has been found to be more reliable than the approximation for the scaled deviance, used as a measure of fit of a single model (Payne, 1986, pp. 107, 111; note also McCullagh and Nelder, 1989, p. 36).


In GLIM the iterative procedure for maximum-likelihood model-fitting results from setting

$$\hat{\beta} = (X'WX)^{-1}X'Wz \qquad \text{(A3)}$$

For a detailed derivation, see, for example, Dobson (1990, p. 41). Here X is the design matrix; W is a diagonal matrix of weights: $W = HV^{-1}H$, where V is a diagonal matrix with elements $\{v_{ii}\}$, where $v_{ii}$ is the variance of the ith observation; H is a diagonal matrix of elements, $h_{ii} = \partial\mu_i/\partial\eta_i$; z is the working vector: $z = \eta + H^{-1}(y - \mu)$. The method is iterative since $\eta = X\beta$. Thus we see that equation (A3) generalizes the expressions of equations (2.2) and (2.19). To start the iteration it suffices to estimate $\mu$ by the observation vector y. For the logit model, for example, this corresponds to using minimum logit chi-square estimates as starting values.

When the OWN model facility is used in GLIM, then four macros have to be specified, providing the vectors, %FV, as $\mu_i = g(\eta_i)$, %DR, as $1/h_{ii}$, %VA as $v_{ii}$, from the values of the linear predictor %LP, which is $\eta_i$, and %DI, the individual deviance terms (used for deciding on termination). An illustration is provided in Exercise 4.7. Several of the restrictions in the definition of GLMs can be relaxed, as discussed by Green (1984), Stirling (1984) and Cox (1984b). See also the comments of Nelder (1990).

The extension to composite links is provided by Thompson and Baker (1981). Here the $\mu_i$ may be modelled by any function of the parameters, not just a linear function. In this case the same iteration of equation (A3) applies, but with H as a unit matrix, and X a matrix where $x_{ij} = \partial\mu_i/\partial\beta_j$, which needs to be updated at each iteration. How to do this in GLIM is explained by Roger (1985). This is useful for example in modelling grouped data, as in section 3.5, in models incorporating natural response/immunity, as in section 3.2, and more generally in fitting models such as the Aranda-Ordaz asymmetric model of section 4.2. The approach of Exercise 4.1 will in general underestimate variances (see Appendix C). Ekholm and Palmgren (1989) present a direct and more general approach.

The adaptation of the iteration of equation (A3) to the case of quasi-likelihood has been described, for the binomial and Poisson cases, in section 6.4. Nyquist (1991) presents the theory for when the parameters in a generalized linear model have linear restrictions. A particular application is to quantal response comparisons when there is constant relative potency between substances tested.
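The iteration of equation (A3) can be written out in a few lines for the two-parameter logit model with grouped binomial data. The sketch below is an illustration under stated assumptions, not a description of GLIM's own code: the function name and data are hypothetical, the observations are taken to be the proportions $r_i/n_i$ (so that $v_{ii} = \mu_i(1 - \mu_i)/n_i$, $h_{ii} = \mu_i(1 - \mu_i)$ and the weight is $n_i\mu_i(1 - \mu_i)$), and the starting proportions are adjusted to $(r_i + 0.5)/(n_i + 1)$ to avoid logits of 0 or 1.

```python
import math


def irls_logit(doses, n, r, n_iter=25):
    """Fit logit P(d) = alpha + beta * d by iterated weighted regression."""
    # Start from (adjusted) observed proportions, i.e. estimate mu by y.
    mu = [(ri + 0.5) / (ni + 1.0) for ri, ni in zip(r, n)]
    eta = [math.log(m / (1.0 - m)) for m in mu]
    alpha = beta = 0.0
    for _ in range(n_iter):
        # Weights W = H V^{-1} H and working vector z = eta + H^{-1}(y - mu).
        w = [ni * m * (1.0 - m) for ni, m in zip(n, mu)]
        z = [e + (ri / ni - m) / (m * (1.0 - m))
             for e, ri, ni, m in zip(eta, r, n, mu)]
        # Solve the 2 x 2 weighted normal equations (X'WX) beta = X'Wz.
        sw = sum(w)
        swx = sum(wi * x for wi, x in zip(w, doses))
        swxx = sum(wi * x * x for wi, x in zip(w, doses))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * x * zi for wi, x, zi in zip(w, doses, z))
        det = sw * swxx - swx ** 2
        alpha = (swxx * swz - swx * swxz) / det
        beta = (sw * swxz - swx * swz) / det
        eta = [alpha + beta * x for x in doses]
        mu = [1.0 / (1.0 + math.exp(-e)) for e in eta]
    return alpha, beta


# Hypothetical assay: five doses, ten individuals per dose.
doses = [0.0, 1.0, 2.0, 3.0, 4.0]
n = [10, 10, 10, 10, 10]
r = [1, 3, 5, 8, 9]
alpha_hat, beta_hat = irls_logit(doses, n, r)
```

At convergence the fitted values satisfy the likelihood equations $\sum_i (r_i - n_i\mu_i) = 0$ and $\sum_i d_i(r_i - n_i\mu_i) = 0$, which provides a simple check on the iteration.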

APPENDIX C

Bordering Hessians

A model may have parameters: $(\alpha, \beta')' = (\alpha, \beta_1, \beta_2, \ldots)'$. For fixed $\alpha$ it may be that likelihood maximization with respect to the elements of $\beta$ is easily carried out, e.g. by using GLIM. The overall maximum-likelihood estimate, $(\hat{\alpha}, \hat{\beta}')'$, may then be obtained by maximizing the likelihood with respect to $\beta$ for a range of values of $\alpha$, and then plotting the resulting maxima versus $\alpha$, to obtain a profile likelihood (cf. Figure 2.2). The maximum of this profile likelihood then provides the required $(\hat{\alpha}, \hat{\beta}')'$. However, error estimates for the elements of $\beta$ will not reflect the variation in $\alpha$, and it is necessary to account for this. The required adjustment is easily accomplished as follows. Let l denote the log-likelihood, and set

$$a_{jk} = -\frac{\partial^2 l}{\partial\beta_j \partial\beta_k}, \qquad b_j = -\frac{\partial^2 l}{\partial\beta_j \partial\alpha}, \qquad c = -\frac{\partial^2 l}{\partial\alpha^2}.$$

Let A denote the matrix $\{a_{jk}\}$, and b denote the column vector $\{b_j\}$. Assuming that A is non-singular, the asymptotic estimate of the dispersion matrix of the parameter estimators is then given by:

$$\begin{pmatrix} V_{11} & V_{12} \\ V_{12}' & V_{22} \end{pmatrix} = \begin{pmatrix} A & b \\ b' & c \end{pmatrix}^{-1}.$$

Thus,

$$V_{22} = 1/(c - b'A^{-1}b)$$
$$V_{11} = A^{-1} + A^{-1}bb'A^{-1}V_{22}$$
$$V_{12} = -A^{-1}bV_{22}$$
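The three expressions for $V_{22}$, $V_{11}$ and $V_{12}$ are the standard formulae for the inverse of a bordered symmetric matrix, and can be checked directly. The following sketch assumes, purely for illustration, a model with two $\beta$ parameters, so that A is a 2 x 2 symmetric matrix; the numerical values and function names are hypothetical.

```python
def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def bordered_dispersion(A, b, c):
    """Return (V11, V12, V22) for symmetric 2 x 2 A, vector b and scalar c."""
    a_inv = inverse_2x2(A)
    a_inv_b = [a_inv[0][0] * b[0] + a_inv[0][1] * b[1],
               a_inv[1][0] * b[0] + a_inv[1][1] * b[1]]
    v22 = 1.0 / (c - (b[0] * a_inv_b[0] + b[1] * a_inv_b[1]))
    # A^{-1} b b' A^{-1} is the outer product of A^{-1} b with itself,
    # because A (and hence its inverse) is symmetric.
    v11 = [[a_inv[i][j] + a_inv_b[i] * a_inv_b[j] * v22 for j in range(2)]
           for i in range(2)]
    v12 = [-x * v22 for x in a_inv_b]
    return v11, v12, v22


# Hypothetical information blocks for (beta_1, beta_2) and alpha.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
c = 5.0
V11, V12, V22 = bordered_dispersion(A, b, c)
naive = inverse_2x2(A)   # what results if the variation in alpha is ignored
```

Comparing the diagonal of `V11` with that of `naive` shows the size of the underestimation that arises when $A^{-1}$ alone is used for the dispersion matrix of $\hat{\beta}$.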


An illustration of this technique in action is given in Aitkin and Clayton (1980). They show how GLIM may be used to analyse survival data, using a Poisson distribution and a log link. For the Weibull model the fit has to be iterated on the Weibull shape parameter $\alpha$ and the above adjustment is then needed to correct the estimate of $V_{11}$ produced by GLIM, which is just $A^{-1}$, as well as to produce $V_{22}$ and the components of $V_{12}$. It is not unusual to find that $A^{-1}$ is incorrectly used for $V_{11}$ - see for example Exercise 4.6 and the comments in Morgan (1983) and Roger (1985). The error involved in the application of Exercise 4.6 is investigated in Taylor (1988).

APPENDIX D

Asymptotically equivalent tests of hypotheses

Suppose inference concerns a parameter $\xi$ with p elements, a null hypothesis specifies, $H_0: \xi = \xi_0$, and we write the log-likelihood for a set of data as $l(\xi)$, maximized at $\hat{\xi}$. The likelihood ratio test rejects $H_0$ at the $100\alpha\%$ level if

$$2\{l(\hat{\xi}) - l(\xi_0)\} > \chi^2_{p,\alpha}$$

where $\chi^2_{p,\alpha}$ denotes the $\alpha\%$ critical value for a random variable with a $\chi^2_p$ distribution. Examples are found throughout the text - see, for example, section 5.5.1. We denote the vector of efficient scores by $U(\xi)$, which has as its jth element,

$$U_j(\xi) = \frac{\partial l}{\partial \xi_j}.$$

Asymptotically, $\hat{\xi}$ has the multivariate normal distribution given by $N(\xi, I^{-1}(\xi))$, where $I(\xi)$ denotes the Fisher expected information matrix, with (j, k)th element,

$$-E\left[\frac{\partial^2 l}{\partial \xi_j \partial \xi_k}\right].$$

Asymptotically, $U(\xi)$ also has a multivariate normal distribution, but with dispersion matrix $I(\xi)$. The score test (sometimes called the Lagrange Multiplier test) rejects $H_0$ if

$$U(\xi_0)'I^{-1}(\xi_0)U(\xi_0) > \chi^2_{p,\alpha}$$


The important point to note here is that the test statistic does not require $\hat{\xi}$. The Wald test rejects $H_0$ if

$$(\hat{\xi} - \xi_0)' I(\xi_0) (\hat{\xi} - \xi_0) > \chi^2_{p,\alpha}.$$

An alternative version of the test replaces $I(\xi_0)$ by $I(\hat{\xi})$. Examples are to be found in the solutions to Exercises 3.4, 3.5 and 3.6. Asymptotically all three tests are equivalent. For certain hypotheses in logistic regression, the Wald test can behave in an aberrant manner, as shown by Hauck and Donner (1977).

If the parameter vector is $(\xi', \phi')'$ and $H_0$ is unchanged, then we have nuisance parameters $\phi$. We shall provide the required analysis for score tests. When $\xi = \xi_0$, let the maximum-likelihood estimate of $\phi$ be $\hat{\phi}_0$. Let us write $I$, partitioned according to the partition of the parameter vector, as

$$I = \begin{pmatrix} I_{11} & I_{12} \\ I_{12}' & I_{22} \end{pmatrix}$$

and let $U(\xi_0, \hat{\phi}_0)$ denote the vector of scores for $\xi$ alone, evaluated when $\xi = \xi_0$ and $\phi = \hat{\phi}_0$. The score test statistic is now:

$$U(\xi_0, \hat{\phi}_0)' (I_{11} - I_{12} I_{22}^{-1} I_{12}')^{-1} U(\xi_0, \hat{\phi}_0),$$

where $I$ is calculated at $(\xi_0, \hat{\phi}_0)'$. In particular, $I_{22}$ is simply the Fisher expected information matrix for the parameters $\phi$, when $\xi = \xi_0$. Reference is again to $\chi^2_p$. Thus in this case the only optimization is done under $H_0$. A graphical illustration of the difference between these three tests is given by Buse (1982).

As an illustration, consider a score test of the logit model versus the Aranda-Ordaz asymmetric model of section 4.2 (this is Exercise 4.3). In this case, $\phi = (\alpha, \beta)'$, $\xi = \lambda$, and $\xi_0 = 1$. In order to perform the score test we just have to fit the logit model, corresponding to $\xi_0 = 1$. Suppose the fitted probability of response to dose $x_i$ is denoted by $\hat{p}_i$; then the additional terms needed for the evaluation of the score test statistic are:

$$I_{11} = \sum_{i=1}^{k} n_i \frac{(1 - \hat{p}_i)}{\hat{p}_i} \{\hat{p}_i + \log(1 - \hat{p}_i)\}^2,$$

$$I_{12} = \left( \sum_{i=1}^{k} n_i (1 - \hat{p}_i) \{\hat{p}_i + \log(1 - \hat{p}_i)\}, \;\; \sum_{i=1}^{k} n_i x_i (1 - \hat{p}_i) \{\hat{p}_i + \log(1 - \hat{p}_i)\} \right).$$

Examples of the result of this test, and comparisons with the likelihood-ratio and goodness-of-link (section 4.5.2) tests, are given in Table 4.7. As we test an hypothesis regarding the scalar parameter $\lambda$, we just require $\partial l / \partial \lambda$. This is a common example; for other illustrations, see Exercises 2.39, 3.38 and 6.10.

Asymptotic properties of score tests are reviewed by Tarone (1985). A detailed discussion is provided by Cox and Hinkley (1974, section 9.3). A particularly interesting article by Pregibon (1982b) shows how score tests may be performed in GLIM, amounting to a difference between chi-square goodness-of-fit statistics, one for the reduced model, and the other for the result of a one-step iteration when fitting the full model, starting from the maximum-likelihood parameter estimates from fitting the reduced model. This avoids the additional algebra, an illustration of which is given above, and does so by moving towards fitting the full model. For applications in over-dispersion, see Breslow (1989). For further simplifications when likelihoods are based on exponential families of distributions, see Gart and Tarone (1983).
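The score test of the logit model against the Aranda-Ordaz model can be sketched in a few lines. The sketch below assumes the Aranda-Ordaz asymmetric model has the form P(x) = 1 - {1 + λ exp(α + βx)}^(-1/λ), which reduces to the logit model at λ = 1; under that assumption the derivative of P with respect to λ at λ = 1 is (1 - p){p + log(1 - p)}. The data and all function names are hypothetical.

```python
import math


def fit_logit(x, n, r, iterations=50):
    """Fit P(x) = 1/(1 + exp(-(a + b*x))) by the method of scoring."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        w = [ni * pi * (1.0 - pi) for ni, pi in zip(n, p)]
        ua = sum(ri - ni * pi for ri, ni, pi in zip(r, n, p))
        ub = sum(xi * (ri - ni * pi) for xi, ri, ni, pi in zip(x, r, n, p))
        iaa = sum(w)
        iab = sum(wi * xi for wi, xi in zip(w, x))
        ibb = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = iaa * ibb - iab * iab
        da = (ibb * ua - iab * ub) / det
        db = (iaa * ub - iab * ua) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b


def score_test_logit_vs_aranda_ordaz(x, n, r):
    """Score statistic for lambda = 1, with (alpha, beta) as nuisance parameters."""
    a, b = fit_logit(x, n, r)
    p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
    g = [pi + math.log(1.0 - pi) for pi in p]
    # Score for lambda, evaluated at the fitted logit model.
    u = sum((ri - ni * pi) * gi / pi for ri, ni, pi, gi in zip(r, n, p, g))
    # Expected information terms involving lambda.
    i11 = sum(ni * (1.0 - pi) / pi * gi * gi for ni, pi, gi in zip(n, p, g))
    i1a = sum(ni * (1.0 - pi) * gi for ni, pi, gi in zip(n, p, g))
    i1b = sum(ni * xi * (1.0 - pi) * gi for ni, xi, pi, gi in zip(n, x, p, g))
    # Information for the nuisance parameters (the usual logit weights).
    w = [ni * pi * (1.0 - pi) for ni, pi in zip(n, p)]
    iaa = sum(w)
    iab = sum(wi * xi for wi, xi in zip(w, x))
    ibb = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = iaa * ibb - iab * iab
    # Quadratic form I12 * inverse(I22) * I12'.
    quad = (ibb * i1a * i1a - 2.0 * iab * i1a * i1b + iaa * i1b * i1b) / det
    return u * u / (i11 - quad)


# Hypothetical data: dose, number exposed, number responding.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
n = [50, 50, 50, 50, 50]
r = [3, 10, 24, 38, 47]
stat = score_test_logit_vs_aranda_ordaz(x, n, r)
# Upper tail of chi-square on 1 d.f.: P(chi2_1 > s) = erfc(sqrt(s / 2)).
print("score statistic:", stat, "p-value:", math.erfc(math.sqrt(stat / 2.0)))
```

Only the logit model is fitted, which is the point made above: all optimization is done under the null hypothesis.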

APPENDIX E

Computing

This appendix brings together various references to computing made throughout the text. It also describes relevant facilities in a number of computer packages. In this latter case we have concentrated on the main computer manuals, and make no reference to versions for personal computers. These can differ from the standard packages, e.g. SPSS/PC+ does not contain the procedure PROBIT (see below). It should also be realized that computer packages regularly undergo changes. As we shall see, it is now fairly standard for packages to give the option of using either a logit, probit or complementary log-log link, the option of fitting a model with natural mortality (and sometimes also immunity), and the possibility of testing for parallelism when results are available for more than one tested substance. The book by Afifi and Clark (1990) is most useful in discussing facilities available in different computer packages. Of particular relevance is Chapter 12 on logistic regression.

Function optimization

A library of FORTRAN routines such as NAG (see below) contains a wide range of programs for function optimization. In this text we have referenced the following NAG routines: Chapter 2, E04CCF (Nelder-Mead simplex method); E04VCF (allows bounds to be placed on parameters; requires first order derivatives); Chapter 6, E04LAF (an easy-to-use modified Newton algorithm for bounded parameters); Chapter 7, E04VDF (requires specification of first order derivatives). Many statistical packages now provide the same facility. See for example the FITNONLINEAR directive in GENSTAT, and procedures AR and 3R in BMDP. Procedure NLIN in SAS fits non-linear regression models. As an illustration, procedure 3R in BMDP uses a modified Gauss-Newton method, but in common with many procedures evaluates the required derivatives numerically, so that the user only has to supply the form of the function to be optimized.

The package MLP is discussed in Ross (1990) as well as in the MLP manual. In addition to providing a range of optimization tools, it has a module specifically designed for quantal response data, which we shall now outline.
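As an aside on optimization with numerically evaluated derivatives, the idea that the user supplies only the function can be sketched as follows. This is a plain Newton iteration with finite-difference derivatives and step-halving, written for two parameters; it is not the algorithm used by BMDP 3R or by any NAG routine, and the data and function names are hypothetical.

```python
import math

# Hypothetical dose-response data: dose, number exposed, number responding.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
n = [50, 50, 50, 50, 50]
r = [3, 10, 24, 38, 47]


def log_lik(theta):
    """Binomial log-likelihood (up to a constant) for the logit model."""
    a, b = theta
    total = 0.0
    for xi, ni, ri in zip(x, n, r):
        eta = a + b * xi
        # log P = -log(1 + exp(-eta)) and log(1 - P) = -log(1 + exp(eta))
        total -= ri * math.log1p(math.exp(-eta))
        total -= (ni - ri) * math.log1p(math.exp(eta))
    return total


def num_grad(f, t, h=1e-5):
    """Central-difference gradient of f at t."""
    grad = []
    for i in range(len(t)):
        up, down = list(t), list(t)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def num_hess(f, t, h=1e-4):
    """Central-difference Hessian of f at t."""
    k = len(t)
    hess = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            pp, pm, mp, mm = list(t), list(t), list(t), list(t)
            pp[i] += h; pp[j] += h
            pm[i] += h; pm[j] -= h
            mp[i] -= h; mp[j] += h
            mm[i] -= h; mm[j] -= h
            hess[i][j] = (f(pp) - f(pm) - f(mp) + f(mm)) / (4.0 * h * h)
    return hess


def maximize(f, start, tol=1e-6, max_iter=100):
    """Newton iteration for a two-parameter function, derivatives by differencing."""
    t = list(start)
    for _ in range(max_iter):
        g = num_grad(f, t)
        H = num_hess(f, t)
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        # Newton step: solve H * step = -g for the 2x2 case.
        step = [-(H[1][1] * g[0] - H[0][1] * g[1]) / det,
                -(-H[1][0] * g[0] + H[0][0] * g[1]) / det]
        scale = 1.0
        # Halve the step until the function does not decrease.
        while f([t[0] + scale * step[0], t[1] + scale * step[1]]) < f(t) and scale > 1e-8:
            scale /= 2.0
        t = [t[0] + scale * step[0], t[1] + scale * step[1]]
        if abs(scale * step[0]) + abs(scale * step[1]) < tol:
            break
    return t


a_hat, b_hat = maximize(log_lik, [0.0, 0.0])
print("estimates:", a_hat, b_hat)
```

The price of this convenience is extra function evaluations and some loss of accuracy relative to analytic derivatives, which is why several of the routines listed above ask for first-order derivatives.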

MLP

The FITPROBIT module has 10 options. It is possible to fit probit, logit and complementary log-log links, to test for parallelism for several lines, fit models with natural mortality and/or immunity, fit a model to Wadley's problem (with an immunity option), fit a model with two covariates, fit a model with a mixture threshold distribution (cf. Exercise 4.28), and fit a dilution series model (cf. Exercise 2.37). Users are given the option of scaling standard errors using a heterogeneity factor. Details of MLP are available from NAG (see below).

SAS

We describe the facilities available in Versions 6.03 and 6.04. Probit, logit and complementary log-log models may be fitted (with or without natural mortality) using the procedure PROBIT. There is automatic scaling of errors and covariances by the heterogeneity factor if the model fit is poor. Logistic regression is possible in procedure FUNCAT (Version 6.03) and procedure LOGIST (Version 6.04). In the latter case the link function may be either probit, logit or complementary log-log. There is mention of FUNCAT and LOGIST in Chapter 2, and PROBIT in Chapter 6. The CATMOD procedure can also be used for logistic regression.

SPSS-X

We describe the facilities available in the third edition of the manual, published in 1988. The key procedure here is PROBIT. By default a probit analysis will be carried out, but logit analysis, extending to logistic regression, is possible. It is possible to test for parallelism, and the NATRES subcommand allows for natural response.


BMDP

We describe the facilities available in the 1990 manual. This package provides a number of alternative procedures which may be used. The logit model may be fitted by procedure LE and as a non-linear regression by procedure 3R, which uses the Gauss-Newton method. It may also be fitted within LR, which performs logistic regression with stepwise selection of variables. Case-control studies (section 1.7) may also be analysed by LR. Procedure AR is specifically designated for quantal bioassay, and provides measures of potency and relative potency. Procedure LE uses Newton-Raphson, and it is also possible to fit an ordered multinomial logistic model. Alternatively, for the latter one may use procedure PR (section 3.5).

Procedure 3R is referenced in Chapter 1; Procedure PLR, referenced in Chapters 2 and 4, was an earlier version of LR. As one test of fit of the logit model it used a score test (Brown, 1982) within the Prentice (1976b) family of models; see Chapter 4 for discussion. An advocate for the use of BMDP is Cox (1987, 1990) who fitted, for example, the hockey-stick model and a logit model with natural response, using procedure 3R. The possibility of interfacing with FORTRAN programs greatly increases the flexibility and appeal of BMDP and other packages with the same facility.

GENSTAT5

The possibilities within GENSTAT5, as described by Payne et al. (1987), are greatly enhanced by the Procedure Library. Relevant procedures from Release 2.2 are described below. Fitting generalized linear models in GENSTAT5 is done through the MODEL directive (Payne et al., 1987, p. 350). For example:

    MODEL [DISTRIBUTION = Binomial; LINK = probit]

If the LINK option is not used then the default is the canonical link. Example 8.4.2 of Payne et al. (1987, p. 357) fits two probit lines; Example 8.4.1 of Payne et al. (1987, p. 354) fits a set of serial dilution data. Details of the Procedure Library can be obtained on-line from within GENSTAT by typing, e.g.:

    LIBINFORM [PRINT = contents, index, modules, errors]

For any procedure a description follows (illustrated for procedure FIELLER):

    LIBINFORM [FIELLER]

and an example can be obtained by typing:

    LIBHELP 'LIBEXAMPLE'; example = %Ex ##% Ex

Of relevance are procedures: FIELLER, GLM, ORDINALLOGISTIC, PROBITANALYSIS and WADLEY.

In FIELLER either a logit, probit or complementary log-log link may be specified and ED100p values together with Fieller confidence intervals can be obtained. Relative potencies can also be obtained.

In ORDINALLOGISTIC the proportional odds model mentioned in section 3.5 is fitted to two-way contingency table data with ordered columns. (Just the logit link is available.)

Procedure WADLEY provides the analysis for Wadley's problem, with the options of logit, probit, Cauchit or complementary log-log links. Distributional forms available are Poisson or negative-binomial, or a quasi-likelihood approach may be used in either a negative-binomial or a scaled Poisson form. A test for parallelism is included (Smith, 1991).

Procedure PROBITANALYSIS also provides probit, logit and complementary log-log transformations. Natural mortality and immunity may be included in the model. Models for different substances tested may be fitted and compared. The models are fitted using the FITNONLINEAR directive.

In this book we have referenced GLIM macros when appropriate. However, the same analyses may be programmed in GENSTAT, making use of the GLM procedure, which allows specification of non-standard link functions and distributions. More complex applications (see for example Jansen, 1988, and Ridout, 1992) reveal the flexibility of the package.

GLIM

We have discussed GLIM in Appendix B. As we have seen throughout the book, it provides a flexible tool for the analysis of quantal response data. As well as being straightforward to use for the standard links and for making comparisons (Chapter 2), a macro exists (Baker, 1980) for conversational probit analysis. GLIM may be used to fit mixtures and to model misreporting (Chapter 3). Macros exist for fitting a range of extended models, and it is relatively easy to carry out score tests and goodness-of-link tests of model fit (Chapter 4). GLIM macros now exist for various aspects of Wadley's problem (Chapters 4 and 6), and for describing over-dispersion (Chapter 6). Diagnostics (Chapter 3; see Collett and Roger, 1988) and likelihood-ratio confidence intervals (Chapter 2) can be produced in GLIM. A general macro has been provided by Vanderhoeft (1985) for carrying out the delta-method (Appendix A). GLIM macros exist for aspects of survival analysis (Chapter 5) and case-control studies (Gilchrist, 1985; Whitehead, 1983, Chapter 1).

Release 3.77 and later releases of GLIM contain the GLIM macro library, which is a compilation of a number of commonly used macros. To obtain the contents of the library, enter GLIM and type:

    $ECHO $INPUT %PLC 80 INFO $ECHO

There is a range of macros for generalized linear models, which produce diagnostics such as leverage values, deviance residuals, Cook distance, etc. (Chapter 3). Particularly useful is the index of Reese (1989), which references not only articles appearing in the GLIM Newsletter, but also the GLIM macro library and GLIM conference proceedings, as well as articles involving GLIM which have appeared in the Journals of the Royal Statistical Society, The Statistician and Biometrika.

Other programs and packages

A range of dedicated programs and packages exist for the analysis of quantal response data. Two examples are POLO (Russell et al., 1977) and PRODOS (Ihm et al., 1987). Two examples of packages written specifically for use on IBM personal computers or compatibles are PCPROBIT (Walsh, 1987) and QUAD (Morgan et al., 1989). Special features of QUAD are that it fits symmetric and asymmetric extended models (Chapter 4) and presents a range of diagnostics (Chapter 3). FORTRAN sub-routines exist for fitting the beta-binomial distribution (Smith, 1983) and for calculating the ABERS estimate (Cran, 1980).

A program for evaluating trimmed Spearman-Karber estimates is available, for running on an IBM PC, from the Center for Water Quality Modeling, US Environmental Protection Agency, Environmental Research Laboratory, College Station Road, Athens, Georgia 30613, USA.

APPENDIX F

Useful addresses

Up-to-date information on the major packages can be obtained from the following addresses:

BMDP

BMDP Statistical Software Inc., 1440 Sepulveda Boulevard, Suite 316, Los Angeles, CA 90025, USA.
BMDP Statistical Software, Cork Technology Park, Model Farm Road, Cork, Ireland.

SPSS

Marketing Department, SPSS Inc., 444 North Michigan Avenue, Chicago, IL 60611, USA.
SPSS Europe B.V., P.O. Box 115, 4200 AC Gorinchem, The Netherlands.

SAS

SAS Institute Inc., Box 8000, Cary, NC 27511-8000, USA.

GENSTAT, GLIM, MLP, NAG

Numerical Algorithms Group Ltd., Mayfield House, 256 Banbury Road, Oxford, OX2 7DE, UK.
Numerical Algorithms Group Inc., 1101 31st Street, Suite 100, Downers Grove, IL 60515-1263, USA.
Ceanet Pty Ltd., 4th Floor, 56 Berry Street, North Sydney, 2060 NSW, Australia.

APPENDIX G

Solutions and comments for selected exercises

Chapter 1

1.1 (a) Fryer and Pethybridge (1975) report the following data describing 220 children born in the English counties of Devon and Somerset in 1965.

Birth weight in ounces    No. of infants    No. of perinatal deaths in the group
34.5-38.5                 18                8
38.5-42.5                 24                18
42.5-46.5                 21                13
46.5-50.5                 32                19
50.5-54.5                 39                23
54.5-58.5                 43                15
58.5-62.5                 46                12

(b) The data below are taken from Ashford and Smith (1965). Here A and B correspond to independent assessments of two radiologists.

(c) Age versus decayed teeth. See also Exercise 1.14.

Note: examples (a) and (b) above are taken from Hubert (1980), which is a rich source of quantal response data sets.

No. of years spent       No. of miners    No. of miners judged to be suffering from pneumoconiosis
working as a coal miner                   A         B
2.25                     43               0         0
7.0                      29               4         3
12.0                     27               6         6
17.0                     50               22        24
22.0                     24               17        16
27.0                     23               14        16
32.0                     12               5         5
37.0                     7                5         5
42.8                     6                6         6

1.2

We may write the Taylor series expansion as:

$$f(x) = f(\mu) + (x - \mu) f'(\mu) + O\{(x - \mu)^2\}$$

F o r small c

k= 0

where the {G k} satisfy the recursion

G0 = l, Gk =

1 ( — + k — l\< 5 kfi \ S J

for k ^ 1

For both $S_1(t)$ and $S_2(t)$, the summation was terminated if an incomplete gamma integral was found to be less than $10^{-15}$, equivalent to zero for the degree of accuracy of the computer used. Apart from this feature, summations proceeded for a minimum of $u$ terms, subsequent terms being added until a new term was found to be less than $\varepsilon$, when the series was truncated. A useful combination is: $u = 20$ and $\varepsilon = 10^{-10}$.

5.11 First we describe how to fit the multi-hit model without observations recorded over time. If there is no discussion of how long a study was run, then the duration of the study is subsumed under the rate parameter $\xi$, and we write

$$\Pr(\text{response to dose } d_i) = P_i(h, \xi) = \int_0^{\xi d_i} \frac{x^{h-1} e^{-x}}{(h-1)!} \, dx = \sum_{k=h}^{\infty} \frac{(\xi d_i)^k e^{-\xi d_i}}{k!}$$
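For integer h, the multi-hit response probability above can be evaluated either from the integral or from the Poisson sum. The sketch below is a minimal check that the two forms agree; the function names and the numerical values are hypothetical, and the infinite sum is computed as one minus the finite sum of the Poisson terms below h.

```python
import math


def multi_hit_sum(dose, h, rate):
    """P(response) as a Poisson tail: 1 - sum_{k=0}^{h-1} y^k exp(-y) / k!."""
    y = rate * dose
    term = math.exp(-y)          # the k = 0 Poisson term
    below = term
    for k in range(1, h):
        term *= y / k            # update to y^k exp(-y) / k!
        below += term
    return 1.0 - below


def multi_hit_integral(dose, h, rate, steps=2000):
    """P(response) by Simpson's rule on the integral form (integer h >= 1)."""
    upper = rate * dose
    width = upper / steps        # steps must be even for Simpson's rule

    def integrand(v):
        return v ** (h - 1) * math.exp(-v)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * width)
    return total * width / 3.0 / math.factorial(h - 1)


# Hypothetical values of dose, number of hits and rate.
for dose in (1.0, 2.0, 5.0):
    print(dose, multi_hit_sum(dose, 3, 1.5), multi_hit_integral(dose, 3, 1.5))
```

The finite-sum form is the cheaper of the two, which is one practical reason for restricting h to positive integers.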

The usual format for quantal response data has $k$ groups of individuals exposed to doses $\{d_i\}$, with $r_i$ responding out of the $n_i$ exposed to dose $d_i$, for $1 \le i \le k$. Standard maximum-likelihood estimation of $(h, \xi)$ is then as follows: assumptions of independence produce the product binomial form for the likelihood:

$$L(h, \xi) = \prod_{i=1}^{k} \binom{n_i}{r_i} P_i(h, \xi)^{r_i} \{1 - P_i(h, \xi)\}^{n_i - r_i}$$

For simplicity we write $P_i = P_i(h, \xi)$ and $p_i = r_i/n_i$. The log-likelihood has the form:

$$l = l(h, \xi) = \text{constant} + \sum_{i=1}^{k} r_i \log P_i + \sum_{i=1}^{k} (n_i - r_i) \log(1 - P_i)$$

Iterative maximum-likelihood by the method-of-scoring is then standard, and requires the following terms: let $\eta$ and $\eta^*$ each denote either $h$ or $\xi$:

$$\frac{\partial l}{\partial \eta} = \sum_{i=1}^{k} \frac{\partial P_i}{\partial \eta} \, \frac{n_i (p_i - P_i)}{P_i (1 - P_i)}$$

$$-E\left[\frac{\partial^2 l}{\partial \eta \, \partial \eta^*}\right] = \sum_{i=1}^{k} \frac{n_i}{P_i (1 - P_i)} \, \frac{\partial P_i}{\partial \eta} \, \frac{\partial P_i}{\partial \eta^*}$$

$$\frac{\partial P_i}{\partial \xi} = \frac{d_i (\xi d_i)^{h-1} e^{-\xi d_i}}{\Gamma(h)}, \qquad \frac{\partial P_i}{\partial h} = \frac{1}{\Gamma(h)} \int_0^{\xi d_i} x^{h-1} e^{-x} \log x \, dx - \psi(h) P_i$$

where $\psi(\cdot)$ denotes the di-gamma function. In this work we could relax the restriction of $h$ to positive integer values. However, analysis is clearly simplified with $h$ restricted to positive integer values. Because of the nature of the model we might expect the likelihood surface to exhibit a ridge for $h \propto \xi$, and the estimators $\hat{h}$ and $\hat{\xi}$ correspondingly to be highly positively correlated. The same approach is adopted for data recorded over time, using the expressions given in section 5.4.

5.12 As the proportion of individuals surviving decreases (from 0.96 to 0.21), then the selected $s$ value also decreases. As $s$ decreases, more emphasis is placed on $p_m$ in $\psi(s)$, and so as $p_m$ becomes more 'informative', the method responds by reducing the selected $s$ value.

5.14 The local maxima of curves are particularly intriguing and require explanation. If we take the curve $h = 8$ as an illustration, then simulated responses out of 10000 are shown below for $\xi = 1$ and $\xi = 1.6$.

            ξ = 1                        ξ = 1.6
            Time                         Time
Dose        0.25     1        2.0        0.25     1        2.0
1           0        0        17         0        5        160
2           0        11       515        0        188      3224
3           0        119      2668       0        1127     7323
4           0        499      5456       0        3129     9371
5           0        1337     7819       16       5466     9887

In both cases the first sampling time provides little information, but what is critical is the spread of responses for the endpoint. When $\xi = 1.6$, this spread covers a wider range than when $\xi = 1$, emphasizing the substantial amount of information present at the endpoint when $\xi = 1.6$. We would expect this feature to be most pronounced for estimation of the scale parameter $\xi$, as indeed is seen to be the case.

5.15 (i) The likelihood is given by:

$$L \propto \prod_{i} \left[ \prod_{j=1}^{m} \{F(t_j; d_i) - F(t_{j-1}; d_i)\}^{n_{ij}} \right] \{1 - F(t_m; d_i)\}^{n_{i,m+1}}$$

A log-logistic survival distribution would suffice for the females, but not the males.

(ii) Deviance differences due to fitting $\beta_i = \beta$ are: (40.85 - 37.75) = 3.10 for males, and (49.92 - 41.94) = 7.98 for females, both on 3 d.f. For males we can take $\beta_i$ = constant. For females the result is just significant at the 5% level, but Pack and Morgan (1990a) argue in favour of setting $\beta_i$ = constant there too. Fitting model (b) with a common value for $\alpha_2$ provides a test of parallelism of the logit lines. The resulting test-statistic is 1.25, referred to $\chi^2_1$. We conclude that the data are consistent with a common value of $\alpha_2$, but that $\alpha_1$ and the survival distributions are different for the two sexes.

(iii) The curves of Figure 5.1 possess two asymptotes. This is a consequence of the model being a mixture model, and of the time distribution being taken independent of dose. For example, a very low dose, say $d_1$, cannot produce an expected proportion responding greater than $a(d_1)$, however long the experiment runs. Also, whatever the dose, the time to response follows a fixed distribution. If $t$ is fixed, increasing the dose will not, ultimately, result in a probability $p$ of response, for any $p$. Figure 5.7 illustrates a clear difference between mixture and non-mixture models.

5.16 Model (d) does improve the fit to the endpoint mortalities, but the overall improvement in the model is not significant.

5.18 The model appears to be possibly compatible with a Weibull model ($\lambda = 0$). A mixture model may be appropriate, but we would need longer experiments at the low doses before this can be discussed further. Ostwald's equation does not hold.

5.19 If either $d_i$ or $t_j$ is fixed then in an odds ratio the component involving the fixed term cancels. For example,

{Pijji 1 - PijM P u A l - Pin)} = (tjJtjf irrespective of the value of db 5.21 C um ulative m ortalities ap p ear to level out with respect to t and then increase again. Larvae which escape initial infection m ay be infected by viruses which are released by the explosion of dead larvae th a t die after initial infection. C hapter 6 6.1

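$$\Pr(X_j = x_j \mid n_j) = \binom{n_j}{x_j} \frac{B(\alpha + x_j, \, n_j + \beta - x_j)}{B(\alpha, \beta)} = \binom{n_j}{x_j} \frac{\Gamma(\alpha + x_j) \, \Gamma(n_j + \beta - x_j) \, \Gamma(\alpha + \beta)}{\Gamma(n_j + \beta + \alpha) \, \Gamma(\alpha) \, \Gamma(\beta)}$$

$$= \binom{n_j}{x_j} \frac{\prod_{i=1}^{x_j} (\alpha + x_j - i) \, \prod_{i=1}^{n_j - x_j} (n_j + \beta - x_j - i)}{\prod_{i=1}^{n_j} (n_j + \beta + \alpha - i)}$$

(If $x_j = 0$ or $x_j = n_j$, the product term $\prod_{i=1}^{0}$ is taken as unity.)

Divide numerator and denominator by $(\alpha + \beta)^{n_j}$ to give:

$$\binom{n_j}{x_j} \frac{\prod_{i=1}^{x_j} \left( \dfrac{\alpha}{\alpha + \beta} + \dfrac{x_j - i}{\alpha + \beta} \right) \prod_{i=1}^{n_j - x_j} \left( \dfrac{\beta}{\alpha + \beta} + \dfrac{n_j - x_j - i}{\alpha + \beta} \right)}{\prod_{i=1}^{n_j} \left( 1 + \dfrac{n_j - i}{\alpha + \beta} \right)} = \binom{n_j}{x_j} \frac{\prod_{r=0}^{x_j - 1} (\mu + r\theta) \, \prod_{r=0}^{n_j - x_j - 1} (1 - \mu + r\theta)}{\prod_{r=0}^{n_j - 1} (1 + r\theta)},$$

with $\mu = \alpha/(\alpha + \beta)$ and $\theta = 1/(\alpha + \beta)$.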
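The product form of the beta-binomial probability lends itself to direct computation, since it avoids gamma functions and remains valid at θ = 0. A minimal sketch follows; the function name and the parameter values are hypothetical, and the moment checks use the standard beta-binomial mean n μ and variance n μ(1 - μ){1 + ρ(n - 1)} with ρ = θ/(1 + θ).

```python
import math


def beta_binomial_pmf(x, n, mu, theta):
    """Pr(X = x | n) from the product form in terms of mu and theta."""
    numerator = 1.0
    for r in range(x):
        numerator *= mu + r * theta
    for r in range(n - x):
        numerator *= 1.0 - mu + r * theta
    denominator = 1.0
    for r in range(n):
        denominator *= 1.0 + r * theta
    # Empty products (x = 0 or x = n) are left at unity.
    return math.comb(n, x) * numerator / denominator


# Hypothetical litter size and parameter values.
n, mu, theta = 10, 0.3, 0.25
probs = [beta_binomial_pmf(x, n, mu, theta) for x in range(n + 1)]
mean = sum(x * p for x, p in enumerate(probs))
var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
rho = theta / (1.0 + theta)
print("total probability:", sum(probs))
print("mean:", mean, "variance:", var)
print("n mu (1 - mu) {1 + rho (n - 1)}:", n * mu * (1 - mu) * (1 + rho * (n - 1)))
```

Setting theta to zero in this function returns the ordinary binomial probabilities, which is the null model of the score test in Exercise 6.10.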

6.2 Use induction!

6.3 Let $Z_{jk} = 1$ if the $k$th foetus responds, and $Z_{jk} = 0$ if it does not, $k = 1, 2, \ldots, n_j$. Under the beta-binomial model

$$\Pr(Z_{jk} = 1) = \mu, \qquad \Pr(Z_{jk} = 0) = 1 - \mu$$

Hence

$$E(Z_{jk}) = \mu, \qquad \operatorname{Var}(Z_{jk} \mid n_j) = \mu(1 - \mu)$$

But

$$\operatorname{Var}(X_j \mid n_j) = n_j \mu (1 - \mu) \{1 + \rho(n_j - 1)\}$$

and because $X_j = \sum_{k=1}^{n_j} Z_{jk}$, we have

$$\operatorname{Var}(X_j \mid n_j) = \sum_{k=1}^{n_j} \operatorname{Var}(Z_{jk}) + \sum_{k=1}^{n_j} \sum_{\substack{k'=1 \\ k' \ne k}}^{n_j} \operatorname{Cov}(Z_{jk}, Z_{jk'} \mid n_j)$$

The first term is just $n_j \mu (1 - \mu)$, while the second (assuming the $Z_{jk}$ are equicorrelated) is $n_j (n_j - 1) \operatorname{Cov}(Z_{jk}, Z_{jk'} \mid n_j)$. Equating the two expressions for the variance we obtain

$$\operatorname{Cov}(Z_{jk}, Z_{jk'} \mid n_j) = \mu (1 - \mu) \rho$$

so that finally we have

$$\operatorname{Corr}(Z_{jk}, Z_{jk'}) = \rho$$

Knowledge that $z_{jk} = 1$ provides information on $p$: we now know that $p > 0$, with clear implications for the distribution of $Z_{jk'}$.

6.4 See Exercise 6.15 and its solution.

6.5 We can solve directly for $\mu$. Iteration is then needed for $\theta$.

6.6 Helpful comments are given in Brooks (1984).

6.7 The model cannot account for the small number of litters with high mortality, but the qualitative fit is generally good. Using a Monte Carlo approach, Pack (1986a) found a binomial mixture provided a better fit to the Haseman and Soares data. Cf. Exercises 6.25 and 6.27. For the Aeschbacher et al. data set the beta-binomial and mixture models provide similar fits.

6.8 Williams (1988b) suggests using a permutation test, for example to test for a dose response, or for a monotonic regression. For further details, see Williams (1988d).

6.9 The data are too dispersed. Apart from the clear outlying litters with high mortality, the zero column is frequently under-estimated.

6.10 The beta-binomial log-likelihood is given by equation (6.4). For a score test of $\theta = 0$ we require (Appendix D) $\partial l / \partial \theta$ and the terms of the expected Hessian evaluated at $\theta = 0$, $\mu = \hat{\mu}$ (the maximum-likelihood estimate of $\mu$ when $\theta = 0$). The required derivatives are given in section 6.1.2. When $\theta = 0$ we obtain $\hat{\mu} = x_{\cdot}/n_{\cdot}$, the ordinary binomial estimate. With $\theta = 0$ and $\mu = \hat{\mu}$ we obtain:

$$\frac{\partial l}{\partial \theta} = \frac{1}{2} \sum_{j=1}^{m} \left\{ \frac{x_j (x_j - 1)}{\hat{\mu}} + \frac{(n_j - x_j)(n_j - x_j - 1)}{(1 - \hat{\mu})} - n_j (n_j - 1) \right\}$$

When $\theta = 0$, we can show that

$$E\left[\frac{\partial^2 l}{\partial \mu \, \partial \theta}\right] = 0, \quad \text{and} \quad -E\left[\frac{\partial^2 l}{\partial \theta^2}\right] = \frac{1}{2} \sum_{j=1}^{m} n_j (n_j - 1)$$

The score statistic therefore simplifies to:

$$\left(\frac{\partial l}{\partial \theta}\right)^2 \bigg/ \left( -E\left[\frac{\partial^2 l}{\partial \theta^2}\right] \right)$$

Simplification of $\partial l / \partial \theta$ gives

$$\frac{1}{2} \left\{ \sum_{j=1}^{m} \frac{(x_j - n_j \hat{\mu})^2}{\hat{\mu}(1 - \hat{\mu})} - \sum_{j=1}^{m} n_j \right\}$$

so that the score statistic is $Z^2$, resulting in

$$Z = \frac{\displaystyle \sum_{j=1}^{m} \frac{(x_j - n_j \hat{\mu})^2}{\hat{\mu}(1 - \hat{\mu})} - \sum_{j=1}^{m} n_j}{\left\{ 2 \displaystyle \sum_{j=1}^{m} n_j (n_j - 1) \right\}^{1/2}}$$

which is the statistic Z of section 6.1.3.

6.12

$\theta < \mu \Leftrightarrow \alpha > 1$; $\theta < (1 - \mu) \Leftrightarrow \beta > 1$.

Bell-shaped $\Leftrightarrow \alpha, \beta > 1$.

6.15

E [p ] = p , hence set p = p. Set -- P ] *i

Hence

J

= (“ -P + P -P \ n i

421

A P P E N D IX G

Set S = E [S ] to give: g=^

1 ~ P ) U + P(n»'~ !)} - w . E [ ( p - p ) 2]

S = p(l - / * ) £ — + p/i(l - p ) Z v v / l i »i i \

»i/

w2 - w _1X — M1-p ){ i +p(«i-1)}
(n — p), from the first of equations 6.10, set

fc P = { * 2 - (« - p)}/ X {(«i - 1)(1 - vtf,)} Recalculate weights: = {1 + p(n* — l) } - 1 , estim ate fi iteratively, and recalculate X 2. If X 2 « (n — p) accept p. If not, re-estim ate p from the second of equations (6.10)- t h i s explains why we do n o t cancel wf and p in this equation. E stim ate /?, and continue until X 2 % (n — p). F o r the link with K leinm an’s procedure, see P ack (1986a). 6.18 F ro m A ppendix A, V #(p ;)) ~ {^'(p)}2+ ! ( » - ' • . ) [

I i= 1

^ U = l

i= s

)

and so ir ,= i= 1

i(n -rd i= s

i.e. Dragstedt-Behrens and Reed-Muench estimates coincide. If no such $s$ exists, $\hat{\theta}_D$ follows from linear interpolation between the two doses with values $P_s$ which span 0.5. In general we can expect $\hat{\theta}_D$ and $\hat{\theta}_R$ to be similar. Asymptotic theory is equivalent.

7.8 Data set 2 has just 4 doses, and so is difficult to discuss in this context. Data set 3 has 9 doses, but only 6 subjects per dose. Deviances for the various fitted models (fitted to doses) are (Goedhart, 1985):

Model                     Deviance
Aranda-Ordaz              6.02
Complementary log-log     8.46
Logit                     6.37
Cubic logistic            6.34
Box-Cox                   5.99

Degrees of freedom = 7. We have $\hat{\lambda} = -0.09$, which suggests taking logarithms before fitting a symmetric model. For fitting to untransformed doses, we would expect an asymmetric model to do better than a symmetric one. This is borne out by the above results. However, differences are small, due to the overall small sample size. The particular skewness of the complementary log-log model is not appropriate for the data.

7.10 For both logit and probit models only the highest age-group satisfies the criterion for trimming. Below we give the results from maximum-likelihood fitting. The effect of omitting the highest age-group is negligible. Cf. discussion of Exercise 1.13. For the probit model the deviance reduction is from 22.89 to 22.85. For the logit model the deviance reduction is from 26.70 to 25.47. Degrees of freedom drop from 23 to 22. In the latter case, for illustration, parameter estimates change from:

$\hat{\alpha} = -21.23 \; (0.771)$, $\hat{\beta} = 1.632 \; (0.059)$

to:

$\hat{\alpha} = -21.10 \; (0.781)$, $\hat{\beta} = 1.622 \; (0.060)$

7.11

We obtain $e$ as a ratio, using the expressions of (7.5) and (7.6), to give

$$e^{-1} = \int_{-\infty}^{\infty} f^2(x) [F(x)\{1 - F(x)\}]^{-1} \, dx \int_{-\infty}^{\infty} F(x)\{1 - F(x)\} \, dx = E\left[ \frac{F(X)\{1 - F(X)\}}{f(X)} \right] E\left[ \frac{f(X)}{F(X)\{1 - F(X)\}} \right]$$

where the expectations are taken w.r.t. the r.v. $X$, with probability density function $f(x)$. Now, by the Cauchy-Schwarz inequality,

$$E\left[ \frac{f(X)}{F(X)\{1 - F(X)\}} \right] \ge 1 \bigg/ E\left[ \frac{F(X)\{1 - F(X)\}}{f(X)} \right]$$

Hence $e^{-1} \ge 1$. We have equality if and only if

$$\frac{f(x)}{F(x)\{1 - F(x)\}}$$

is constant. Integration gives $F(x)$ as logistic. It is easily verified that if $F(x)$ is logistic, $e = 1$.

7.12

F(x){ 1 — F(x)} dx is finite, but

/ 2M dx x 0 F(x){ \ — F(x)} so th a t e = 0.

dx x

l o g ( l - ^ - (x/^)


7.13


In the norm al case we require . **>.

1"°

- dx and

{x){l-$>(x)}dx

N um erical integration (G ovindarajulu, 1988, p. 50) gives f*

J _ „ { ! -«(x)>

r

^ * - 0 .9 0 3

J_„ 0, and for x > z, / '( x ) < 0. The value x = z therefore corresponds to the unique m ode of f(x ). C hapter 8 8.1

Let P = P(dx) and Q = P(d2). V " 11=

(PO)2 4

- d 2)2

Substitute,

d1 = - { l o g ( P / 0 - a }

d i = —{lo g (Q /P) — a} to give | P ( 2 1 o g (P /0 2

(m aximizing |V 11 is equivalent to m inim izing |V |).

8.2

y _

2

d.\+d.\

~ P Q id .-d ^ l-id .+ d ,)

— (dl + d 2) 2


P2

d2 + d2,

2 m i o ^ P / W 2 l - ( d , + d 2\

— (di + d2) 2

Setting d /d P = 0 gives: (a2 + /?2) {2 —log ( P / Q ) ( P - Q ) } _ q {log ( P / 0 } 3 8.3

The eigenvalues of V are given by:

A=

a2+ P 2 + ;io g (P /0 } 2 + ([a2 + P2 + {log(P/6)}2] 2 - {2/flog(P/0}2)1/2 2PQ{\og(P/Q)}2 R osenberger and K alish (1981) used num erical optim ization to m inim ize the larger eigenvalue. 8.4

A pplication of the delta-m ethod (Appendix A) gives:

V(P(d0)) =

where n is the n um ber of individuals per dose, P 0 = P(d0), n + 00-1. R osenberger and K alish (1981) found the optim um design to be P = 0.768, from using num erical analysis. As with D-optim ality, we are here ju st dealing with a function of P; (a, P) are n o t otherw ise involved. This is n o t the case for A and E optim al designs, which are therefore less desirable. Rosenberger and K alish (1981) define robustness efficiencies for the D and G designs. N either is ruled o u t by evaluating the efficiency values. A good com prom ise design has P = 0.8. 8.5 If 6j = 0A, small inaccuracies in Pj still produce high efficiency values. O verestim ation of P is less of a problem th an underestim ation. However, the evidence for the case of a m oderate departure from


6j = 04, which is perhaps the likely situation, suggests it is better to have < p A. 8.6

W e know , from section 2.2, th a t dPi

- eI K I - Sf r P t ^i - P L' d t 2] and dPidPi d2l

=

_

y ni[~fa~df}

ja d ji_

i=1 P f r - P d

Also, dP

dP

da

dp

H ence Fisher (expected) inform ation m atrix is: X niwi

T niwidi

Y ntwidi

X n:widi

P) —

leading to:

/?2|/(a,/?)| = j X

n iw i

U= 1

X

X

n i w i(Pd oo, the individuals should be placed progressively closer to the (unknow n) E D 50. 8.10 F o r m ^ 100, the D -optim al design perform s well for b o th 9 and P separately. As m -> oo, the efficiency show n of the E D 50 design decreases as it moves the design points together. (Cf. the tw o-point designs of Exercise 8.9.) The suggested three-point design is close to the D -optim al design of m ^ 100, and has the advantage of simplicity - see K alish (1990) for m ore details. 8.11 H eights for which r j n > 0.5 are likely to be far higher than the E D iq. K ershaw and M cR ae (1985) recom m end setting 0.1 < n2/n < 0 .5 and r ijn /?i, suggesting atropine has a larger effect th at pralidioxine chloride. Squared term s, with m ainly negative coefficients, correspond to toxicity. Pralidioxine chloride m ay be the m ore toxic. 8.28

The log-likelihood, $L$, is given by:

$$L = \text{constant} + r_{11} \log\{\Phi(c)\} + r_{12} \log\{1 - \Phi(c)\} + r_{21} \log\{\Phi(c - d)\} + r_{22} \log\{1 - \Phi(c - d)\}$$

$$\frac{\partial L}{\partial d} = \frac{\phi(c - d)\{r_{2\cdot} \Phi(c - d) - r_{21}\}}{\Phi(c - d)\{1 - \Phi(c - d)\}}$$

$$\frac{\partial L}{\partial c} = \frac{\phi(c)\{r_{11} - r_{1\cdot} \Phi(c)\}}{\Phi(c)\{1 - \Phi(c)\}} - \frac{\partial L}{\partial d}$$

Receiver Operating Characteristic curves 391
Recovery of insects 9
Reed-Muench estimator xvi, 309, 334, 428
Relative efficiencies for designs 345
Relative potency 73, 332, 383, 384
  constant 375
Relative toxicity, gauged by ED50 26
Reparameterization 44, 200
  of beta-binomial distribution 240
Repeated measures data 222, 452
Repeated measures over time 129
Replication, and measuring dispersion 259
Reproductive performance 288
Resistant approach to making comparisons 74
Resistant model fitting 143
Resources, division of between control and experimental animals 361
Response surface methodology 361
R-estimator 319
Retrospective data 282, 302, 425
Ridge regression 287
Robbins-Monro procedure 354, 360
  delayed 365
  modified 366
Robustness 304, 317, 333, 432
  of design 343
  to variance function specification 262

Rotenone 90, 330
Round-off error 48
Routine drug testing 87, 303, 340
Safe dose evaluation 173
  non-parametric approach 175
Sample median 322
SAS 29, 44, 54, 382
  FUNCAT, LOGIST 69, 382
  NLIN 381
  PROBIT 259, 382
Saturated model 259, 265, 294, 374
Scaled deviance 374
Scaled Poisson distribution 294
Scale parameter in GLIM 373
Score function, used in R-estimation 322
Score test 77, 158, 159, 265, 286, 300, 371, 378, 379, 383, 385, 399
  added variable plot for 119
  within beta-binomial model 419
  to give Cochran-Armitage test 75, 90
  conditioning brought about by 178
  in GLIM 72, 380
  influence in 118
  for synergy 127, 142
  Tarone’s test as 246
Sea urchin eggs 136
Secant method 64
Second-order approximation for estimating variance 345
Semi-parametric approach 211
Sensitivity curve xvii, 323, 324, 431
Sensitization of skin 5
Sequential design 28
  two-stage 363
Sequential methods 340, 465
  comparison of 359
Sequential optimization 357
Serial dilution data 383, see also Dilution assay
Serological data 16
Sex difference in susceptibility to insecticides 213

Sex proportions 401
Shifted Weibull distribution 179
Short-term survivor 135
Side effects 2
Sigmoidal, cdf 329
  constraint 328, 332, 338
  threshold distribution 339
Signal-detection model 13, 23, 80, 141, 368, 402
Sign test 322
Sign test score function 322
Simple similar action 126
Simplex method 76, 99, 133, 212, 216, 222, 253, 348, 368, 381
  out of range 268
  variants on 470
Sine curve distribution 27
Single step procedures 49, 101, 105, 371, see also One-step iteration
Site differences 133, 400
Skewed-logit distribution 186, 359, 360
Slash distribution 326, 430
Slope parameter 42
  maximum likelihood greater than minimum logit chi-square 395
Small-sample behaviour 327
Smoking, effect on fecundability 283
Smooth test 166
Smoothing in binary regression 131
Smoothing proportions before model-fitting 58
Snails 190
Spatial dependence 275
Spearman-Kärber estimator xvi, 77, 306, 317, 333, 346, 426, 430
  effect of ABERS on 334
  efficiency 318, 335, 336
  influence curve 337
  lack of robustness 325
  as L-estimator 320
  as M-estimator 320
  relationship with logit model 309
  trimmed 311

Species barrier 5
Speed of response 8
Spermarche 35
Spraying of insecticide 8
SPSS/PC+ 381
SPSS, PROBIT 381, 382
  NATRES 382
Stable parameter 44, 240
Starting values for iteration 48
  for method of scoring, use of minimum logit chi-square 59
Start-up designs 360, 366
Stepwise variable selection 383
Sterile couples 284, 425
Stieltjes integral approximation 204
Stochastic approximation 354
Stochastic models 191
Stochastic Newton-Raphson method 356
Sufficient statistic 78, 374, 394
Supercritical linear birth and death process 37
Superparasitism 136
Survival analysis 17, 119, 129, 144, 147, 184, 191, 211, 385
Survival data, analysis in GLIM 377
Survivor function 132, 201, 206
Susceptibility ratio 411
Symmetrizing the data 310
Symmetry constraint 329
Symmetric distribution 338
Synergy 16, 126
Systematic component in GLMs 372
Systemic poisons 42
System vector in GLIM 374
Target organ 131
Tarone’s test 246
  as score test 292
Taylor series 29, 61, 111, 171, 370, 389, 412
t distributions 56
Temporal dependence 275
Teratogenicity 1

Teratology data 5, 234, 248
Termination criterion 371
Test for non-additivity 166
Three-point design 346, 364, 435
Three-point sequential design 361
Threshold model 22, 23
  bivariate 39, 127, 393
  extension to time-to-response data 212
  for microbial infection 37
  see also Distribution
Time-to-response 23, 135, 359, 366
Time-to-response data, as multivariate 192
Topical application of insecticide 8
Tolerance 22
Tolerance distribution 142, see also Distribution
Toxic effects 361, 368
Toxicity class 4, 123
Toxicity rating 4
Toxicology 2
Toxic response data 32
Transect sampling 393
Transformation of parameters 77
  to improve asymptotic normal approximation 84
Transforming dose scale 160
Trend in proportions, testing for 74
Tribolium castaneum 8, 9
Trichotomous responses 119
Trimming 307, 428
Trimmed logit 314, 335
Trimmed Spearman-Kärber xvi, 26, 233, 306, 311, 335, 386
Trinomial distribution 28, 100
Truncation parameters 369
  bounds 356
  procedure 355
t-test 391
Tukey biweight estimator 321, 326, 431
  influence curve robust 325, 338
Tumour formation 23
  incidence 464

Turning points, in sequential data 349
Two-hit model 174, see also Multi-hit model
Two-point optimal design 342, 363, 369
TWOSAMPLE 391
Two-sample t-test 391
Two-stage design 344, 345
Type III Pareto distribution 197
UDTR rule 353
Unconstrained optimization 45
Under-dispersion 241, 281, 302, 401, 423
  in multiplicative binomial model 299
Unequal variances 20
Uniform distribution 151, 152, 181
Uniform prior distribution 348
Unimodal distributions 329, 338
Unique optimum, necessary and sufficient condition for 44
Up-and-down experiment 348, 352, 367
Urban’s curve 28
Urn-model representations 281, 300
  reduction to beta-binomial 424
Using tables 53
Variable selection in logistic regression 72
Variable step-sizes in sequential analysis 354

Variance components 276
Vaso-constriction of skin 68
Vigilance task 14
Virtually safe doses 27, 174, see also Low dose extrapolation
Virus, infection by single particle 23
Voting intentions 89
Wadley’s problem 105, 185, 382, 384, 385
  with overdispersion 263
Wald test 379, 400, 424, 480
Wasp 136, 401
Weibit 168
Weibull distribution 133, 135, 174, 186, 189, 215, 217, 377, 405, 409, 417
  matched with multi-hit model 410
  normal approximation to 169, 179
  shifted 179
Weighted least squares 47, 72, 128, 219, see also IRLS
Weighted linear regression 20, 46, 51, 53, 127, 142
Wheeze data 38, 128, 142, 404
Wilcoxon test 322
  score function 322
Williams test 75
Wind tunnel experiments on flies 10
Woollen fabrics, prickle evaluation 9
Working logits 51, 53, 54, 80
Working probits 54
Wu’s logit-MLE method 356