English Pages 298
Digitized by the Internet Archive in
2012
http://archive.org/details/biostatisticsintOOgold
BIOSTATISTICS: An
Introductory Text
AVRAM GOLDSTEIN
Professor of Pharmacology, Stanford University
THE MACMILLAN COMPANY, NEW YORK
COLLIER-MACMILLAN LIMITED, LONDON
© COPYRIGHT, AVRAM GOLDSTEIN, 1964
No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without permission in writing from the Publisher. All rights reserved.
Fifth Printing 1967
Library of Congress catalog card number: 64-11036
The Macmillan Company, New York
Collier-Macmillan Canada, Ltd., Toronto, Ontario
Printed in the United States of America
To dbg
Preface
This book traces its origin to the pharmacology course at Harvard Medical School given more than ten years ago, where Douglas S. Riggs and I undertook to teach medical students some principles of biostatistics. There was a short monograph, and also a laboratory exercise in tossing pennies and drawing colored shoebuttons out of jars. A few years later (Riggs having meanwhile taken the chair of pharmacology at Buffalo) I obtained the cooperation of William E. Reynolds (then in the Department of Preventive Medicine at Harvard, now associated with the University of California) and together we organized and taught a required biostatistics course for Harvard medical students. At the same time (and in a few cases much earlier) regular instruction in this subject was being established at other medical schools; and in the years since, biostatistics has taken its rightful place in the medical curriculum of most institutions. Training in statistics has, of course, long been recognized as indispensable for students of biology, and of psychology and other behavioral sciences.
My teaching experience with students of medicine and the biological sciences has largely shaped the content and format of this book. Very few students in these fields will wish to learn statistics through a systematic mastery of its mathematical basis. The great majority, however, are well disposed toward the subject at the outset because they know it will provide useful tools for their life's work. The student must be shown when, where, and how the tools can best be used, and in a general way why they work. He does not have to know how the tools were fashioned, nor even the proofs that they can do what the statisticians claim for them. To use biostatistics intelligently, or to understand its use by others and the conclusions to which it leads, requires principally a grasp of the rationale underlying its applications and of the correct ways of formulating problems and hypotheses for statistical analysis. I have tried to frame the arguments, explanations, illustrations, examples, and problems in terms common to everyday laboratory or clinical experience.
The book is not meant to be sufficient in itself for all possible needs. However, the examples include prototypes of a great many problems that commonly arise, and the tables will satisfy every ordinary requirement. Thus the book should continue to serve as a working manual for most procedures long after it has fulfilled its role as introductory text.
The description of procedures and the construction of the tables proceed from two premises: first, that biological experiments hardly ever yield data that are meaningful beyond three significant figures, and second, that the conventional levels of significance (0.05 and 0.01) are quite sufficient for all ordinary purposes. It has therefore been possible, especially in transcribing the various tables, to effect numerous simplifications and abbreviations. In rounding off numbers the usual rule has been followed: If an exact half is to be dropped, round to the nearest even number. In the long run half of all the numbers retained are slightly too large, half slightly too small, so no systematic bias results. However, in the tables of critical values of various statistics, the following rule has been adopted: If any digits are to be dropped, round in the conservative direction, upward or downward depending upon the particular statistics. Thus an occasional test of significance will be very slightly more rigorous than intended.
Worthy of special mention is the inclusion of three very useful nonparametric procedures. Two of these compare favorably in efficiency with the t test, and are far simpler to apply, yet there has been a surprising lag in their more widespread adoption.
Insofar as the ideas in this book originated in the teaching experiments referred to, I am very much indebted to Drs. Riggs and Reynolds. The latter was also good enough to offer constructive suggestions for improving an early draft of the present text. Dr. Rupert G. Miller, Associate Professor of Statistics at Stanford, offered much helpful criticism and advice, and I am deeply appreciative of the patience and care with which he read the manuscript. Dr. Helena C. Kraemer, Research Associate in Statistics, carried out a painstaking verification of the examples and problems, and made several useful criticisms. Many of their suggestions have been incorporated, but, of course, they share no responsibility for errors that may remain. I owe thanks also to my colleagues, Drs. Lewis Aronow and Sumner M. Kalman, for their comments on a preliminary draft, and for their generosity in temporarily releasing me from the obligations of co-authorship of another text so that I might complete this one. I am indebted to the late Professor Sir Ronald A. Fisher, F.R.S., Cambridge and to Dr. Frank Yates, F.R.S., Rothamsted, also to Messrs. Oliver & Boyd Ltd., Edinburgh, for permission to reprint Tables I, II, IV, XI, and XXVII from their book Statistical Tables for Biological, Agricultural and Medical Research. Finally, I wish to express my gratitude to Mrs. Ray Jeffery for secretarial assistance of the highest quality, without which the preparation of this book would have been far more difficult.
Avram Goldstein
Contents
CHAPTER 1  The Logical Basis of Statistical Inference
CHAPTER 2  Quantitative Data
CHAPTER 3  Enumeration Data
CHAPTER 4  Correlation
Tables
1. Random Digits
2. Squares of Numbers
3. Four-place Logarithms
4. Areas of the Normal Curve
5. Critical Values of t
6. Factors (K) for One-sided Tolerance Limits
7. Critical Values of the Variance Ratio, F
8. Factors (k*) for the Studentized Range
9. Critical Values of χ²
10. Critical Numbers of Items in the Smaller Binomial Category
11. Binomial Confidence Limits
12. Values of the Exponential Function, e^(-x)
13. Confidence Limits for the Poisson Expectation
14. Significance of an Observed Difference Between Two Poisson Variables
15. Critical Values of U in the Two-sample Rank Test
16. Critical Values of the Smaller Sum (T) in the Signed-ranks Test
17. Conversion of Percents to Probits
18. Working Probits and Weighting Coefficients
Index
CHAPTER 1
The Logical Basis of Statistical Inference
INTRODUCTION
A good understanding of biostatistics has become an essential ingredient in the training of students in every branch of the biological sciences and medicine. The reason is that the methods of biostatistics are indispensable tools for the design and interpretation of experiments. The wise investigator draws a statistician into consultation to make sure that his experiment is well designed to answer the question at hand, and that it will yield a maximum of useful information with a minimum expenditure of animals, patients, or time. In evaluating data he again utilizes the statistician's skill so that his conclusions will be as definitive and as general as the findings will permit, but still no more so. Unfortunately, too many experiments being carried out and published are so poorly designed that they cannot support any valid conclusions. The purpose of this book is primarily to outline the logical basis of the statistical approach to experimental problems, and the main features of those statistical methods commonly used in biological experimentation. A better working knowledge of the procedures and their rationale on the part of every student of the biological sciences and medicine can only be beneficial. Such a preliminary acquaintance should lead not only to some facility in applying biostatistical methods, but also to a better appreciation of their potential value, so that expert advice will be sought more readily when the occasion arises.
The text is intended for the reader who has never studied statistics before and whose mathematical training may be rather scant. Emphasis is placed upon the applications of biostatistics to real experimental problems of the kinds encountered in the laboratory or clinic. The rationale of each procedure is explained intuitively, or demonstrated empirically, but no attempt is made to provide mathematical proofs. The student with some previous exposure to statistics, or with a more sophisticated mathematical background, will find much of the text too elementary, but should still profit from studying the examples and working the problems. Numerous references are provided to sources of additional explanation, to mathematical proofs, and to texts containing further illustrative examples. All the necessary tables are at the back of the book.
SOME IMPORTANT ASPECTS OF EXPERIMENTAL DESIGN
The principal application of biostatistics is in the analysis of data derived from experiments. No matter how elegant the statistical treatment, the conclusions may be false unless the experiments themselves were properly designed and carried out. It is well, therefore, to begin by considering some fundamental principles of biological experimentation. This is a subject with many ramifications, and special works devoted to it should be consulted for fuller information. Here we shall discuss briefly some of the most important requirements of proper experimental design, which bear upon the validity of data to which statistical analysis is to be applied.
The problems that mainly concern us have to do with the effects of experimental manipulations upon biological systems. Such deliberate interventions by the investigator are known as treatments. A result produced by a treatment is known as an effect.
We may wish to find out whether or not a certain drug reduces the concentration of glucose in the circulating blood. The treatment consists in administering the drug under specified conditions of dosage, frequency, duration, and so on, to a group of subjects. The effect would be a measurable lowering of blood glucose concentration.
We might want to know how the lever-pressing behavior of rats is affected by periodic food reinforcements. The treatment is the specified reinforcement schedule. The effect would be a measurable change in the rate or temporal pattern of lever presses.
We may be interested in learning how exposure to heat influences the subsequent germination of tomato seeds. The treatment is exposure of the seeds to certain temperatures for specified periods of time. The effect would be measurable as a change in the percent of seeds which germinate, or in the average time to germination, or in some qualitative feature of the germination process.
Since we wish to know what effect a treatment produces, the chief aim in planning and conducting an experiment is to ensure insofar as possible that no factor other than the treatment will contribute to the observed result. This ideal is almost never attainable, since extraneous influences are nearly always present. The practical aim is therefore to ensure that all influences except for the treatment under test will act equally upon the treated subjects and upon a comparable group of control subjects exposed to all the same conditions but not to the treatment.
Let us examine a possible experimental design for investigating the effects of a drug that might lower the blood glucose level. The group of subjects is assembled at a convenient time in the morning and initial blood samples are drawn. The drug is then given, and an hour later blood samples are drawn again. When glucose concentrations are determined they are found to be considerably lower in every postdrug sample than in the corresponding initial sample. Can it be concluded that the drug lowers blood glucose? Certainly not. In this case the fault in the experimental design is transparent: all subjects were treated and there was no control group at all. We do not know what would have happened to the blood glucose concentrations in these same subjects over the same one-hour interval if the drug had not been given.
In the above example an apparent effect (lowering of blood glucose) was found, but because the experiment was uncontrolled there was no way of deciding whether or not the treatment (drug administration) was responsible. Suppose, on the other hand, that there had been no change in blood glucose concentration. One might be tempted, in that case, to dismiss the possibility that the drug lowers blood glucose, but a little reflection will convince one that no such inference can be drawn. Had the drug not been administered, the blood glucose of all the subjects might possibly have increased during the same period, so that the drug effectively reduced what otherwise would have been a much higher concentration. Thus, regardless of the outcome, no valid conclusion can be drawn from an uncontrolled experiment.
Let us now consider the following improvement in the experimental design. The subjects' cooperation is obtained on two successive mornings. On the first (control) morning, blood samples are drawn initially and again one hour later, but no drug is administered. On the next morning the identical procedure is repeated, but the drug is given as soon as the initial blood samples have been secured. Suppose the chemical analyses now reveal that on the first day there was no important difference between the initial and final blood glucose concentrations, whereas on the second day the levels all fell during the hour after drug administration. Can we then conclude that the drug was responsible for lowering the blood glucose?
It might seem that proper controls are now built into the experiment, but two major faults remain, which render any conclusion uncertain. First, the very act of drug administration may have influenced the observed outcome, even if the drug itself were inert. The question must be answered, what would happen to the blood glucose levels if dummy injections, or inert pills, were given instead of the drug? The importance of this kind of simulated treatment (known as placebo control) will be considered at length later. The second major fault is the assumption that the only difference between the first day and the second was that drug was given on the latter but not on the former. To attribute prime importance to our own deliberate intervention is a common and understandable error, but nonetheless a serious one. Whatever treatment we may administer, other influences about which we know nothing at all may cause part or all of the observed results which we interpret as treatment effects. The simple truth is that today is not yesterday. The subjects' physical, emotional, and nutritional states may have been quite different on the two days, the room temperature and other environmental circumstances may have differed, and so on. Certainly different (and possibly relevant to blood glucose responses) are subjects' emotional reactions to the novel experience of the first day in contrast to the familiar one of the second. Even in the simplest laboratory experiments, conditions may change from one time to another without the awareness of the investigator, who may then falsely attribute to a treatment what was really caused by an unknown and extraneous circumstance.
For reasons made evident above, a good experiment whenever possible embodies the principle of concurrent control, i.e., the inclusion of a control group in the same experiment with the treated group or groups. Sometimes there are good reasons why an experiment cannot include concurrent controls, but however good the reasons may be, the conclusions of such experiments are bound to be clouded by some degree of uncertainty. The commonest illustration of this is the "before-after" comparison. A certain drug, for example, is generally believed to prolong the survival of children suffering from leukemia. It is certainly true that the drug-treated children of today survive many months longer than did victims of this disease before introduction of the drug. During the same period of years, however, numerous other advances in medical care have occurred. Would today's patients do as poorly without the drug as did their counterparts several years ago, or have other influences contributed to the apparent beneficial effects of today's drug treatments? This question can no longer be answered experimentally, because it would not now be ethical to withhold a drug that is believed to be effective, in order to establish a concurrent control group of patients. This peculiar difficulty in human experimentation points to the importance of conducting thoroughly conclusive experiments early in the trial period of any new drug or therapeutic procedure.
Assuming that an experiment will include a concurrent control, how shall subjects be assigned to the treatment and control groups? It might be supposed that any haphazard allocation of subjects would suffice, but experience shows that this is not so. Essential to good experimental design is as nearly complete an equivalence of control and treatment groups as can be achieved. Otherwise differences in the outcome, which may be supposed to arise from the treatment, will really only reflect innate differences among the subjects themselves. Consider a clinical trial to determine whether a new drug is superior to an inert material (a placebo) in shortening the duration of common colds. As patients with colds appear (for example in an industrial clinic), the physician-investigator at his own whim prescribes either the drug or the placebo. Subconscious bias can play a surprisingly large role in determining such assignments, so that the patients with milder symptoms may receive the drug more often and those with severe symptoms may receive the placebo more often. The outcome of such an experiment may be that the colds are over sooner in the drug-treated subjects, but the conclusion that the drug was responsible does not merit confidence, since the drug-treated group might well have had shorter colds than the placebo group even if the drug were worthless.
It should not be thought that inadvertent selection in assigning subjects to groups occurs only in human experimentation. So simple a matter as dividing 50 mice into two equal groups for control and treatment can be hazardous, because some characteristics of the mice may readily influence their allocation by the investigator to one or the other group. For example, if he removes 25 mice to another cage, they are very likely to be heavier and more sluggish (easier to catch) than the ones left behind. Groups selected in such a manner could not really be used for any experiment whatsoever. Regardless of the outcome, one would have doubts about the equivalence of the control and treated groups, and so any conclusion about the presence or absence of a treatment effect would rest upon a shaky foundation.
The only thoroughly reliable way to set up equivalent groups is to randomize the assignments.¹ Here the key requirement is that no characteristic of a subject whatsoever shall play any part in his assignment to a group. Tossing coins, rolling dice, or drawing lots are suitable procedures. The most universally applicable random device is the table of random numbers. Such tables have been generated by a system (e.g., electronically) designed so that each of the ten digits has equal probability of appearing at any position in a sequence. Table 1 is an extract of 1,000 sequential digits from a larger random series. It lends itself to a variety of uses, only one of which will be illustrated here.
Example 1-1. Randomization.
An experiment is to include 100 subjects, divided equally among four different treatment groups: placebo, drug A, drug B, and drug C. Make the assignments randomly, by means of Table 1.
The first step is to assign a number to each subject. In this case the subjects would be numbered from 01 to 100 (denoted by 00). Then enter the table at any point and begin filling one of the groups according to the sequences of numbers in the table. For example, if we start at the upper left corner of Table 1, we find there 94847 47234 476 ... If we have decided to fill the placebo group first, we assign subjects 94, 84, 74, 72, 34, 47, and so on, to this group. If a number that has already been used appears again we simply ignore it. When 25 subjects have been placed in the placebo group we continue in the same way to assign 25 subjects to each of the next two groups. Then the remaining 25 are placed automatically in the final group. It will be observed that every subject has an equal chance of being placed in any group, and also an equal chance of being placed together with any other subject.
¹ "The logical reason for randomizing is that it is possible on the basis of a model reflecting the randomization to draw sound statistical inferences, the probability basis of the model being provided not by wishful thinking but by the actual process of randomization which is part of the experiment. ..." H. Scheffé, The Analysis of Variance (New York: John Wiley & Sons, Inc., 1959), p. 106.
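The bookkeeping of Example 1-1 can be traced in a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, and because Table 1 itself is not reproduced here, a seeded pseudo-random generator stands in for the printed table when all 100 subjects are assigned.

```python
import random

def subjects_from_digits(digits):
    """Read a digit string two digits at a time; the pair 00 denotes subject 100."""
    clean = [c for c in digits if c.isdigit()]
    return [int(a + b) or 100 for a, b in zip(clean[0::2], clean[1::2])]

def fill_groups(subject_stream, groups, per_group):
    """Fill every group but the last in turn, ignoring repeated numbers;
    the subjects never drawn fall automatically into the last group."""
    assignment = {g: [] for g in groups}
    used = set()
    open_groups = list(groups[:-1])
    for subject in subject_stream:
        if not open_groups:
            break
        if subject in used:
            continue  # a number already used is simply ignored
        used.add(subject)
        assignment[open_groups[0]].append(subject)
        if len(assignment[open_groups[0]]) == per_group:
            open_groups.pop(0)
    total = len(groups) * per_group
    assignment[groups[-1]] = [s for s in range(1, total + 1) if s not in used]
    return assignment

def pseudo_table(seed):
    """Assumption: an endless stream of pseudo-random pairs replaces Table 1."""
    rng = random.Random(seed)
    while True:
        yield rng.randrange(100) or 100

# The digits quoted in the example give the first six placebo subjects.
first_six = subjects_from_digits("94847 47234 476")

# A complete assignment of 100 subjects to four groups of 25.
assignment = fill_groups(pseudo_table(seed=1964),
                         ["placebo", "drug A", "drug B", "drug C"], 25)
```

Reading the digits in pairs and skipping repeats is all the procedure requires; the final group needs no drawing at all, exactly as the example states.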
The first step (assigning a number to each subject) does not require that there be an actual list of subjects. The sequential numbers might refer to the order in which patients may, in the future, present themselves at a hospital clinic. In that case the assignments described above would mean that in order of their arrival at the clinic, the 94th, 84th, 74th, 72nd (and so on) patients would be placed in a placebo group.
It is a good idea to let numbers represent subjects in such a way that maximum use is made of the numbers in the table. Suppose there are only 12 subjects to be divided into groups. If we used only the digit pairs from 01 to 12, we would probably have to look through a considerable part of the table before all those particular numbers turned up, and we would have to discard a great many numbers along the way. A more efficient procedure is to use as many cycles of 12 as we can. Eight cycles of 12 will fit between 01 and 96; the digit pairs 97, 98, 99, 00 will be discarded if they appear. Now random two-digit numbers from 13 to 96 will obviously yield random remainders on division by 12. We therefore enter the table and accept all numbers between 01 and 96. If a number is greater than 12, we divide it by 12 and use only the remainder to designate a subject, allowing a remainder of zero to denote subject 12. Thus, on the bottom line of Table 1, the sequence 24415 95858 would represent subjects 12, 5, 11, 10 (the repeated 10 represented by the final 58 being discarded).
A warning is in order about an appealingly simple but incorrect way of randomizing, in which the digits in Table 1 are permitted to stand for the groups rather than for the subjects. For example, in assigning 12 subjects to three groups, the digits 1, 2, 3 might stand for group A; 4, 5, 6 for group B; 7, 8, 9 for group C; and 0 would be discarded. Now the subjects on a list are assigned to groups according to the sequence of digits in the table. Thus if the digits 31665 appear, we would assign the first and second subjects on the list to group A; the third, fourth, and fifth to group B. Now it will be evident that some of the groups are bound to be filled with the desired number of subjects sooner than others, and the last subjects on the list will just be filling out the vacant group. The subjects toward the end of the list will have an unusually high probability of being assigned together, particularly in the group that is filled last. The probability that a subject at the end of the list will be in the same group as one at the beginning is therefore less than in a truly random procedure, and the probability that he will be in the same group as the subject just above him in the list is very substantially higher than it should be. These defects may have serious practical consequences in experiments where the "list" is wholly or partly determined by some characteristics of the subjects, as when animals have to be caught and then randomly distributed. For example, a group of the animals that were hardest to catch would very likely be assigned together to the group that happened to be filled last.
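Both procedures discussed above, the remainder method and the incorrect digits-for-groups method, can be checked on the digit sequences quoted in the text. The Python sketch below is illustrative only: the function names are hypothetical, the rule that digits belonging to an already full group are skipped is an assumption about how the faulty method would be carried out, and pseudo-random digits stand in for Table 1 in the small simulation.

```python
import random

def subjects_by_remainder(digits, n_subjects):
    """Accept pairs inside whole cycles of n_subjects (01-96 for 12 subjects),
    use the remainder on division, and let a remainder of zero mean the last subject."""
    clean = [c for c in digits if c.isdigit()]
    limit = (99 // n_subjects) * n_subjects  # 96 when n_subjects is 12
    order = []
    for a, b in zip(clean[0::2], clean[1::2]):
        pair = int(a + b)
        if pair == 0 or pair > limit:
            continue  # 00 and the pairs above the last full cycle are discarded
        subject = pair % n_subjects or n_subjects
        if subject not in order:  # a repeated subject is discarded
            order.append(subject)
    return order

DIGIT_TO_GROUP = {"1": "A", "2": "A", "3": "A",
                  "4": "B", "5": "B", "6": "B",
                  "7": "C", "8": "C", "9": "C"}  # 0 is discarded

def groups_by_digit(digits, n_subjects, per_group):
    """The INCORRECT method: each digit names the group of the next subject on the list.
    Assumption: digits naming a group that is already full are skipped."""
    counts = {"A": 0, "B": 0, "C": 0}
    groups = []
    for d in digits:
        g = DIGIT_TO_GROUP.get(d)
        if g is None or counts[g] == per_group:
            continue
        counts[g] += 1
        groups.append(g)
        if len(groups) == n_subjects:
            break
    return groups

def last_pair_together_rate(trials=2000, seed=0):
    """Estimate how often the last two subjects on a list of 12 share a group
    under the incorrect method (a truly random division gives 3/11)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        endless_digits = iter(lambda: str(rng.randrange(10)), None)  # never exhausted
        groups = groups_by_digit(endless_digits, 12, 4)
        hits += groups[-1] == groups[-2]
    return hits / trials

remainder_order = subjects_by_remainder("24415 95858", 12)
first_five_groups = groups_by_digit("31665", 12, 4)
```

Calling `last_pair_together_rate()` gives an empirical estimate that can be set against 3/11, the chance that two particular subjects share a group when 12 subjects are divided at random into three groups of four.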
To avoid using the same sequence of random digits repeatedly, a procedure is desirable for entering the table at a different place each time it is employed. Any method that accomplishes this is suitable. One can simply mark each stopping place and begin there on the next occasion. Alternatively, one can number the columns and rows so that a randomly chosen number can then specify a point of entry.
Once the random assignment of subjects to groups has been completed, the remaining concern is that all groups be subjected to identical conditions during the experiment, except for the treatments under study. Again the pitfalls are many. A major one is failing to recognize, or minimizing the importance of, some of the conditions associated with, but not considered an intrinsic part of, the treatment. Consider an experiment to determine whether a certain drug diminishes the fertility of mice. The control mice remain untreated and undisturbed. The treated mice are removed from their cages several times daily and injected with the drug. The criterion of effect is the number and size of litters within a given period of time. These might turn out to be substantially smaller in the treated group. Nevertheless, a conclusion that the drug reduces fertility may be quite false, because the controls were not subjected to the same conditions as the treated mice. Repeated handling or the trauma of injection may have played a major role in reducing fertility in the treated group.
If the treatment group received drug injections, the control group should have had placebo injections on the same occasions. If a treatment under study is an operative procedure, controls must be subjected to a sham operation, as nearly like the real one as possible, and including the same anesthetics (which often produce significant effects themselves). If a lesion is to be placed electrolytically in an animal's brain, an identical electrode should be similarly placed in a control animal, but without passage of the electrolytic current. Ideally, to avoid inadvertent selection of animals, the electrode would be inserted before it had been decided whether the particular animal was to be treated or kept as a control, and the decision whether or not to pass the electrolytic current would then be made by tossing a coin. Precautions of this kind may sometimes seem extreme, even absurd, but the careful experimenter keeps them in mind and employs them whenever he can reasonably do so.
A special technique known as blind design ensures against the investigator's bias (conscious or unconscious) influencing the conduct of an experiment or the evaluation of the results. The essence of this procedure is that the personnel who carry out the treatments should not know which subjects belong to which groups. Sometimes this is manifestly impossible, as when treatment and control procedures differ grossly. However, if control animals are to receive inert injections while others receive drugs, it is not very difficult to arrange for the actual injections to be coded, and then administered by someone who is "blind" with respect to the coding. It is even more important, when criteria of effect are being assessed, that the investigator be able to measure, count, or otherwise evaluate results with complete objectivity. Objectivity cannot be guaranteed if he knows to which groups the subjects belong, because he usually has some emotional stake in the experiment, or he probably would not have undertaken it.
It might be thought that insistence upon "blind" technique is merely an exaggerated fussiness about abstract principles of experimental design. On the contrary, the literature of experimental psychology and of medicine is full of reports on experiments whose outcomes were determined more by an investigator's bias than by any treatment under test. One example will suffice. Shortly after the introduction of the antihistamine drugs into medicine, their effectiveness in the treatment of the common cold was investigated in several field trials. The results were very favorable to the new drugs, with the consequence that antihistamines were widely promoted as cold cures. In these experiments, however, no safeguards had been employed to prevent the examining physicians (who rated the progress of each cold) from knowing which subject had received placebo and which had received drug. "Blind" experiments, undertaken subsequently, showed consistently and conclusively that the drugs were without effect. Later, in the elegant clinical trials that established the value of streptomycin in the treatment of pulmonary tuberculosis, the "blind" technique was scrupulously observed. Even the radiologists evaluating X-ray films were not allowed to know what treatments had been given, since independent studies had revealed a pronounced influence of such prior information upon radiological interpretations. Even in laboratory experimentation, although measurements may be made with very accurate instruments, it is surprising how often accidental errors can occur in the direction of an investigator's bias!
In human (as compared with animal) experimentation the difficulties are compounded, because the subject's knowledge about the experiment and his preconceptions about the anticipated effects can also influence the outcome. Neither the subjects nor the investigator directly involved in the experiment must know how the treatments have been assigned. Medications are given serial code numbers, and the person with access to the code refrains from any contact with the actual experiment until all the data have been collected. Nevertheless, even this "double-blind" system is not foolproof. A drug may reveal itself through a side effect such as drowsiness or dry mouth, which cannot be duplicated in the placebo. If the subject becomes convinced (rightly or wrongly) that he has received a potent drug, or that he has been given a placebo, his responses may reflect this conviction to a remarkable degree. Under such circumstances, interpretations become extremely difficult.
Suppose now that subjects have been assigned randomly to a treated group and to a concurrent control group, and that proper blind precautions have been taken. The validity of the outcome in such an experimental design will then depend very largely upon the practical equivalence of the two groups. Even though the subjects were assigned randomly, the groups might still differ accidentally in important ways, especially if there are but few subjects per group. A difference in outcome might then only reflect some chance difference between subjects in the two groups, whereas it would be falsely interpreted as a treatment effect. This difficulty can be overcome by using a balanced design. We would perform the experiment in two parts, first using one group as the concurrent control, then reversing the roles of the two groups in what is known as a crossover. The effect of any intrinsic differences between the groups would thereby be minimized.
A crossover experiment for testing the effect of a drug on blood glucose in human subjects might be conducted as follows. Subjects reporting for the experiment would be assigned to group A or group B by means of the random number table. Group A would receive placebo on the first day, drug on the second; group B would receive drug on the first day, placebo on the second. Under these conditions, provided the proper blind precautions were taken, a lowering of blood glucose which occurred in both groups when drug was given and in neither group when placebo was administered could be taken seriously as evidence of a treatment effect.
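The assignment step of such a crossover can be sketched in a few lines of Python. This is only an illustration: a pseudo-random generator stands in for the random number table, and the subject codes, the function name `assign_crossover`, and the group size of twelve are all hypothetical.

```python
import random

def assign_crossover(subjects, seed=None):
    """Randomly split subjects into groups A and B of (nearly) equal size."""
    rng = random.Random(seed)          # stand-in for a random number table
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A": shuffled[:half], "B": shuffled[half:]}

# Schedule described in the text: A gets placebo then drug, B the reverse.
SCHEDULE = {"A": ("placebo", "drug"), "B": ("drug", "placebo")}

subjects = [f"S{i:02d}" for i in range(1, 13)]   # hypothetical subject codes
groups = assign_crossover(subjects, seed=1)

for name, members in groups.items():
    day1, day2 = SCHEDULE[name]
    print(f"Group {name}: day 1 = {day1}, day 2 = {day2}, subjects = {members}")
```

In a real double-blind trial the printed schedule would of course be held by the person who keeps the code, not by the investigator running the experiment.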
If more than two groups are involved, the crossover principle becomes more elaborate. The Latin square may be used to ensure that each group receives every treatment in a systematic fashion. Latin squares were introduced originally for the purpose of subdividing plots of land for agricultural experiments, so that treatments could be tested even though different parts of a field had various soil conditions. In a Latin square, letters of the alphabet are placed in such a way that each letter appears once and only once in every row and in every column. Suppose we wish to test four different treatments (including a control) on the spontaneous activity of mice, in a balanced design. We assign the desired number of mice randomly to each group A, B, C, and D. We then choose a 4 x 4 Latin square, such as the one below, and assign meanings to the columns and rows, as indicated.²

Treatment    Day 1    Day 2    Day 3    Day 4
    1          A        B        C        D
    2          B        C        D        A
    3          C        D        A        B
    4          D        A        B        C

The square provides the specifications for carrying out the desired experiment on four successive days. Each group of mice will receive each treatment once, and all groups will be treated each day. We shall see later how easy it is to analyze the data obtained in balanced experiments. Here we could ascertain not only if the treatments differ from each other and from the control, regardless of the day of treatment, but also if the spontaneous activity of the mice differs from day to day regardless of treatment, and finally if treatment effects depend in any way upon the day of treatment.
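The defining property of a Latin square (each letter once in every row and every column) is easy to check mechanically. The sketch below builds a cyclic square of any size and verifies that property; the function names are illustrative, and shuffling rows and columns is only a simple way to vary the layout, not a uniform draw from a catalogue of all Latin squares.

```python
import random

def cyclic_latin_square(symbols):
    """Build an n x n Latin square by shifting the symbols one place per row."""
    n = len(symbols)
    return [[symbols[(r + c) % n] for c in range(n)] for r in range(n)]

def is_latin_square(square):
    """True if every symbol appears exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok

def shuffled_square(square, seed=None):
    """Permute rows and columns; this preserves the Latin property."""
    rng = random.Random(seed)
    rows = list(square)
    rng.shuffle(rows)
    col_order = list(range(len(square)))
    rng.shuffle(col_order)
    return [[row[c] for c in col_order] for row in rows]

square = cyclic_latin_square(["A", "B", "C", "D"])   # rows: treatments, columns: days
variant = shuffled_square(square, seed=3)

for row in square:
    print(" ".join(row))
print(is_latin_square(square), is_latin_square(variant))
```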
The Latin square or equivalent balancing procedures may be useful during the course of an experiment as well as in its initial planning. The principle of maintaining similar conditions for all groups throughout an experiment requires that even unforeseen influences must be prevented from acting preferentially upon any particular group. Every investigator can recall some unfortunate experience that taught him this particular lesson. The following example is fairly typical. An experiment to ascertain whether a particular extract had antibacterial activity required the incubation of agar plates containing bacteria and extract (treatment group) and of similar plates from which the extract was omitted (control group). The bacteria failed to grow on treatment plates and grew well on control plates. A follow-up experiment employed serial dilutions of the extract to estimate its antibacterial potency. Again, although control plates grew normally, no growth occurred on treatment plates, even after million-fold dilution. It finally developed, however, that no bacteria grew on "treatment" plates, even when no extract was added. A temperature gradient from right to left in a defective incubator was wholly responsible. Control plates had been regularly placed at the left, where the thermometer was located, and where the temperature was correct. Treatment plates had always been placed to the right, where the temperature was too high to support any bacterial growth. A systematic alternation in the placement of control and treatment plates would have revealed the true situation at once.

² There are many different Latin squares of each size. One of these may be selected randomly from the collection catalogued in R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical Research, 4th ed. (New York: Hafner Publishing Company Inc., 1953), Table XV and pp. 18ff.
The placement of animal cages in a room might also seem unworthy of special planning. Yet light, temperature, noise, vibration are among the many conditions that can vary from place to place in an animal room. Systematic placement of cages according to a Latin square plan helps equalize extraneous influences. For the same reasons the order in which procedures are carried out with various experimental groups should be varied systematically, in order to balance out any possible influence on the outcome.
There are several variations on the balancing principle, some of which are used very frequently. One such design is the randomized block. Often one wishes to provide as broad a basis as possible for generalization of an experimental conclusion. For example, it may be worthwhile to examine treatment effects in several strains of animal rather than limiting the experiment to a single strain. In that case one has to choose between randomly mixing animals of different strains, and keeping each strain as a distinct group. The latter is by far the better procedure, for without losing any information about effects on all the animals taken collectively, information may be gained about strain-specific differences in the effects. Suppose one wished to test a number of chemicals for their ability to produce fetal malformations when administered to pregnant rats. The entire experiment will be replicated in several "blocks," each block consisting of animals belonging to a single strain. The assignments of chemicals are made randomly within each block. If there were four strains (W, X, Y, Z) and five chemicals (A, B, C, D, E) under test, and 60 animals in all, then each chemical would be tested in three animals of every strain. The data would be observations on the number of malformed young in each litter, and they could be tabulated on the following grid:

                         Chemicals
Strains      A      B      C      D      E      Total
   W
   X
   Y
   Z
 Total

Each box of the table will contain three observations, there will be 12 observations on each chemical, and 15 observations on each strain. Such a design is efficient because it answers a number of questions in a single experiment, and it tends to improve the reliability of the results because differences between strains can be taken into account in assessing the reality of any differences between chemicals.
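The randomization within blocks can be sketched as follows. The function name `randomized_blocks` and the use of a seeded pseudo-random generator are assumptions made for illustration; the strain and chemical labels and the counts (three animals per box, 60 in all) come from the example above.

```python
import random
from collections import Counter

def randomized_blocks(strains, chemicals, per_cell, seed=None):
    """For each strain (block), randomly order the chemical labels so that
    every chemical is given to `per_cell` animals of that strain."""
    rng = random.Random(seed)
    plan = {}
    for strain in strains:
        labels = [chem for chem in chemicals for _ in range(per_cell)]
        rng.shuffle(labels)            # randomization happens inside the block
        plan[strain] = labels          # position i = i-th animal of this strain
    return plan

plan = randomized_blocks("WXYZ", "ABCDE", per_cell=3, seed=0)

per_chemical = Counter(chem for labels in plan.values() for chem in labels)
for strain, labels in plan.items():
    print(strain, "".join(labels))
print(dict(per_chemical))
```

Because shuffling is confined to each strain, the design stays balanced no matter how the random order falls out.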
In a factorial design combinations of treatments are examined. Several levels of one factor are crisscrossed with several levels of another. For example, an anticonvulsant drug (factor A) might be studied at four different doses (including a zero-dose control) in animals made to convulse by three different procedures (factor B). The available animals may then be divided equally and randomly among the twelve combinations of the factors, as in the following grid:

                          Factor A
Factor B                  Doses of Anticonvulsant Drug
Convulsant Procedure      1      2      3      4      Total
        a
        b
        c
      Total

Here each box will contain one or more animals, and the design ensures that each level of factor A is tested in combination with every level of factor B, and vice versa. Factorial designs may permit one to assess in a single experiment not only the primary effects of the levels of each factor independently, but also the joint effects of the combinations.
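A minimal sketch of the crisscrossing, assuming 24 available animals (a made-up number) and placeholder labels for the four dose levels and three procedures:

```python
import itertools
import random

doses = [1, 2, 3, 4]            # four dose levels, one being the zero-dose control (labels are placeholders)
procedures = ["a", "b", "c"]    # three convulsant procedures

# Every level of factor B crossed with every level of factor A: 12 cells.
cells = list(itertools.product(procedures, doses))

def assign_factorial(animals, cells, seed=None):
    """Divide the animals equally and randomly among the factor combinations."""
    if len(animals) % len(cells) != 0:
        raise ValueError("number of animals must be a multiple of the number of cells")
    rng = random.Random(seed)
    shuffled = list(animals)
    rng.shuffle(shuffled)
    per_cell = len(shuffled) // len(cells)
    return {cell: shuffled[i * per_cell:(i + 1) * per_cell]
            for i, cell in enumerate(cells)}

animals = list(range(1, 25))    # hypothetical animal numbers 1..24
layout = assign_factorial(animals, cells, seed=2)

for (procedure, dose), members in layout.items():
    print(f"procedure {procedure}, dose {dose}: animals {members}")
```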
A nested (or hierarchical) design is one in which the factors do not crisscross, but rather are present in various tiers, each contained within a higher one. An example would be a comparison of the accuracy and precision of blood-cell counting in several clinical laboratories. Suppose portions of the same blood were coded appropriately and then submitted to three laboratories, each employing a number of technicians. It might be of interest to know to what extent the counts obtained in the different laboratories agree, whether or not the results obtained by different technicians in the same laboratory differ in any systematic way, and also whether or not each single technician obtains a reproducible count in replicate trials with the same blood. Here the lowest tier contains "replicates within technicians," the next contains "technicians within laboratories," and the highest contains the three laboratories. The data might be five replicate counts by each technician, and they would fall into the following hierarchical tabulation:

[Hierarchical tabulation. Tiers, from highest to lowest: Laboratories (A, B, C); Technicians (a through h, each listed under one laboratory); Replicates (five counts under each technician).]

The methods of analyzing data in these and other types of experimental design will be described in Chap. 2. The designs described here are merely the simplest and most commonly encountered, but much more elaborate ones are also used. Detailed information about this subject may be found in the references beginning at page 192.
Whether or not the reader ever has to set up an experiment embodying the design principles outlined here, he will certainly be called upon to read and interpret published reports of investigations in which statistical methods were used. It cannot be emphasized too strongly that statistical procedures yield valid conclusions only for adequate experiments. One of the first steps in evaluating a report of an experiment is therefore to satisfy oneself that the principles of preliminary randomization and subsequent control were followed. Only then is it appropriate to inquire whether or not treatment had any effect.
SAMPLING DISTRIBUTIONS

Experimental observations in all fields of science are to some extent variable, but variability is especially prominent in biological experimentation. Any single experiment is necessarily of finite scope, yielding a limited sample of data. If the same experiment were repeated, a somewhat different set of data would generally be obtained, so we cannot attach any special importance to a particular sample. Rather do we regard each sample of data as having been drawn randomly from an infinitely large collection of similar data that happen not to have been included in the sample. This hypothetical infinitude of data, of which the sample is representative, we call a population.
Data that characterize a sample are known as statistics. An example of a statistic is the mean³ DNA content of 100 randomly chosen unfertilized sea-urchin eggs. Another statistic is the number of mice in a group of 25 which are paralyzed by a given dose of drug. Now it is almost always the case that we are interested in populations, not in samples. This is merely another way of saying that experimental observations are useful to the extent they have general relevance. What we wish to know is the DNA content of unfertilized sea-urchin eggs in general, not the content of the eggs chosen for a particular experiment. Likewise we are interested in what a drug does to mice in general, not to a particular group of 25 mice.
Numbers that characterize a population are known as parameters. Parameters corresponding to the sample statistics just cited would be the mean DNA content of all unfertilized sea-urchin eggs, and the percent of all mice that the given dose of drug would paralyze. The only way we can obtain information about populations is to make observations on samples. Sample statistics are then used as estimators of the corresponding population parameters. That is why the procedures of randomization and control discussed earlier are so important; they ensure that the samples we deal with will be truly representative of the populations from which they were drawn and therefore that the statistics we obtain will be fair estimators of the parameters in which we are really interested.

³ A mean is an ordinary arithmetical average. There are other kinds of averages, to be defined later.
Suppose that the true mean weight of all students of a given age is 165 lb. We shall weigh randomly chosen groups of students, compute the mean weight in each group, and see how well these sample means estimate the true mean. Let us begin with the smallest possible group, a sample of one (N = 1), namely the individual student. Each such weight is an unbiased estimate of the parameter, 165 lb, because it is just as likely to be high as low, and the long-term average of the estimates will approach the value of the parameter in question. But obviously the weight of an individual student is not likely, except occasionally, to be 165 or even very close to 165.
Now consider groups of ten students (N = 10). The mean weight in each sample is again an unbiased estimate of the population mean because in the long run the fluctuations above and below 165 will balance each other. Each sample of 10 is almost certain to contain weights above and below 165, so the mean of such a sample is likely to be closer to 165 than was the weight of a randomly selected individual student. It should therefore be evident that a statistic estimates the corresponding parameter ever more accurately as the sample size increases. At the extreme, if a sample approached the size of the population (N → ∞), the sample statistic would become indistinguishable from the population parameter.
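A small simulation makes the point concrete. It is only a sketch under stated assumptions: the population is taken to be normally distributed with mean 165 lb and a standard deviation of 15 lb (the text specifies neither the shape nor the spread), and the function name `sample_means` and the 2,000 repetitions are arbitrary choices.

```python
import random
import statistics

def sample_means(n, trials, mu=165.0, sigma=15.0, seed=0):
    """Draw `trials` random samples of size n and return their means.
    The normal population and sigma = 15 lb are assumptions for illustration."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
            for _ in range(trials)]

centre = {}
spread = {}
for n in (1, 10, 100):
    means = sample_means(n, trials=2000, seed=n)
    centre[n] = statistics.fmean(means)    # long-run average of the estimates
    spread[n] = statistics.stdev(means)    # how widely the estimates scatter
    print(f"N = {n:3d}: average of sample means = {centre[n]:6.2f}, "
          f"scatter of sample means = {spread[n]:5.2f}")
```

The averages stay near 165 at every sample size (the estimates are unbiased), while the scatter of the sample means shrinks as N grows.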
Besides sample size, the amount of variability in a population also determines how accurately a statistic will estimate the corresponding parameter. Clearly, if no student's weight differed from 165 lb by more than a pound or two, then even statistics from very small samples would be pretty good estimators. For any specified sample size, the statistic will better estimate the parameter, the smaller is the variability in the population from which the sample was drawn.
These generalizations about statistics and parameters are true, but they are not precise enough to be useful. In each instance we need to know just how well a statistic estimates its corresponding parameter. Suppose, for example, that we want to know the mean blood glucose concentration in young adults after an overnight fast. We could randomly select a convenient number of subjects (let us say 20), draw blood, and determine these blood glucose concentrations. The mean concentration in the sample will then be an unbiased estimate of the true blood glucose concentration of all similar people under the same conditions. The variability of the sample data will also tell us something about how variable blood glucose levels are in the population. If it should happen that all the sample values are in the range 0.90-1.10 mg/ml, then probably most of the population values are also in this range. We can imagine then that the population mean might well be something like 0.93 or 1.02, but we would be very reluctant to believe it could be 1.37. The general line of reasoning can be grasped intuitively. If the population mean were 1.37, there would have to be values above as well as below this. Under such circumstances it would be very very improbable that we should draw a random sample of 20 subjects, all of whose blood glucose concentrations were lower than 1.10. Since the truth is that we did draw such a sample, we reject the hypothesis that the population mean is as high as 1.37. If we consider, one by one, a series of such hypotheses about the population mean (that it is 1.36, 1.35, 1.34, and so on) we will eventually come into a region of acceptable hypotheses. For example, the hypothetical value 1.09 would probably be regarded as acceptable, as would other values covered by the range of the sample data. At still lower values we would again enter a region of rejected hypotheses. The process of inferring from the properties of a sample, within what range a parameter probably lies is known as estimating a confidence interval (or confidence limits) for that parameter.
Assume now that we somehow know what the mean blood glucose concentration is in a control population, and we conduct an experiment to ascertain whether or not a drug lowers this concentration. We shall obtain data from a treated group of subjects and compute the sample mean. Using these sample data we can then estimate a confidence interval for the mean of the treated population. Now if this entire interval, representing the range of probable values of the true mean after treatment, lies below the known control mean, then we can conclude that the drug has a significant effect in reducing the blood glucose concentration. On the other hand, if the confidence interval for the mean of the treated population includes the control mean, we would be unable to conclude that the drug had any effect, since the sample data are not inconsistent with the hypothesis that the treated sample was drawn from the control population, and that the observed sample mean is a reasonable estimate of the known control mean.
The foregoing discussion assumed a priori knowledge about a parameter. We rarely have such knowledge. Often we have two experimental groups, control and treated, and we want to know whether or not treatment has had an effect. This is tantamount to asking whether or not both samples could reasonably have been drawn from the same (untreated) population. Of course, we expect the sample means to differ somewhat even if both samples did come from the same population. The real question is whether the difference between the two sample means is so large that we feel compelled to reject the hypothesis that they are estimates of the same parameter.
Here the required confidence interval is for a difference between two parameters. Suppose the mean blood glucose concentration in the control sample is 1.00 and in the treated sample 0.89. There is an apparent difference of -0.11, which is our best estimate of the true difference. The confidence interval for the true difference, however, might be (for example) -0.26 to +0.04. Since a zero difference is included in these confidence limits, we cannot assert that the drug had any effect, because we cannot be certain there was any real difference between control and treated blood glucose concentrations. Yet neither can we deny that the drug might have had some effect. The confidence interval gives the probable limits of magnitude of the effect, i.e., if there was any lowering of blood sugar, it was probably no greater than 0.26 mg/ml, and there might even have been a small rise of blood sugar, no greater than 0.04 mg/ml, which the sample accidentally failed to detect. On the other hand, if the confidence interval had not included zero, we could have asserted the efficacy of the drug and stated the probable quantitative limits of its effectiveness.
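The reading of such an interval reduces to one question: does it include zero? A minimal sketch of that decision follows; the function name and the wording of the returned verdicts are illustrative, and the second interval in the usage lines is invented purely for contrast.

```python
def interpret_difference_interval(lower, upper):
    """Interpret a confidence interval for (treated mean - control mean)."""
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    if upper < 0:
        return "lowering"        # whole interval below zero
    if lower > 0:
        return "rise"            # whole interval above zero
    return "inconclusive"        # zero difference is among the acceptable values

# Interval from the text: zero is included, so no effect can be asserted.
print(interpret_difference_interval(-0.26, 0.04))

# A hypothetical interval lying wholly below zero would support a lowering.
print(interpret_difference_interval(-0.26, -0.02))
```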
In discussing the meaning of confidence limits and the rationale for deciding whether or not observed effects are to be taken seriously, we made use of rather vague words like "probably" and "reasonably." We said, for example, that if a population mean lay outside a certain interval, it was not "probable" that we would have observed a particular sample mean. To get at a precise definition of such terms we need concrete information about the probability of drawing various sample statistics by chance from populations with specified parameters.
The probability of an event is the long-term frequency of occurrence of that event, relative to all alternative events. It is expressed as a decimal fraction between zero (the event never occurs) and unity (the event always occurs and no alternative event ever occurs). Sometimes a probability is known a priori, as in penny-tossing, where we know that in the long run heads and tails will each fall with relative frequency 0.5, and this is therefore the probability that any particular toss will produce heads. Sometimes a probability can only be estimated empirically by observing the relative frequency toward which the results of a great many trials converge.
A sampling distribution is a graph showing the probabilities of obtaining all possible statistics in samples drawn randomly from a specified population. Figure 1-1 is a histogram depicting the expected sampling distribution for throws of two dice, computed a priori on the assumption that each face has equal opportunity to be uppermost. The eleven discrete possible outcomes are given on the horizontal axis, the probability of obtaining each is shown on the vertical axis. Figure 1-2 shows an expected sampling distribution of mean weights in samples of 10 students from a hypothetical population whose true mean is 165 lb. In order to plot this sampling distribution an assumption also had to be made about the variability in the population. In contrast to Fig. 1-1, this sampling distribution is a continuous curve, since here sample means are not limited to discrete integer values but may assume any intermediate values as well. In the next chapter the origins of such sampling distributions as are depicted in these figures will be considered more closely. For the present we may observe that a sampling distribution shows how likely it is that a randomly drawn sample statistic will deviate to any given extent from the parameter it estimates. Thus, if we have a particular sample statistic in hand, the sampling distribution will tell us the exact probability of its having been obtained randomly from a hypothetical population with a given parameter. That probability, as we shall see, then becomes a basis for judgment as to whether or not the sample was actually drawn from the hypothetical population.

Figure 1-1. Sampling distribution for throws of two dice. [Histogram; horizontal axis: sum of the numbers on both dice.]

Figure 1-2. Sampling distribution for mean weights in samples of 10 from a hypothetical population with mean 165 lb. [Continuous curve; horizontal axis: mean weight from the sample of 10, 150 to 175 lb.]
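The a priori sampling distribution for two dice can be computed by simply enumerating the 36 equally likely pairs of faces. The following sketch does so with exact fractions and prints a rough text histogram; the function name and the bar scaling are arbitrary.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def two_dice_distribution():
    """Exact probability of each possible sum of two fair dice."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in sorted(counts.items())}

dist = two_dice_distribution()

for total, p in dist.items():
    bar = "#" * round(float(p) * 72)       # bar length proportional to probability
    print(f"{total:2d}  {float(p):.3f}  {bar}")

print("sum of all probabilities:", sum(dist.values()))
```

The eleven bars rise from the sum 2 to a peak at 7 and fall symmetrically to 12, and the probabilities add to unity.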
Example 1-2. Probability of an outcome.
In the sampling distribution shown in Fig. 1-1, estimate the probability of throwing 4 or less with two dice.
The total area under the histogram represents the sum of the probabilities for all possible outcomes, i.e., unity. The probability desired here is the fractional area to the left of 5. Since the bars of the histogram are of equal width, the desired area is proportional to the heights, 0.03 + 0.06 + 0.08 = 0.17.

Example 1-3. Probability of obtaining an extreme statistic.
Estimate from Fig. 1-2 the probability of drawing, from the hypothetical population, a random sample of 10 with mean weight outside the limits 160-170 lb.
Here it can be seen that about two-thirds of the area lies within the stated interval, about one-third outside. So the required probability is about 0.34. Since the curve is symmetrical, there is probability 0.17 of obtaining a sample mean less than 160, and 0.17 for obtaining one greater than 170. If we actually weighed many such samples, about two-thirds of the means would, in the long run, be between 160 and 170. It would certainly occasion no surprise, however, if a sample were selected randomly and its mean turned out to be 171, inasmuch as 17 of every 100 sample means will exceed 170.

Example 1-4. Deviation corresponding to a given probability.
You would certainly be surprised if you chose a single random sample and then found that its mean deviated from the population mean by more than did 95 out of 100 similarly selected sample statistics. Approximately how low or high would the sample mean have to be in order to occasion such surprise, assuming the sampling distribution of Fig. 1-2?
Here we wish to know the interval of mean weights that includes 95% of the total area under the curve. The correct answer is 155-175, so a sample mean outside this range would indeed surprise us. The answer could have been found by actually measuring the area but it is more readily obtained from a special table of areas that will be described in the next chapter.
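The areas read off Fig. 1-2 in Examples 1-3 and 1-4 can be approximated numerically if one is willing to assume a shape for the curve. The sketch below assumes a normal curve centred on 165 lb with a standard deviation of 5 lb for the sample means; neither assumption is stated in the text at this point, and the 5 lb figure is chosen only because it puts roughly two-thirds of the area between 160 and 170.

```python
from statistics import NormalDist

# Assumed sampling distribution of the mean: normal, centre 165 lb, spread 5 lb.
sampling = NormalDist(mu=165.0, sigma=5.0)

# Example 1-3: area inside and outside the limits 160-170 lb.
p_inside = sampling.cdf(170) - sampling.cdf(160)
p_outside = 1 - p_inside
p_each_tail = p_outside / 2

# Example 1-4: limits that enclose the central 95% of the area.
low = sampling.inv_cdf(0.025)
high = sampling.inv_cdf(0.975)

print(f"inside 160-170:  {p_inside:.3f}")
print(f"outside 160-170: {p_outside:.3f} ({p_each_tail:.3f} in each tail)")
print(f"central 95% interval: {low:.1f} to {high:.1f} lb")
```

Under these assumptions the outside area comes to about 0.32 rather than the 0.34 estimated by eye from the figure, and the 95% limits round to 155 and 175 lb, in agreement with the answer given in Example 1-4.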
STATISTICAL HYPOTHESES AND DECISION RULES

Imagine that you are shown a penny, heads up, and are asked to decide, without examining it, whether it is an ordinary penny or a two-headed one. You are permitted to see the outcome of as many tosses as you like. What is the best way to proceed? Naturally, if it is an ordinary penny it will fall tails sooner or later, so the simple answer is to keep tossing it as long as possible. If tails appears, the problem is solved. If tails does not appear, however, you will have to decide sooner or later that the penny is two-headed. No matter when you make this decision, you may be wrong; the very next toss might conceivably have fallen tails. But obviously, the longer you wait the more certain you will be of making the right decision.
The problem would be more interesting, and somewhat more realistic as an analogy to experimental situations, if there were some cost attached to each toss, and some penalty for a wrong decision. Your course of action would then certainly be determined by these new contingencies. If the cost of continued tossing was low, and the penalty was high for wrongly concluding the penny was two-headed, you would wait to see a long succession of heads before arriving at any conclusion. If, on the other hand, the cost of tossing was high and the penalty for a wrong decision was low, you would terminate the series early, perhaps after a very few tosses.
In this case, since a priori probabilities for the behavior of a true penny are known, it is easy for us to arrive at decision rules. The first step is to formulate a hypothesis which is to be accepted or rejected on the basis of sample data. Here the hypothesis to be tested is that the penny has two different faces, heads and tails; the alternative is that the penny is two-headed. The next step is to decide how often, in the long run, we are willing to reject the hypothesis wrongly; i.e., what probability we are willing to accept for calling the penny two-headed when it is not. This probability is known as the level of significance at which the hypothesis will be rejected;⁴ it is designated by the letter P. As we have noted already, both cost and penalty will influence the choice of the level of significance.
Suppose we are willing to be wrong as often as five times in every hundred trials, but no more often (P ≤ 0.05).⁵ Then we can reject the hypothesis if and only if a sample outcome is observed, which would have been expected with probability 0.05 or less, were the hypothesis in fact true.
Let us consider the probability that the first tails will appear at any specified toss of a true penny. Obviously this is 1/2 for the 1st toss. For the first tails to appear at the 2nd toss, heads is required at the 1st (probability 1/2) and then tails at the 2nd (probability 1/2), giving a combined probability $(1/2)(1/2) = (1/2)^2$ for both required events. Similarly the probabilities of the first tails appearing at the 3rd, 4th, 5th, and higher tosses are found to be $(1/2)^3$, $(1/2)^4$, $(1/2)^5$, etc. These probabilities are plotted as a histogram in Fig. 1-3.
Now the probability that the first tails will not appear sooner than the 5th toss is given by unity less the sum of the probabilities for its appearance on the 1st, 2nd, 3rd, and 4th tosses; or (which is the same thing) by the sum of the probabilities for its appearance on the 5th, 6th, and all higher tosses; or (which is also the same thing) by the area contained in the histograms at, and to the right of, the 5th toss, relative to the total area of all the histograms. This probability is readily found to be

$1 - [(1/2) + (1/2)^2 + (1/2)^3 + (1/2)^4] = 0.0625$

which exceeds our desired level of significance. The shaded area, on the other hand, including the 6th and higher tosses, represents a total probability 0.03125, well below the desired level. Thus, since the probability of getting four heads in a row and then tails on the 5th toss exceeds 0.05, we would not consider that outcome incompatible with the hypothesis of a true penny; whereas getting five heads in a row would happen rarely enough with a true penny to make us reject the hypothesis. The decision rule must therefore be: Accept the hypothesis if any tail appears, but reject it if no tail appears within five tosses. For two-headed pennies this rule will be free of error, but the honesty of ordinary pennies will be impugned wrongly about 3 times in every 100 trials.

⁴ Rejecting the hypothesis when it is true is referred to as a Type I error. The probability (designated by the Greek letter α) of committing a Type I error is obviously the same as the level of significance (P). The two symbols will be used interchangeably.
⁵ > means "greater than"; ≤ means "equal to or less than."

Figure 1-3. Expected outcomes of penny-tossing with a true penny.
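The arithmetic behind the five-toss rule can be checked with exact fractions. In this sketch the function names are illustrative; the probabilities follow directly from the assumption of a true penny with probability 1/2 of heads on each toss.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def p_first_tail_at(k):
    """Probability that a true penny shows its first tail on toss k."""
    return HALF ** k                 # k-1 heads followed by one tail

def p_no_tail_in(n):
    """Probability that a true penny shows no tail in the first n tosses."""
    return HALF ** n

def tosses_needed(alpha):
    """Smallest run of heads whose probability under a true penny is <= alpha."""
    n = 1
    while p_no_tail_in(n) > alpha:
        n += 1
    return n

# First tail not sooner than the 5th toss = no tail in the first four tosses.
p_four_heads = 1 - sum(p_first_tail_at(k) for k in range(1, 5))
p_five_heads = p_no_tail_in(5)

print(float(p_four_heads))                # 0.0625, exceeds 0.05
print(float(p_five_heads))                # 0.03125, below 0.05
print(tosses_needed(Fraction(5, 100)))    # 5
```

A stricter level of significance lengthens the required run of heads; for example, `tosses_needed(Fraction(1, 100))` gives 7, since $(1/2)^7$ is the first power of 1/2 at or below 0.01.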
QUANTITATIVE DATA
For the within-samples SS:

$\sum (x_m - \bar{x}_m)^2 = \sum x_m^2 - 2 \sum x_m \bar{x}_m + \sum N_m \bar{x}_m^2$
$= \sum x^2 - 2 \sum T_m \bar{x}_m + \sum T_m \bar{x}_m$
$= \sum x^2 - \sum T_m \bar{x}_m = \sum x^2 - \sum \frac{T_m^2}{N_m}$

In practice, the within-samples SS is usually not calculated, but taken as the difference between the total SS and the between-samples SS. Note how, algebraically, the components of variation add up to the total SS.

Between-samples     $\sum \frac{T_m^2}{N_m} - \frac{T^2}{N}$
Within-samples      $\sum x^2 - \sum \frac{T_m^2}{N_m}$
Total               $\sum x^2 - \frac{T^2}{N}$

It should be clear from the foregoing that the only terms required for an analysis of variance are

$\sum x^2$,   $\sum \frac{T_m^2}{N_m}$,   $\frac{T^2}{N}$

Means do not have to be calculated; they are implicit in the procedures. In more complicated analyses, when samples are grouped according to more than a single criterion, one has to calculate a different between-samples SS for each grouping of samples, but otherwise the procedure is the same.
Example 2-14 may now be worked by the direct method. A convenient and systematic procedure is first to prepare a preliminary table of total squares.¹⁶ Here we have,

Grand total     $T^2 = (-20)^2 = 400$
Samples         $\sum T_m^2 = (-22)^2 + (2)^2 = 488$
Observations    $\sum x^2 = (7)^2 + (4)^2 + \cdots + (-1)^2 = 148$

PRELIMINARY CALCULATIONS

Type of         Total of    Number of        Number of Observations    Total of Squares
Total           Squares     Items Squared    per Squared Item          per Observation
Grand             400            1                   10                      40.0
Samples           488            2                    5                      97.6
Observations      148           10                    1                     148.0

From the preliminary table, we then compose the SS, as explained.

ANALYSIS OF VARIANCE

Source              SS                        DF    Variance Estimate
Sexes (samples)     97.6 - 40.0 =  57.6        1         57.6
Error               148 - 97.6  =  50.4        8          6.3
Total               148 - 40.0  = 108.0        9
                                                      F = 9.14

The F-test in a two-sample comparison is really identical to a t-test. Indeed, F with 1, (N - 1) DF is numerically identical to $t^2$ with (N - 1) DF. Thus, if Example 2-14 is worked as a t-test, one finds t = 3.02, which is the square root of 9.14. Of course, examples of this type can also be worked by the nonparametric two-sample rank test. The usefulness of the analysis of variance becomes evident in more complex experimental designs, as illustrated by the following examples.

¹⁶ This is the step-by-step method suggested by J. C. R. Li. As one becomes more familiar with the calculations, some of the preliminary steps can be omitted.
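The direct method needs only the sum of squared observations and the sample totals, so it translates into a few lines of code. The sketch below reworks Example 2-14 from the totals quoted above (sum of squares 148, sample totals -22 and 2, five observations per sample); the function name is illustrative and equal sample sizes are assumed.

```python
import math

def anova_from_totals(sum_x2, sample_totals, n_per_sample):
    """One-way analysis of variance from totals, assuming equal sample sizes."""
    k = len(sample_totals)
    n_total = k * n_per_sample
    grand_total = sum(sample_totals)

    correction = grand_total ** 2 / n_total                          # T^2 / N
    per_sample = sum(t ** 2 for t in sample_totals) / n_per_sample   # sum of T_m^2 / N_m

    ss_between = per_sample - correction
    ss_within = sum_x2 - per_sample
    ss_total = sum_x2 - correction

    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {"ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
            "df_between": df_between, "df_within": df_within, "F": f_ratio}

result = anova_from_totals(sum_x2=148, sample_totals=[-22, 2], n_per_sample=5)
t_equivalent = math.sqrt(result["F"])      # two samples only: t is the square root of F

for key, value in result.items():
    print(f"{key:11s} {value:.2f}")
print(f"sqrt(F)     {t_equivalent:.2f}")
```

The between and within components add up to the total SS, and the square root of F reproduces the t value quoted in the text.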
Example 215.
Anal/sis of Variance:
Oneway
classification,
manysample comparison.
The progress of woundhealing was compared when five different postoperative regimens were employed after abdominal surgery. The 30 patients in the study
were randomly assigned. The numbers below are coded data on duration of the woundhealing period. Postoperative Regimen
A
Tm (Tmf
= =
D
C
B
E
3
4
2
6
8
5
7
3
3
2
5
6
4
5
4
2
6
3
5
5
4
9
3
2
6
5
7
5
4
6
24
39
576
1,521
20
25
31
400
625
961
S x* = 739 r=i39 T2 =
19,321
PRELIMINARY CALCULATIONS

Type of         Total of    Number of        Number of Observations    Total of Squares
Total           Squares     Items Squared    per Squared Item          per Observation
Grand            19,321           1                   30                    644.0
Regimens          4,083           5                    6                    680.5
Observations        739          30                    1                    739
ANALYSIS OF VARIANCE

Source       SS                      DF    Variance Estimate (Mean Square)
Regimens     680.5 - 644   = 36.5     4          9.13
Error        739 - 680.5   = 58.5    25          2.34
Total        739 - 644     = 95      29

F = 9.13/2.34 = 3.90 (4, 25 DF) and this exceeds the critical value 2.76 for P = 0.05, so we conclude that the regimens do indeed differ. The analysis of variance itself, however, does not permit us to decide which pairs of regimens differ significantly from each other and which do not.
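The direct method of Example 2-15 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `one_way_anova` and the dictionary layout are ours, the data are the coded durations of the example read column by column, and the sketch assumes samples of equal size, as in the text.

```python
# One-way analysis of variance by the direct (totals-and-squares) method.
# Data: coded wound-healing durations of Example 2-15, six patients per regimen.
regimens = {
    "A": [3, 5, 5, 2, 4, 5],
    "B": [4, 7, 6, 6, 9, 7],
    "C": [2, 3, 4, 3, 3, 5],
    "D": [6, 3, 5, 5, 2, 4],
    "E": [8, 2, 4, 5, 6, 6],
}


def one_way_anova(samples):
    """Return SS, DF, mean squares and F for equal-sized samples (hypothetical helper)."""
    groups = list(samples.values())
    n_per_sample = len(groups[0])
    if any(len(g) != n_per_sample for g in groups):
        raise ValueError("this sketch assumes samples of equal size")
    n_total = n_per_sample * len(groups)

    # The three "total of squares per observation" entries of the preliminary table.
    grand = sum(sum(g) for g in groups) ** 2 / n_total
    between = sum(sum(g) ** 2 for g in groups) / n_per_sample
    observations = sum(x * x for g in groups for x in g)

    ss_samples = between - grand
    ss_error = observations - between
    df_samples = len(groups) - 1
    df_error = n_total - len(groups)
    ms_samples = ss_samples / df_samples
    ms_error = ss_error / df_error
    return {
        "ss_samples": ss_samples,
        "ss_error": ss_error,
        "ss_total": observations - grand,
        "df_samples": df_samples,
        "df_error": df_error,
        "ms_samples": ms_samples,
        "ms_error": ms_error,
        "F": ms_samples / ms_error,
    }


result = one_way_anova(regimens)
print(f"F = {result['F']:.2f} with ({result['df_samples']}, {result['df_error']}) DF")
```

The F ratio must still be compared with the tabulated critical value (2.76 at P = 0.05 in the text); the sketch does not look it up.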
For making simultaneous comparisons between several different means, as we wished to do in Example 2-15, the studentized range test is used.17 Consulting Table 8, for the number of samples being compared and the within-samples degrees of freedom (i.e., the number of samples times one less than the number of items per sample) we obtain a preliminary factor k*, which is then multiplied by a standard error term to obtain a minimum significant range (k), against which the actual ranges between means may be compared. The procedure is very similar to that for obtaining a confidence interval (p. 45).

$$k = k^{*} \sqrt{\frac{v_e}{N_m}}$$

where k* is taken from Table 8, $v_e$ is the error-variance estimate from the analysis of variance, and $N_m$ is the number of observations per sample. All samples should be of equal size.

Example 2-16.
Studentized range test.

Apply the test to the data of Example 2-15 to ascertain which regimens differ from each other at the 5% significance level. Here we consult Table 8 with 5 means and 25 DF [i.e., ($N_m$ - 1) DF per sample] and interpolate k* = 4.16. From Example 2-15, $v_e$ = 2.34. Then, since each sample contains 6 observations,

$$k = 4.16 \sqrt{\frac{2.34}{6}} = 2.60$$

the smallest significant range at the 5% level of significance. Now we find each mean as $T_m/6$ and arrange them in order of magnitude:

    C       A       D       E       B
  3.33    4.00    4.17    5.16    6.50

It is then apparent that of all the contrasts between pairs of means, only B and C differ by an amount as great as k. We may therefore conclude that regimen C is superior to B, but we cannot assert that it is superior to A, D, or E, or that there are any other real differences.

17 "Studentized" refers to the tabulation of a statistical distribution based on a variance estimate, v, derived from a sample, when $\sigma$ is unknown. Such procedures were introduced by W. S. Gosset, under the pseudonym "Student." The t-test, which is also based on a small-sample variance estimate, is often called "Student's t-test." Although we shall illustrate the studentized range test in Example 2-16 upon the same data as was analyzed by the F-test in Example 2-15, it should be understood that the two tests are alternatives, not ordinarily applied sequentially to the same data.
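The comparison of Example 2-16 can be sketched as follows. We assume here that the minimum significant range is k* multiplied by the standard error term, the square root of the error variance divided by the number of observations per sample; this assumption reproduces the value 5.03 quoted in Example 2-17. The factor k* = 4.16 is simply copied from the text (Table 8), and the names `minimum_significant_range` and `significant_pairs` are ours.

```python
import math
from itertools import combinations

K_STAR = 4.16          # copied from the text: Table 8, 5 means, 25 DF, P = 0.05
ERROR_VARIANCE = 2.34  # error mean square from Example 2-15
N_PER_SAMPLE = 6

# Sample totals T_m from Example 2-15.
totals = {"A": 24, "B": 39, "C": 20, "D": 25, "E": 31}


def minimum_significant_range(k_star, error_variance, n_per_sample):
    """Assumed form: k = k* x sqrt(error variance / observations per sample)."""
    return k_star * math.sqrt(error_variance / n_per_sample)


k = minimum_significant_range(K_STAR, ERROR_VARIANCE, N_PER_SAMPLE)
means = {name: total / N_PER_SAMPLE for name, total in totals.items()}

# Every pair of regimens whose means differ by at least k.
significant_pairs = sorted(
    (first, second)
    for first, second in combinations(sorted(means), 2)
    if abs(means[first] - means[second]) >= k
)

print(f"k = {k:.2f}")
print("pairs differing by at least k:", significant_pairs)
```

Only the pair B and C should be reported, in agreement with the conclusion of the example.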
In this particular example the studentized range test allowed us to conclude that the extreme samples differed significantly. It may happen that even though analysis of variance permits the conclusion that a set of means did not come from the same population, the studentized range test nevertheless does not reveal any particular pair of means that can be said to differ. This is no more surprising than any case in which a particular test may not be efficient enough to detect a real difference of given magnitude. In the following example, on the other hand, the studentized range test makes possible several discriminations among the sample means.

Example 2-17. Analysis of variance: One-way classification; many-sample analysis of variance and studentized range test.
Five different media were compared to see if they differed in supporting the growth of mouse fibroblast cells in tissue culture. Five bottles were used with each medium, the same number of cells were implanted into all 25 bottles, and the total cell protein in each bottle was determined after 7 days. The results were as follows ($\mu$g of protein nitrogen):

                        Medium
                A        B        C        D        E
              100      101      107      100      119
              100      104      103       96      122
               99       98      105       99      114
              101      105      105      100      120
              100      102      106       99      121

The data are first coded by subtracting 100:

                A        B        C        D        E
                0        1        7        0       19
                0        4        3       -4       22
               -1       -2        5       -1       14
                1        5        5        0       20
                0        2        6       -1       21
T_m       =     0       10       26       -6       96
T_m/N_m   =     0      2.0      5.2     -1.2     19.2
(T_m)^2   =     0      100      676       36    9,216

$T = 126$        $T^2 = 15{,}876$
PRELIMINARY CALCULATIONS

Type of         Total of    Number of        Number of Observations    Total of Squares
Total           Squares     Items Squared    per Squared Item          per Observation
Grand            15,876           1                   25                     635.0
Media            10,028           5                    5                   2,005.6
Observations      2,096          25                    1                   2,096
ANALYSIS OF VARIANCE

Source     SS                              DF    Variance Estimate (Mean Square)
Media      2,005.6 - 635.0   = 1,370.6      4         342.6
Error      2,096 - 2,005.6   =    90.4     20           4.52
Total      2,096 - 635.0     = 1,461.0     24

F = 342.6/4.52 = 75.8 (4, 20 DF), which greatly exceeds the tabulated value 4.43 at P = 0.01, so the media differ very significantly.

Studentized range test: From Table 8, at P = 0.01, for 5 samples and 20 DF, k* = 5.29. Then

$$k = 5.29 \sqrt{\frac{4.52}{5}} = 5.03$$

the smallest significant range at the 1% level of significance.

    D       A       B       C       E
  -1.2      0      2.0     5.2    19.2

We see that medium E is superior to all the others, since it exceeds even its closest neighbor by more than k. C is superior to A and D but not necessarily to B. No other comparisons are significant.
Example 2-18. Analysis of variance: Randomized block design, 4 treatments x 5 replications.

Each observation is the weight of a mouse at 3 months after weaning.

                      Diet
Litter          A       B       C       D
   1           20      18      18      21
   2           19      17      20      23
   3           20      20      17      20
   4           22      21      16      23
   5           19      19      16      22
Code by subtracting 20:

                        Diet
Litter            A       B       C       D      T_litters    (T_litters)^2
   1              0      -2      -2      +1         -3              9
   2             -1      -3       0      +3         -1              1
   3              0       0      -3       0         -3              9
   4             +2      +1      -4      +3         +2              4
   5             -1      -1      -4      +2         -4             16
T_diets           0      -5     -13      +9      T = -9
(T_diets)^2       0      25     169      81      T^2 = 81
PRELIMINARY CALCULATIONS

Type of                   Total of    Number of        Number of Observations    Total of Squares
Total                     Squares     Items Squared    per Squared Item          per Observation
Grand                        81.0           1                   20                      4.05
Diets                       275             4                    5                     55.0
Replications (litters)       39.0           5                    4                      9.75
Observations                 89.0          20                    1                     89.0
ANALYSIS OF VARIANCE

Source                   SS                                     DF     MS        F
Treatments (diets)       55.0 - 4.05 = 50.95                     3    16.98    7.20**
Replication (litters)    9.75 - 4.05 = 5.70                      4     1.42    N.S.
Error                    89.0 - 55.0 - 9.75 + 4.05 = 28.30      12     2.36
Total                    89.0 - 4.05 = 84.95                    19

The error SS is obtained by subtracting all the others from the total. Litters do not differ from one another (since F is less than unity), but diets do, since 7.20 (3, 12 DF) exceeds the tabulated value 5.95 at the 1% significance level.18 Diet D seems to be best and diet C poorest. The studentized range test gives us more information. For 4 samples and 12 DF at P = 0.01, k* = 5.50.
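The randomized block calculation of Example 2-18 can be sketched in the same totals-and-squares style. The weights below are our reading of the printed table; the assignment of the entries for litters 1 and 2 is a reconstruction, chosen so that the coded totals agree with those in the preliminary calculations (81, 275, 39 and 89). All variable names are ours.

```python
# Randomized block design: diets are treatments, litters are replications.
DIETS = ["A", "B", "C", "D"]
litters = [            # one row per litter, one column per diet
    [20, 18, 18, 21],
    [19, 17, 20, 23],
    [20, 20, 17, 20],
    [22, 21, 16, 23],
    [19, 19, 16, 22],
]

# Code the data by subtracting 20, as in the text (this does not change any SS).
coded = [[weight - 20 for weight in row] for row in litters]
n_litters, n_diets = len(coded), len(DIETS)
n_total = n_litters * n_diets

# "Total of squares per observation" entries of the preliminary table.
grand = sum(sum(row) for row in coded) ** 2 / n_total
diet_totals = [sum(row[j] for row in coded) for j in range(n_diets)]
diets_term = sum(t * t for t in diet_totals) / n_litters
litters_term = sum(sum(row) ** 2 for row in coded) / n_diets
observations_term = sum(x * x for row in coded for x in row)

ss_diets = diets_term - grand
ss_litters = litters_term - grand
ss_total = observations_term - grand
ss_error = ss_total - ss_diets - ss_litters   # what is left after removing both

df_diets, df_litters = n_diets - 1, n_litters - 1
df_error = df_diets * df_litters
ms_error = ss_error / df_error
f_diets = (ss_diets / df_diets) / ms_error
f_litters = (ss_litters / df_litters) / ms_error

print(f"diets:   SS = {ss_diets:.2f}, F = {f_diets:.2f}")
print(f"litters: SS = {ss_litters:.2f}, F = {f_litters:.2f}")
print(f"error:   SS = {ss_error:.2f}, MS = {ms_error:.2f}")
```

Removing the litter SS from the error term is the point of the design: variation between litters no longer inflates the variance against which the diets are tested.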
                          42      107
                  73      72      145

the table are identified by letters for easier reference in the following calculations. Here we have four categories: placebo-delay, caffeine-delay, placebo-no delay, caffeine-no delay. It is readily ascertained that since all the marginal totals are fixed, once data are entered in any single box, the other boxes are filled automatically by subtraction. Thus, fourfold tables are characterized by 1 DF.

Since there are no a priori expectations, we begin by assuming that the drug and placebo results were drawn from the same population (null hypothesis). From the pooled data we can then estimate an expectation for each box of the table. The placebo group was 73/145 of the total, and 38 subjects altogether showed a delay. On the assumption that the drug and placebo do not differ, we should then expect the proportion of subjects showing a delay to be the same in both groups. Thus we expect in box a, (73/145) x 38 = 19.1. The remaining expectations can be filled in by subtraction from the marginal totals: b = 18.9, c = 53.9, d = 53.1.

We now calculate the contribution of each box to $\chi^2$ in the usual way, remembering (since there is only one degree of freedom) to make the Yates correction. In box a, for example,

$$(E - O) = (19.1 - 8) = 11.1$$
$$(E - O)_c = 10.6$$
$$\frac{(E - O)_c^2}{E} = \frac{(10.6)^2}{19.1} = 5.9$$

The contributions of boxes b, c, d are similarly computed to be 5.9, 2.1, 2.1, and so $\chi^2 = 16.0$.
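The fourfold-table calculation can be sketched as below. Only the observed count in box a (8) and the marginal totals (38, 73, 145) are stated in the text; the other observed counts are obtained here by subtraction and should be treated as our assumption. Carrying full precision gives a total near 16.1, whereas the text, rounding each contribution to one decimal, reports 16.0.

```python
# Chi-square for a fourfold table with the Yates correction.
# Boxes: a = placebo-delay, b = caffeine-delay,
#        c = placebo-no delay, d = caffeine-no delay.
TOTAL = 145
PLACEBO_TOTAL = 73
DELAY_TOTAL = 38
OBSERVED_A = 8  # stated in the text; the other boxes follow by subtraction

caffeine_total = TOTAL - PLACEBO_TOTAL
no_delay_total = TOTAL - DELAY_TOTAL

observed = {
    "a": OBSERVED_A,
    "b": DELAY_TOTAL - OBSERVED_A,
    "c": PLACEBO_TOTAL - OBSERVED_A,
}
observed["d"] = caffeine_total - observed["b"]

# Expectation for each box under the null hypothesis: row total x column total / N.
expected = {
    "a": DELAY_TOTAL * PLACEBO_TOTAL / TOTAL,
    "b": DELAY_TOTAL * caffeine_total / TOTAL,
    "c": no_delay_total * PLACEBO_TOTAL / TOTAL,
    "d": no_delay_total * caffeine_total / TOTAL,
}


def yates_contribution(obs, exp):
    """(|E - O| - 0.5)^2 / E, the corrected contribution of one box."""
    return (abs(exp - obs) - 0.5) ** 2 / exp


contributions = {box: yates_contribution(observed[box], expected[box]) for box in observed}
chi_square = sum(contributions.values())

for box in "abcd":
    print(f"box {box}: O = {observed[box]}, E = {expected[box]:.1f}, "
          f"contribution = {contributions[box]:.1f}")
print(f"chi-square = {chi_square:.1f} with 1 DF")
```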
[Figure: x-axis, Micrograms Allyl Alcohol.]
*J. K. Kodama and C. H. Hine, J. Pharmacol. Exptl. Therap. 124: 97 (1958).
assay of allyl alcohol. This choice is correct because here the regression line would be used for the prediction of a single observation. What we wish to know is the range of optical densities within which the reading will lie when we perform a single determination on a given amount of allyl alcohol. Again, however, the range ± S.D. gives a false impression, since it will only include two-thirds of the observations. Moreover, this case presents a further complication. If the regression line is to be used for the assay of unknown amounts of allyl alcohol, the problem will be to estimate the unknown from a given optical density reading, and to attach a confidence interval to such an estimate. This problem of estimating x from y can only be handled properly by means of the procedures developed later in this chapter.

Of the various shortcuts for graphic representation of the reliability of correlation data, the most satisfactory is that illustrated in Fig. 4-4.
Vertical lines are again used, but now the length of each line is made equal to the 95% confidence interval for the true mean y at that value of x. Obviously the true regression line, expressing the general relationship between y and x in the population, must pass through all (or nearly all) such individual confidence ranges. Actually this is a conservative procedure, since the confidence band formed by joining all the upper and lower extremities of the vertical lines is wider than the true 95% confidence band for the regression line as a whole. Thus in the experiment depicted in Fig. 4-4 there is little doubt that seizure threshold was increased significantly by increasing dosage of ethanol with one shock procedure, but not with the other.

Figure 4-4. Effects of increasing doses of ethanol on electroshock seizure threshold in mice. Two different shock procedures are represented by the two sets of points. Each point is the value determined in a group of 15-25 mice. Vertical lines indicate 95% confidence limits. Both the dose scale and the threshold ratio scale are logarithmic. (Adapted from McQuarrie and Fingl.*) [x-axis: Dose, mg/kg]
*D. G. McQuarrie and E. Fingl, J. Pharmacol. Exptl. Therap. 124: 264 (1958).
MISINTERPRETATIONS OF CORRELATION DATA

False conclusions may be drawn from valid correlation data. First, it must be understood that the mere fact of a correlation implies nothing necessarily about cause and effect. True, the absence of correlation may lead one to reject a hypothesis that y-effects are caused by x. However, even the strongest correlation does not of itself permit one to infer a causal relationship. A correlation may be wholly accidental, as that between the sale of bananas and the death rate from cancer in England.3 Or y and x effects may be caused independently by other factors that in turn have common causes, as in the correlation over many years between the salaries of Presbyterian ministers in Massachusetts and the price of rum in Havana.4

3 A. B. Hill, Principles of Medical Statistics (New York: Oxford University Press, 1955), p. 185.
A second source of misinterpretation is unwarranted extrapolation. One may never assume without good reason that a regression line will extend beyond the limits of the observational data, and quite often it does not. Figure 4-1 provides a good example. We may perhaps have grounds for supposing that extrapolation back to zero enzyme activity at a few days prior to birth is meaningful, although it is also quite possible that the enzyme increases at a much slower rate during a longer period of prenatal life. Extrapolation in the other direction is clearly unwarranted as a matter of fact; it has been shown experimentally that this enzyme activity does not increase further beyond that attained at 21 days. Mark Twain's5 wry comment on the shortening of the Lower Mississippi points up the pitfall of extrapolation in a most entertaining way.

    In the space of one hundred and seventy-six years the Lower Mississippi has shortened itself two hundred and forty-two miles. This is an average of a trifle over one mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oolitic Silurian Period, just a million years ago next November, the Lower Mississippi River was upward of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-rod. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo and New Orleans will have joined their streets together, and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.
ESTIMATING A REGRESSION LINE FROM SAMPLE DATA

Now let us return to a detailed consideration of how correlation data should be treated to obtain a regression line and its confidence limits. Let $\hat{y}$ be the true value of y at any given value of x. Then the equation of the true regression line will be $\hat{y} = a + bx$, where a is the intercept on the y-axis at x = 0, and b is the slope.6 The observed y-values will scatter above and below the true line, to yield a set of deviations $(y - \hat{y})$. As already indicated, we choose as our best estimate, the "least-squares" line, from which the sum of squared deviations $\sum (y - \hat{y})^2$ will be minimum.

4 D. Huff, How to Lie with Statistics (New York: W. W. Norton & Company, Inc., 1954), p. 90.
5 Mark Twain, Life on the Mississippi (New York: Harper & Brothers, 1874), p. 155.
6 b is also known as the regression coefficient.

In order to find the values of a and b which will minimize the sum of squared y-deviations from the line, we set

$$\frac{\partial Q}{\partial a} = 0 \quad \text{and} \quad \frac{\partial Q}{\partial b} = 0$$

where $Q = \sum (y - \hat{y})^2 = \sum (y - a - bx)^2$. Solving by partial differentiation yields

$$0 = -2\left(\sum y - Na - b \sum x\right)$$
$$Na = \sum y - b \sum x$$
$$a = \bar{y} - b\bar{x}$$

Thus, $\hat{y}$ at $\bar{x}$ is $\bar{y}$; in other words, the least-squares line passes through the general mean $\bar{x}, \bar{y}$ of the observations. Partial differentiation with respect to b gives

$$0 = -2\left(\sum xy - a \sum x - b \sum x^2\right)$$

and substituting now for a,

$$0 = \sum xy - \bar{y} \sum x + b\bar{x} \sum x - b \sum x^2$$

$$b = \frac{\sum xy - \dfrac{(\sum x)(\sum y)}{N}}{\sum x^2 - \dfrac{(\sum x)^2}{N}} = \frac{SP_{xy}}{SS_x}$$

An equivalent expression can be obtained7 which shows somewhat more clearly just what the slope represents, but the above equation is more suitable for computations. The required terms, $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$ and (for later use) $\sum y^2$, can all be found automatically on a good calculating machine.

7 The equation $b = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$ is derived by adding and subtracting $\bar{y}\bar{x}$ in the numerator and $\bar{x}^2$ in the denominator. $\sum (x - \bar{x})^2$ is familiar as the SS term for x, which appears in the numerator of the x-variance. $\sum (x - \bar{x})(y - \bar{y})$ is a new term, which is the analogous numerator of an expression known as the covariance of x and y, and will be symbolized here by $SP_{xy}$ since it is a sum of products rather than a sum of squares.
The sign of b may be positive or negative, depending upon whether y-values tend to increase or decrease as x-values become larger. If y-values varied without any relation to the associated x-values, there would be no correlation, and the true slope would be zero.

Figure 4-5. Diagrammatic representation of the slope of a regression line.

Since we know from the foregoing that the point $\bar{x}, \bar{y}$ lies on the estimated regression line, and since this point is so readily computed, we shall not be concerned further with the intercept, a. Substituting $a = \bar{y} - b\bar{x}$, we obtain a more generally useful equation of the regression line,

$$\hat{y} = \bar{y} + b(x - \bar{x})$$

Rearranging this expression, we have the reasonable description of slope as the ratio y-deviation to x-deviation, from the point $\bar{x}, \bar{y}$,

$$b = \frac{\hat{y} - \bar{y}}{x - \bar{x}}$$

as illustrated in Fig. 4-5.

To draw the estimated regression line, once its equation has been found, plot $(\bar{x}, \bar{y})$, as shown. Then add a convenient amount to $\bar{x}$ (preferably as large an amount as possible) and b times this amount to $\bar{y}$, and plot the new point $(\bar{x} + \Delta x, \bar{y} + b\Delta x)$ (see Fig. 4-5). These two points determine the line.
The following example illustrates the full procedure for calculating an estimated regression line for the data depicted in Fig. 4-1.

Example 4-1. Calculation of a regression line.

The following data were obtained for the ability of liver slices from guinea pigs of different ages to conjugate phenolphthalein with glucuronic acid. Calculate the equation of the regression line.

   Age (days)             Millimicromoles Conjugated
      x         x^2          y          y^2          xy
      1           1         5.6         31.4         5.6
      1           1         8.8         77.4         8.8
      3           9        12          144          36
      5          25        18          324          90
      6          36        31          961         186
     10         100        38        1,444         380
     10         100        44        1,936         440
     11         121        22          484         242
     14         196        37        1,369         518
     15         225        46        2,116         690
     21         441        54        2,916       1,134

$\sum x = 97$    $\sum x^2 = 1{,}255$    $\sum y = 316.4$    $\sum y^2 = 11{,}803$    $\sum xy = 3{,}730.4$

$$\bar{x} = 8.82, \qquad \bar{y} = 28.8$$

$$b = \frac{3{,}730.4 - \dfrac{(97)(316.4)}{11}}{1{,}255 - \dfrac{(97)^2}{11}} = \frac{940}{400} = 2.4$$
Then the equation of the regression line is

$$\hat{y} = 28.8 + 2.4(x - 8.8)$$

The line calculated in the above example is shown as A in Fig. 4-6.

Figure 4-6. Estimated regression line and confidence limits for the data of Figure 4-1.

Estimating the error variance and the confidence interval for a regression line

$$s_{y \cdot x}^2 = \frac{\sum (y - \hat{y})^2}{N - 2} = \frac{\left[\sum y^2 - \dfrac{(\sum y)^2}{N}\right] - b\left[\sum xy - \dfrac{(\sum x)(\sum y)}{N}\right]}{N - 2} = \frac{SS_y - b(SP_{xy})}{N - 2}$$
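Example 4-2. Error variance for the data of Example 4-1.

$$s_{y \cdot x}^2 = \frac{\left[11{,}803 - \dfrac{(316.4)^2}{11}\right] - 2.4\left[3{,}730 - \dfrac{30{,}691}{11}\right]}{9} = 49.56$$

8 Since two parameters (mean and slope) have been estimated, there are (N - 2) DF. Division by (N - 2) makes $s_{y \cdot x}^2$ an unbiased estimate of $\sigma_{y \cdot x}^2$ just as division by (N - 1) made $s^2$ an unbiased estimate of $\sigma^2$ for the single-parameter case.

9 Also, note that

$$\sum (y - \hat{y})^2 = \sum \left[(y - \bar{y}) - b(x - \bar{x})\right]^2 = \sum (y - \bar{y})^2 - 2b \sum (x - \bar{x})(y - \bar{y}) + b^2 \sum (x - \bar{x})^2$$

Substituting $b = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$ then gives the result shown.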
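The computational formulas for the slope and the error variance can be sketched as follows, using the data of Example 4-1 (the y-value of 12 at x = 3 is inferred from the printed squares and products). The function name `least_squares` is ours. Note that the sketch carries the unrounded slope throughout, so it gives b of about 2.35 and an error variance of about 54.4; the text rounds b to 2.4 before continuing and reports 49.56.

```python
# Least-squares slope, general mean and error variance from raw sums.
ages = [1, 1, 3, 5, 6, 10, 10, 11, 14, 15, 21]                  # x, days
conjugated = [5.6, 8.8, 12, 18, 31, 38, 44, 22, 37, 46, 54]      # y, millimicromoles


def least_squares(xs, ys):
    """Return the quantities used in the text (hypothetical helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n                  # SS for x
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n                  # SS for y
    sp_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n  # sum of products
    b = sp_xy / ss_x
    # Error SS = SS_y - b * SP_xy, with N - 2 degrees of freedom.
    error_variance = (ss_y - b * sp_xy) / (n - 2)
    return {
        "n": n,
        "x_bar": sum_x / n,
        "y_bar": sum_y / n,
        "ss_x": ss_x,
        "ss_y": ss_y,
        "sp_xy": sp_xy,
        "b": b,
        "error_variance": error_variance,
    }


fit = least_squares(ages, conjugated)
print(f"x-bar = {fit['x_bar']:.2f}, y-bar = {fit['y_bar']:.1f}, b = {fit['b']:.2f}")
print(f"estimated line: y = {fit['y_bar']:.1f} + {fit['b']:.2f}(x - {fit['x_bar']:.2f})")
print(f"error variance = {fit['error_variance']:.2f} with {fit['n'] - 2} DF")
```

The difference from the printed figure illustrates how sensitive the shortcut formula for the error SS is to rounding of b.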
$s_{y \cdot x}$, the square root of the error variance, is a standard deviation of individual y-values about the regression line. If it were the same at all points on the regression line, we could use it to obtain an approximate confidence interval for individual y-values. Such an approximate 95% interval is demarcated in Fig. 4-6 by parallel lines (B) at vertical distances $\pm 1.96\, s_{y \cdot x}$ from the regression line (where $s_{y \cdot x} = \sqrt{49.56} = 7.04$) and these lines are seen to include all points of the experimental sample. For reasons explained below, such a "confidence interval," bounded by parallel lines, is likely to be too narrow except in the vicinity of $\bar{x}, \bar{y}$.

Usually we wish to establish a confidence interval for the true regression line as a whole, rather than for individual y-values. A limited approach may be made by finding a confidence interval for $\bar{y}$ at $\bar{x}$, thus establishing true limits for the regression line in its central region. By analogy to the standard error of a mean in the single-parameter case, we have

$$s_{\bar{y}} = \frac{s_{y \cdot x}}{\sqrt{N}}$$
$$\hat{y} = \bar{y} \pm t(s_{\bar{y}})$$

where t has (N - 2) DF.

Example 4-3. Confidence interval for $\bar{y}$ at $\bar{x}$.

Calculate the 95% confidence interval for $\bar{y}$ at $\bar{x}$ from the data of Example 4-1. From the data,

$$\bar{y} = 28.8$$
$$s_{y \cdot x}^2 = 49.56$$
$$s_{\bar{y}}^2 = \frac{49.56}{11} = 4.51$$
$$s_{\bar{y}} = \sqrt{4.51} = 2.12$$
$$\hat{y} = 28.8 \pm t(2.12)$$

Consulting Table 5 at 9 DF and P = 0.05, we find t = 2.26, so the limits are ±2.26(2.12) = ±4.8. Then, at $\bar{x}$,

$$\hat{y} = 24.0 \text{ to } 33.6$$
Lines parallel to the estimated regression line may be drawn through these limits of $\bar{y}$ at $\bar{x}$, as shown by the short solid segments of lines C in Fig. 4-6. In the region close to $\bar{x}, \bar{y}$ these will include the true regression line 95 times out of 100. However, such parallel confidence limits cannot be extended beyond the immediate vicinity of $\bar{x}, \bar{y}$. The main reason is that even if the variance of y-values about $\hat{y}$ remains the same at all x-values, there is still considerable uncertainty about the true slope of the estimated regression line. The resulting doubt about the true position of the line is very small near $\bar{x}, \bar{y}$ but becomes greatly magnified with increasing distance along the line.

Accurate confidence limits will therefore be represented by curves that are convex towards the estimated regression line, the confidence interval becoming wider with increasing distance from $\bar{x}$. This is represented by an equation containing a weighted correction term, $(x - \bar{x})^2$, which increases the magnitude of $s_{\hat{y}}^2$ at increasing distance from $\bar{x}$:

$$s_{\hat{y}}^2 = s_{y \cdot x}^2 \left[\frac{1}{N} + \frac{(x - \bar{x})^2}{\sum (x - \bar{x})^2}\right]$$

Only the correction term changes with different x-values; the remaining terms are already known. At $\bar{x}$, where $(x - \bar{x})^2 = 0$, the entire expression reduces to that already used for calculating confidence limits of $\bar{y}$ at $\bar{x}$.

Example 4-4.

Calculate the 95% confidence limits of the true regression line at several representative values of x, for the data of Example 4-1. At x = 15, for example,

$$s_{\hat{y}}^2 = 49.56 \left[\frac{1}{11} + \frac{(15 - 8.82)^2}{400}\right] = 9.24$$
$$s_{\hat{y}} = 3.04$$

Then 95% limits are ±2.26(3.04) = ±6.9 from the estimated line, as compared with ±4.8 at $\bar{x}$. Similar computation at other x-values leads to the biconvex curves C in Fig. 4-6, which will include the true regression line 95 times out of 100; they should not (and do not) include 95% of the individual points.
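The widening of the confidence limits away from the mean of x can be sketched with the rounded quantities quoted in Examples 4-1 to 4-4. The constants are copied from the text (t = 2.26 from Table 5 at 9 DF); the function names are ours.

```python
import math

# Rounded values quoted in the worked examples.
N = 11
X_BAR = 8.82
Y_BAR = 28.8
SLOPE = 2.4
SS_X = 400.0            # sum of squared x-deviations
ERROR_VARIANCE = 49.56
T_CRITICAL = 2.26       # Table 5, 9 DF, P = 0.05


def half_width(x):
    """95% half-width of the interval for the true line at a given x."""
    variance = ERROR_VARIANCE * (1 / N + (x - X_BAR) ** 2 / SS_X)
    return T_CRITICAL * math.sqrt(variance)


def limits(x):
    """Lower and upper confidence limits of the regression line at x."""
    estimate = Y_BAR + SLOPE * (x - X_BAR)
    return estimate - half_width(x), estimate + half_width(x)


for x in (1, 5, X_BAR, 15, 21):
    low, high = limits(x)
    print(f"x = {x:5.2f}: +/-{half_width(x):4.1f}  ({low:5.1f} to {high:5.1f})")
```

The half-width is smallest at the mean of x and grows symmetrically on either side, which is what produces the hourglass shape of the accurate confidence band.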
In summary, a confidence interval for the true regression line may be found in two ways. A rough approximation that may suffice for some purposes is to estimate a confidence interval for $\bar{y}$ at $\bar{x}$; through the upper and lower limits of $\bar{y}$ thus obtained, lines parallel to the least-squares regression line may be drawn. These will approximately bound the true regression line in the immediate vicinity of $\bar{x}, \bar{y}$. The accurate method is to find confidence limits of $\hat{y}$ at several x throughout the pertinent range of observations; this method will yield an hourglass-shaped area, narrowest at $\bar{x}, \bar{y}$ and flaring out at a distance, which accurately defines the confidence interval of the true regression line.
CONFIDENCE INTERVAL OF A SLOPE

It has already been pointed out that if y-values varied without any relation to the associated x-values, there would be no correlation and the true slope, $\beta$, would be zero. Nevertheless a particular random sample of data might yield a finite positive or negative slope estimate, b. The question whether an apparent correlation is real or not must then be answered by determining with a t-test whether b differs significantly from zero, or better yet, by calculating confidence limits for the true slope, $\beta$, and ascertaining whether zero is included in these. It can be shown that the sampling variance of b is given by the following equation:

$$s_b^2 = \frac{s_{y \cdot x}^2}{\sum (x - \bar{x})^2}$$

and then, as usual,

$$t = \frac{b - \beta}{s_b} = \frac{(b - \beta)\sqrt{SS_x}}{s_{y \cdot x}}$$

and the confidence interval for $\beta$ is given by

$$\beta = b \pm \frac{t(s_{y \cdot x})}{\sqrt{SS_x}}$$

where t has (N - 2) DF.
Example 4-5.

Calculate the 95% confidence interval for the slope of the regression line of Fig. 4-6.

$$\beta = 2.4 \pm \frac{2.26(7.04)}{\sqrt{1{,}255 - \dfrac{(97)^2}{11}}} = 2.4 \pm 0.8 = 1.6 \text{ to } 3.2$$

Since the limits do not include zero, there is a real positive correlation between age and enzyme activity and the true slope is not less than 1.6 nor greater than 3.2, both statements made at the 5% level of significance.
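The interval of Example 4-5 follows directly from the quantities already computed. A minimal sketch, using the rounded values quoted in the text (b = 2.4, standard deviation about the line 7.04, t = 2.26); the variable names are ours.

```python
import math

SLOPE = 2.4
S_YX = 7.04             # square root of the error variance
T_CRITICAL = 2.26       # Table 5, 9 DF, P = 0.05
N = 11
SUM_X = 97
SUM_X_SQUARED = 1255

ss_x = SUM_X_SQUARED - SUM_X ** 2 / N          # sum of squared x-deviations
standard_error_of_slope = S_YX / math.sqrt(ss_x)
margin = T_CRITICAL * standard_error_of_slope

lower, upper = SLOPE - margin, SLOPE + margin
zero_excluded = lower > 0 or upper < 0         # slope differs significantly from zero

print(f"slope = {SLOPE} +/- {margin:.1f}  ({lower:.1f} to {upper:.1f})")
print("zero excluded from the interval:", zero_excluded)
```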
SIGNIFICANCE OF A DIFFERENCE BETWEEN TWO SLOPES

By analogy to the test of a difference between two sample means, Student's t may be used to test whether or not two slopes differ significantly. In other words this is a test of parallelism. Data will be available upon which two different regression lines are estimated, with two slope estimates, b and b'. A pooled error variance has to be computed,10 whence t may be calculated in the following equation (here $s_p^2$ denotes the pooled error variance):

$$t = \frac{b - b'}{s_p \sqrt{\dfrac{1}{SS_x} + \dfrac{1}{SS_{x'}}}}$$

Table 5 is consulted at the desired level of significance, with (N - 2 + N' - 2) DF, and parallelism is rejected if the critical value of t is exceeded.
CORRELATION COEFFICIENT

The absolute magnitude of a slope obviously depends upon the particular units used on the x and y axes, just as the absolute magnitude of a standard deviation depends upon the units of measurement. We can ascertain whether or not a given slope is significant, but there is no way to decide from the value of b alone whether a correlation is strong or weak. We were able to express standard deviation as a coefficient of variation (p. 36) by relating it to $\bar{x}$, and thus to obtain a comparative measure of the relative homogeneity of data from different normal distributions. Here the problem is similar. We wish to have a measure of slope which is independent of any particular units of measurement, and which will indicate the strength of correlation for any array of data in comparable terms.

10 This is legitimate only if the two estimates of error variance are substantially the same, just as in pooling variance estimates to compare two sample means (p. 52).
The strength of correlation may be defined as the fraction of the total variance (or SS) that is due to regression. The total SS is given by $\sum (y - \bar{y})^2$. The SS due to regression is the total SS less the error SS, or (cf. p. 140),

$$\text{Regression SS} = \sum (y - \bar{y})^2 - \left[\sum (y - \bar{y})^2 - \frac{\left[\sum (x - \bar{x})(y - \bar{y})\right]^2}{\sum (x - \bar{x})^2}\right] = \frac{\left[\sum (x - \bar{x})(y - \bar{y})\right]^2}{\sum (x - \bar{x})^2}$$

Then

$$\frac{\text{Regression SS}}{\text{Total SS}} = \frac{\left[\sum (x - \bar{x})(y - \bar{y})\right]^2}{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}$$

to which we give the symbol $r^2$. Simplifying, for ease of computation,

$$r^2 = \frac{\left[\sum xy - \dfrac{(\sum x)(\sum y)}{N}\right]^2}{\left[\sum x^2 - \dfrac{(\sum x)^2}{N}\right]\left[\sum y^2 - \dfrac{(\sum y)^2}{N}\right]} = \frac{(SP_{xy})^2}{(SS_x \times SS_y)}$$

Although $r^2$ gives the strength of correlation directly in terms of a ratio of two variance components, it is nevertheless customary to use the square root, r (known as the correlation coefficient) instead. The value of r may vary from zero (no correlation) to -1 or +1 (perfect negative or positive correlation).

It can be shown from the above expression for $r^2$ that r is very directly related to b,

$$r = b\,\frac{s_x}{s_y}$$

Thus r is really the slope of a universal regression line, plotted on transformed coordinates, x and y values being replaced by $x/s_x$ and $y/s_y$. It follows that all correlations which are equally strong will have the same correlation coefficient, regardless of the apparent differences between the slopes of the regression lines based on the original raw data.

The significance of r may be estimated from the expression

$$t = r\sqrt{\frac{N - 2}{1 - r^2}}$$

where Table 5 is entered with (N - 2) DF. The null hypothesis tested in this way is that r estimated from the sample represents a true correlation coefficient of zero. Obviously, it makes no difference whether the test of significance of a correlation is performed on the actual slope, b, or on the standardized slope, r.

Example 4-6.

Compute r and its significance for the data of Example 4-1.

$$r^2 = \frac{\left[3{,}730 - \dfrac{(97)(316.4)}{11}\right]^2}{\left[1{,}255 - \dfrac{(97)^2}{11}\right]\left[11{,}803 - \dfrac{(316.4)^2}{11}\right]} = 0.818$$

$$r = 0.904$$

$$t = \sqrt{\frac{0.818(9)}{0.182}} = 6.36$$

Critical t = 3.25 at P = 0.01 with 9 DF. Thus, the apparent positive correlation is real (P < 0.01) and it is very strong, since 82% of the total variance is due to regression.

The distinction between the significance and the strength of a correlation recalls the similar distinction between the significance and the magnitude of a difference between means (p. 29). Here, in a completely analogous way, a correlation may be significant, yet so weak as to be of no practical consequence. On the other hand, it may appear to be strong, yet because of the small sample size or large variability of measurements it may prove not to be significant.
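The correlation coefficient and its t value can be sketched from the same raw sums as the slope. The data are those of Example 4-1 (with the y-value of 12 at x = 3 inferred from the printed squares and products), and the function name `correlation` is ours. With unrounded arithmetic the t value comes out close to 6.38 rather than the 6.36 of the text, which rounds $r^2$ to 0.818 first.

```python
import math

ages = [1, 1, 3, 5, 6, 10, 10, 11, 14, 15, 21]                  # x, days
conjugated = [5.6, 8.8, 12, 18, 31, 38, 44, 22, 37, 46, 54]      # y, millimicromoles


def correlation(xs, ys):
    """Return r squared, r and the t statistic with N - 2 DF (hypothetical helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    sp_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    r_squared = sp_xy ** 2 / (ss_x * ss_y)        # fraction of total SS due to regression
    r = math.copysign(math.sqrt(r_squared), sp_xy)  # r takes the sign of the slope
    t = r * math.sqrt((n - 2) / (1 - r_squared))
    return r_squared, r, t


r_squared, r, t = correlation(ages, conjugated)
print(f"r^2 = {r_squared:.3f}, r = {r:.3f}, t = {t:.2f} with {len(ages) - 2} DF")
```

Because r and b share the numerator $SP_{xy}$, the same t value results whether the test is made on the slope or on r.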
THE LOGARITHMIC TRANSFORMATION AND THE LOG DOSE-RESPONSE CURVE

Because linear regression is so easy to deal with, it is customary to transform nonlinear correlations into linear ones whenever possible. For example, x or y-values may be plotted as their reciprocals, squares, square roots, ratios, or logarithms. The choice of a particular transformation may have a theoretical basis (as in the Lineweaver-Burk plot in enzymology11), or may be purely empirical on the grounds that an approximately linear correlation results.

For biological data the logarithmic transformation is most useful. Many measurements yield skewed frequency distributions when x-values are plotted directly but fit the symmetrical normal distribution better when logarithms of x-values are used. In some cases this can be attributed to a limitation of the possible range of variation in one direction or the other. For example, the mean heart rate in man is about 70 beats per min. Deviations to the left (lower rates) are restricted by a lower limit around 40, whereas possible deviations to the right may be much greater. On a logarithmic scale a deviation of 35 beats per min to the left (log 35/70 = -0.3) will correspond to a deviation of 70 beats to the right (log 140/70 = +0.3). It is also true that responses to drugs tend to vary proportionately to log dose rather than to dose, so dose-response correlations are routinely plotted with log-dose rather than dose on the x-axis.

An important type of correlation encountered in several kinds of biological experiment is that between the dose of a drug (or other treatment) and the response elicited. If dose is increased systematically in an isolated tissue or single animal, a graded response may be obtained. At first there will be a range of doses so low that no response is manifest. Then a higher range of doses elicits responses of increasing magnitude, and finally a maximal response may be attained which cannot be exceeded at any dose. If log dose is plotted on the x-axis and response on the y-axis, a symmetrical sigmoid curve is characteristically obtained (Fig. 4-7) whose central portion is nearly linear. This means that, over a considerable range of doses, increasing the dose by constant multiples causes equal linear increments of response.
Figure 4-7 shows three different ways of plotting dose on a logarithmic basis. The upper scale shows actual doses, spaced so that successive geometric increases (here doublings) are equally spaced. In the middle scale, actual logarithms are designated. The bottom scale would ordinarily be used only on a working graph, not for the final display of data; it illustrates a transformation that greatly simplifies computations. The ascending doses in the geometric series have been coded by assigning integer numbers beginning with zero. In the present example the actual logarithm of the lowest dose, 0.002, would be $\bar{3}.301$,12 and this is coded as zero. Since the successive doses are doublings, the coded log units must differ by log 2 = 0.301. Thus any point on the arbitrary log scale can be decoded by multiplying by 0.301, then adding $\bar{3}.301$. For example, 3.200 on the arbitrary scale would correspond to (3.200 x 0.301) + $\bar{3}.301$ = $\bar{2}.264$ in actual log units, and antilog $\bar{2}.264$ = 0.0184 mg/kg, the corresponding dose. The coding and decoding procedures are exactly analogous

11 J. S. Fruton and S. Simmonds, General Biochemistry, 2 ed. (New York: John Wiley & Sons, Inc., 1958), p. 252.

12 Because it simplifies computations we shall employ the established convention of representing negative logarithms as the sum of a negative characteristic and a positive mantissa.
    log 2 = 0.301
    log 20 = 1.301
    log 0.2 = $\bar{1}.301$
Conversions to and from logarithms may be accomplished with the aid of Table 3.

Figure 4-7. Graded response of cat nictitating membrane to epinephrine injection in vivo. Each point is the mean response in 5 cats; the same 5 cats were used for the entire curve. Actual contraction amplitude (after magnification) is shown on the left scale, percent of estimated maximum contraction on the right. (Data of Maxwell et al.*) [x-axis scales: Dose (mg/kg), .002 to .128 by doublings; Log dose; Arbitrary log]
*R. A. Maxwell et al., J. Pharmacol. Exptl. Therap. 131: 355 (1961).

Figure 4-8. Log dose-response curve as a cumulative normal frequency distribution of sensitivities of the individual responsive units.
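The decoding of the arbitrary log scale described for Fig. 4-7 can be sketched as follows. The function names are ours; the sketch works with ordinary negative logarithms rather than the characteristic-and-mantissa convention of the text, which is only a different way of writing the same number.

```python
import math

LOWEST_DOSE = 0.002     # mg/kg, coded as zero on the arbitrary scale
DOSE_RATIO = 2.0        # successive doses are doublings


def decode(arbitrary_log_units):
    """Convert a point on the arbitrary log scale back to a dose in mg/kg."""
    actual_log = arbitrary_log_units * math.log10(DOSE_RATIO) + math.log10(LOWEST_DOSE)
    return 10 ** actual_log


def encode(dose):
    """Convert a dose in mg/kg to arbitrary log units (inverse of decode)."""
    return (math.log10(dose) - math.log10(LOWEST_DOSE)) / math.log10(DOSE_RATIO)


print(f"3.200 arbitrary units -> {decode(3.200):.4f} mg/kg")
print(f"0.128 mg/kg -> {encode(0.128):.1f} arbitrary units")
```

Working in coded units keeps the x-values small integers during the regression arithmetic; only the final answer needs to be decoded.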
[Figure: curves labeled "log dose-response curve" and "sensitivities of individual responsive units"; x-axis: Dose, $\mu$g/kg]
*A. S. Kuperman et al., J. Pharmacol. Exptl. Therap. 132: 65 (1961).

different, and to what extent the two $ED_{50}$ values really differ.
If two drugs act by the same mechanism upon the same responsive units, but differ only in potency, the slopes of their curves must be the same. Conversely, different slopes imply different mechanisms of action. If the two curves are equidistant from each other in the horizontal direction at all response levels, it is meaningful to state the difference in potency without further qualification. If the two curves are not parallel, a potency comparison is meaningful only if the particular response level (e.g., the ED50) is specified. Figure 4-9 illustrates some parallel and nonparallel curve segments for different drugs acting upon the same biological system.
The relative potency of two drugs is obtained as a difference between their two log ED50 values, and since

$$ \log x' - \log x = \log \frac{x'}{x} $$

the comparison is expressed as a potency ratio.

Example 4-8. In Fig. 4-10 are shown the effects of a single drug in depressing the amplitude of contraction of the turtle heart at two different pH values. Estimate by eye the ED50 at each pH and the potency ratio for the two pH values.

Figure 4-10 Effects of pH on the action of pentobarbital on the turtle heart. Each point is the mean value from 10 hearts. (Adapted from Hardman et al.*)
[Figure 4-10 plot: percent response for the two pH values.]
[Figure 4-11 plot: percent response against Dose (mg), logarithmic scale from 0.0001 to 10.]
*W. R. Bryan and M. B. Shimkin, J. Nat. Cancer Inst. 3: 503 (1943).

unequal 95% confidence limits, 1.6 on the low side and 6.8 on the high side; likewise the confidence range of the ED50 for one of the drugs extended from 0.22 below to 0.36 above the estimated value of 0.60 mg/kg.
For many purposes it may suffice to draw the entire log dose-response curve by eye, and to estimate slope, ED50, or potency ratio directly from the approximate curve, without applying any statistical analysis at all. Indeed, biological data may sometimes be unavoidably poor, so that a few rough conclusions with which everyone can agree may be preferable to an elaborate statistical analysis which the experimental observations will not really justify. A good example is presented in Fig. 4-11. Here we see dose-response relationships for three carcinogens given subcutaneously to mice. Now it is quite evident that A and B have about the same slopes and very nearly equivalent potencies. C, on the other hand, is clearly less potent, but the data are so variable that it is hard to estimate a potency ratio in which we could have much confidence. If these self-evident conclusions are sufficient, then further statistical analysis would serve no useful purpose.

13 J. T. Litchfield and F. Wilcoxon, J. Pharmacol. Exptl. Therap. 96: 99 (1949).
NORMAL EQUIVALENT DEVIATIONS AND PROBITS

Except for rough approximations, the assumption of linearity over any considerable segment of the log dose-response curve is untenable, and a proper analysis requires that the data first be transformed in such a way as to make the curve linear over its entire extent. This can be accomplished by converting the y-values from percents of maximal response to units known as normal equivalent deviations (N.E.D.). A "N.E.D." is the response increment brought about by increasing (or decreasing) the log dose by one standard deviation, taking the ED50 as starting point (N.E.D. = 0). This is shown in Fig. 4-12. Centrally placed on the y-axis at the left is N.E.D. 0, corresponding to 50% of maximal response, or 50% of the cumulative area of the normal curve. Since in the normal distribution an increment of +σ from μ includes 34% of the area (Table 4), N.E.D. +1.0 corresponds to 50% + 34% = 84% of maximal response. N.E.D. −1.0 corresponds to 50% − 34% = 16% of maximal response. Theoretically, such a transformed scale has no upper or lower limit, just as the normal distribution itself is considered to extend from −∞ to +∞. Practically, however, ±2 N.E.D. are usually sufficient to include the extremes of meaningful data obtained in actual biological experiments.

On the x-axis is a log-dose scale in which log ED50 is always chosen as the zero point. All log dose-response curves will therefore intersect at (0,0) in the center of the graph, regardless of relative potencies. Their slopes, however, will differ. In Curve A of Fig. 4-12, for example, increasing log dose by 0.3 (i.e., doubling the dose) raises the response to 84% of maximum, or +1 N.E.D. In other words, the standard deviation of sensitivities of the responsive units to this particular drug is 0.3 log units. Curve B has twice as steep a slope, indicating a standard deviation of only 0.15 log units, i.e., a more homogeneous population of responsive units.

Even more convenient than normal equivalent deviations are units known as probits. A probit is identical to a N.E.D. except that zero N.E.D. is defined as 5 probits, thus eliminating negative values. The probit scale is shown at the right of Fig. 4-12. Table 17 permits direct conversion of any percent to the corresponding probit.
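The conversion from percent response to N.E.D. and probit can also be sketched in code. The following is a minimal illustration, not a replacement for Table 17: the function names are hypothetical, and the inverse of the cumulative normal curve is taken from Python's standard `statistics.NormalDist` instead of being read from a table.

```python
from statistics import NormalDist

# Standard normal curve (mean 0, standard deviation 1).
_STD_NORMAL = NormalDist()


def percent_to_ned(percent):
    """Normal equivalent deviation for a percent of maximal response.

    Valid only for 0 < percent < 100, since the scale has no finite limits.
    """
    return _STD_NORMAL.inv_cdf(percent / 100.0)


def percent_to_probit(percent):
    """Probit = N.E.D. + 5, so that 50% corresponds to 5 probits."""
    return percent_to_ned(percent) + 5.0


def probit_to_percent(probit):
    """Inverse conversion: cumulative area of the normal curve, in percent."""
    return 100.0 * _STD_NORMAL.cdf(probit - 5.0)


if __name__ == "__main__":
    for pct in (16, 50, 84):
        print(pct, round(percent_to_ned(pct), 2), round(percent_to_probit(pct), 2))
```

Values computed this way may differ from a printed table in the last decimal place because of rounding.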
Figure 4-12 Transformation of the cumulative normal distribution (log dose-response curve) to normal equivalent deviations and to probits. The scale of percent response is also shown at the left, as it would appear on probability paper.
[Plot: horizontal axis Log Dose − Log ED50, marked at −0.3 and +0.3.]

Graph paper is available on which actual percentages are shown on the y-axis (as in Fig. 4-12), spaced according to the corresponding N.E.D. This is known as probability paper, or log-probability paper, according as the other coordinate scale is linear or logarithmic. Although such special paper is sometimes useful, it is often easiest to convert percent response to probits (by means of Table 17) and dose to log dose (by means of Table 3) or to an arbitrary log scale, and then to use the transformed data both for computations and for plotting on ordinary linear coordinates.

The slope of the regression line of probit (or N.E.D.) on log dose is, as pointed out above, a direct measure of the standard deviation (σ) of logarithms of individually effective doses (i.e., of doses just effective on the individual responsive units).
slope
is
equal to
coordinates.
Example
1/variance
divided by the square of the slope,

$$ s^2_{(m-\bar{x})} \approx \frac{s^2_{y \cdot x}}{b^2}\left[\frac{1}{N} + \frac{(m-\bar{x})^2}{SS_x}\right] $$

whence the approximate confidence interval is

$$ m - \bar{x} = (m - \bar{x}) \pm \frac{t\,(s_{y \cdot x})}{b}\sqrt{\frac{1}{N} + \frac{(m-\bar{x})^2}{SS_x}} $$

where t has (N − 2) DF. If m is reasonably close to $\bar{x}$, as is often the case in a well balanced experiment, the second term in brackets would be negligible, and the variance simplifies still further, to

$$ s^2_{(m-\bar{x})} \approx \frac{s^2_{y \cdot x}}{b^2 N} $$

whence

$$ m - \bar{x} = (m - \bar{x}) \pm \frac{t\,(s_{y \cdot x})}{b\sqrt{N}} $$

Although it is true that no exact variance of (m − $\bar{x}$) can be stated, exact confidence limits for the ratio of two random variables are given directly by the solution of a quadratic equation known as Fieller's Theorem. For its theoretical basis the reader should consult one of the basic texts cited in the list of references. A term g must first be computed:

$$ g = \frac{t^2 (s^2_{y \cdot x})}{b^2 (SS_x)} $$

Then the lower and upper confidence limits (designated by L and U) are

$$ (m - \bar{x})_L = \frac{1}{1-g}\left[(m - \bar{x}) - \frac{t\,(s_{y \cdot x})}{b}\sqrt{\frac{1-g}{N} + \frac{(m-\bar{x})^2}{SS_x}}\right] $$

and

$$ (m - \bar{x})_U = \frac{1}{1-g}\left[(m - \bar{x}) + \frac{t\,(s_{y \cdot x})}{b}\sqrt{\frac{1-g}{N} + \frac{(m-\bar{x})^2}{SS_x}}\right] $$

Now g will be recognized as containing the relationship of the slope to its standard error, since $s_b^2 = s^2_{y \cdot x}/SS_x$ (p. 143). Thus, the more certain is the slope estimate, the smaller will be g. If g is small enough (less than about 0.1) it makes a negligible contribution to the above equations, and then the exact equations for confidence limits simplify to those obtained from the approximate variance. Evidently, g will be small under all the conditions that reduce the slope variance: if the error variance is small; if the slope itself is steep; if the dose range is large; also, if the desired level of confidence is not too rigorous and/or a large number of observations has been made, so that t will be small. Inasmuch as the four terms comprising g have to be found anyway for use in other parts of any data analysis, the actual computation of g will entail almost no additional work and should be performed routinely. If it is found to be smaller than 0.1, it can be dropped; otherwise it is retained, and the full equations for exact confidence limits must be used. If g ≥ 1, the slope does not differ significantly from zero, and no confidence interval can be found.

15 The method to be described here for graded responses merely uses the probit transformation to achieve a linear regression line. It may not be entirely valid if many responses are at the extremes of the curve, because the variances of the responses are not likely to be constant throughout. For very accurate results a method of weighting the responses must be used, as described by D. J. Finney in Probit Analysis (London: Cambridge University Press, 1952), pp. 185-188.
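The decision rule on g can be sketched as a short routine. This is a minimal illustration under stated assumptions: the function and parameter names are hypothetical, t is supplied by the user from a t table for (N − 2) DF, the slope b is assumed positive, and the 0.1 cutoff is simply the rule of thumb given above.

```python
import math


def fieller_g(t, s2_yx, b, ss_x):
    """g = t^2 * s^2(y.x) / (b^2 * SSx)."""
    return (t ** 2) * s2_yx / (b ** 2 * ss_x)


def ed50_limits(m_minus_xbar, t, s2_yx, b, ss_x, n, g_cutoff=0.1):
    """Lower and upper confidence limits for (m - x_bar).

    Uses the approximate variance when g is below g_cutoff,
    otherwise the full expression from Fieller's Theorem.
    """
    g = fieller_g(t, s2_yx, b, ss_x)
    if g >= 1.0:
        # Slope not significantly different from zero: no interval exists.
        raise ValueError("g >= 1: no confidence interval can be found")
    s_yx = math.sqrt(s2_yx)
    if g < g_cutoff:
        half = (t * s_yx / b) * math.sqrt(1.0 / n + m_minus_xbar ** 2 / ss_x)
        return m_minus_xbar - half, m_minus_xbar + half
    half = (t * s_yx / b) * math.sqrt((1.0 - g) / n + m_minus_xbar ** 2 / ss_x)
    return (m_minus_xbar - half) / (1.0 - g), (m_minus_xbar + half) / (1.0 - g)


if __name__ == "__main__":
    # Illustrative inputs only: t for 5 DF, a small error variance, 7 doses.
    print(fieller_g(2.57, 0.006, 0.366, 28.0))
    print(ed50_limits(0.36, 2.57, 0.006, 0.366, 28.0, 7))
```

Setting `g_cutoff=0` forces the exact limits, which is a convenient way to see how little they differ from the approximate ones when g is small.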
Example 4-10. Estimate the ED50 of a drug which produced the following contractions of a piece of rat small intestine suspended in a tissue bath and connected to a device for amplifying and recording the contractions.

Drug            Coded log       Recorded       Percent of
Concentration   Concentration   Contraction    Estimated      Probit
(μg/ml)         (x)             (mm)           Maximum        (y)
  0.09          0                8              12            3.82
  0.27          1               13              20            4.16
  0.81          2               19              30            4.48
  2.43          3               24              38            4.69
  7.29          4               40              62            5.31
 21.9           5               47              73            5.61
 65.7           6               54              84            5.99
197.            7               63              99            —
591.            8               64             100            —

Since the response increment between the last two doses was negligible, we estimate 64 mm to be the maximum contraction. All the other responses are therefore expressed as percent of 64, and converted to probits. It is wise to exclude responses that are very near zero or maximum, since they cannot be measured reliably, and their conversion to probits would give them a spurious weight in the analysis. For example, the difference between 99.0% and 99.9% of maximal response would hardly be distinguishable in most biological systems, yet the respective probits (7.3 and 8.1) differ considerably. In fact, this probit difference is as great as that between the 34% and 66% responses. For this reason the two highest doses in the example have been excluded. A coded log scale is shown in the second column, as discussed in connection with Fig. 4-7.
The transformed data are plotted in Fig. 4-13. The usual calculations lead to the following:

N = 7
Σx = 21     Σx² = 91       (Σx)²/N = 63.0      $\bar{x}$ = 3
Σy = 34.06  Σy² = 169.51   (Σy)²/N = 165.73    $\bar{y}$ = 4.87
Σxy = 112.42               (Σx)(Σy)/N = 102.18

$$ b = \frac{112.42 - 102.18}{91.0 - 63.0} = 0.366 $$

$$ s^2_{y \cdot x} = \tfrac{1}{5}\left[169.51 - 165.73 - 0.366(112.42 - 102.18)\right] = 0.006 $$

$$ s_{y \cdot x} = \sqrt{0.006} = 0.077 $$

$$ \hat{y} = \bar{y} + b(x - \bar{x}) = 4.87 + 0.366(x - 3.00) $$

$$ m = \frac{5.00 - 4.87}{0.366} + 3.00 = 3.36 $$

For 95% confidence interval, we find g is only 0.01 so we proceed with the approximate equation

$$ m - \bar{x} = (m - \bar{x}) \pm \frac{2.57(0.077)}{0.366}\sqrt{\frac{1}{7} + \frac{(0.36)^2}{28}} = 0.36 \pm 0.21 $$

$$ m = 3.36 \pm 0.21 $$

Then in coded log units the ED50 is 3.36 and its 95% confidence limits are ±0.21, as shown in Fig. 4-13. In order to reconvert the coded result to actual log dose, we first note that each dose differed from a previous one by a factor of 3, so the unit of the coded log scale must be log 3 = 0.478. The starting point, zero, on the coded log scale corresponds to log 0.09 = $\bar{2}.954$. Therefore we multiply by 0.478 and add $\bar{2}.954$; thus log ED50 = 3.36(0.478) + $\bar{2}.954$ = 0.560 and ED50 = antilog 0.560 = 3.63 μg/ml.

The 95% confidence limits decode to give 0.21(0.478) = ±0.100 log units. Then 0.560 ± 0.100 = 0.460 to 0.660 log units, the confidence interval. Limits for the ED50 itself are then the antilogs, or 2.88 to 4.57 μg/ml.

Figure 4-13 Working graph for Example 4-10. Hypothetical data on dose-response for contractions of rat intestine. Each open circle represents a single experimental determination. The solid circle is the calculated general mean. Log ED50 and its confidence limits are shown at the bottom of the graph.
[Plot: response (probits) against Log Dose (Coded Units); ED50 = 3.36 with 95% confidence limits marked on the horizontal axis.]
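The arithmetic of Example 4-10 can be retraced with a short script. This is a sketch under stated assumptions, not the book's procedure verbatim: the helper names are hypothetical, probits are entered as read from the table, and log 3 is taken at full precision. Because the hand calculation rounds its intermediate sums, the error variance obtained at full precision (about 0.008) is slightly larger than the 0.006 shown above; the slope, m, and ED50 agree closely.

```python
import math

# Concentrations (ug/ml) retained in the analysis; the two highest are excluded.
conc = [0.09, 0.27, 0.81, 2.43, 7.29, 21.9, 65.7]
x = list(range(len(conc)))                         # coded log dose 0, 1, ..., 6
y = [3.82, 4.16, 4.48, 4.69, 5.31, 5.61, 5.99]     # probits from the table


def fit_line(xs, ys):
    """Least-squares regression of y on x; returns slope, means, SSx, s^2(y.x)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in xs)
    ss_y = sum((yi - y_bar) ** 2 for yi in ys)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    slope = s_xy / ss_x
    s2_yx = (ss_y - slope * s_xy) / (n - 2)        # error variance, N - 2 DF
    return slope, x_bar, y_bar, ss_x, s2_yx


def decode(coded, log_step, log_first_dose):
    """Coded log units -> actual log dose: multiply by the step, add the origin."""
    return coded * log_step + log_first_dose


b, x_bar, y_bar, ss_x, s2_yx = fit_line(x, y)
m = x_bar + (5.0 - y_bar) / b                      # coded log ED50 (probit 5)
log_ed50 = decode(m, math.log10(3), math.log10(conc[0]))
ed50 = 10 ** log_ed50

if __name__ == "__main__":
    print(round(b, 3), round(m, 2), round(ed50, 2))
```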
PARALLEL-LINE BIOASSAY WITH GRADED RESPONSES

The purpose of a bioassay is to compare the potency of an unknown with that of a standard, by means of a biological response produced by both substances. The unknown may be the same material as the standard, only its concentration being unknown. This is the case when vitamins, hormones, or vaccines are assayed against preparations of standard activity. Then it is clear that when the unknown and standard are adjusted (by dilution) to give identical biological responses, they will contain the same concentration of the active agent. On the other hand, the potency of a different substance (or crude extract of unknown composition) may be compared with that of a standard material. In that case a generally valid potency comparison can only be made if it is first shown that the slopes of the two log dose-response curves are the same.

A typical bioassay by graded response does not require the use of a probit transformation, although probits will be useful if they can be employed. There may be good reasons, however, why it is not practical to estimate the maximal response of the system; then it will be impossible to find an ED50 and the probit transformation will be out of the question.

The method of parallel-line bioassay will be described here in its simplest terms, for a 2 × 2 assay, with direct measurement of a graded response (e.g., blood pressure increase, in mm of mercury), equal dose-ratios, and equal numbers of observations at each dose. This elementary type of assay has several important limitations, but application of the procedures developed here to more elaborate designs will present no special difficulties.16
The procedure is to choose two dose levels of the standard, $x_{S_1}$ and $x_{S_2}$, and two of the unknown, $x_{U_1}$ and $x_{U_2}$, in such a way that the ratio of the higher dose to the lower is the same in both cases,

$$ \frac{x_{S_2}}{x_{S_1}} = \frac{x_{U_2}}{x_{U_1}} $$

The concentrations of S or U are adjusted so that, as nearly as possible, the responses will be matched,

$$ y_{S_1} \cong y_{U_1} \quad \text{and} \quad y_{S_2} \cong y_{U_2} $$

The purpose of the analysis is then to ascertain the potency of U in terms of S, by comparing the two regression lines of response on log dose. If the matching was very good, these two lines will be nearly identical; otherwise they will be parallel but somewhat separated on the x-axis. The statistical problem is to estimate the true potency ratio, defined as

$$ \frac{\text{dose of } S}{\text{dose of } U} \quad \text{for equal effect,} $$

and its confidence interval. Since lower effective dosage means higher potency, the magnitude of this ratio expresses the potency of U relative to S.

16 More sophisticated designs are well described in D. J. Finney, Statistical Method in Biological Assay (New York: Hafner Publishing Company, Inc., 1952).
Figure 4-14 2 × 2 parallel-line bioassay. Each point is the mean of 5 responses at the given dose. Parallel regression lines have been drawn through the respective $\bar{x}$, $\bar{y}$ points with the mean slope calculated from all the data. (Hypothetical data from Example 4-11.)
[Plot: response (10 to 50) against Log Dose (Coded Units); points S1, S2, U1, U2 and the means $\bar{y}_S$, $\bar{y}_U$.]

The procedure is illustrated in Fig. 4-14. A standard pressor amine and an unknown extract of adrenal tissue were assayed by injecting intravenously in a cat, and recording the transient rise of blood pressure produced by each injection. Five observations were made at each dose of both preparations (the order of injections being randomized) and the mean responses were plotted as the four points shown on the graph. The two dosages of S and U, whatever they may actually be, are plotted on a coded log scale as −1 and +1. The doses of standard pressor amine represent definite weights of a pure chemical substance; the doses of unknown represent definite volumes of a certain dilution of the adrenal extract.
Suppose the matching had been perfect, so that the two regression lines were exactly superimposed. Then in coded units, equal doses produced equal effects, and decoding gives the estimate of the actual potency ratio. Thus, if $x_{S_1}$ were 5 μg of the standard pressor amine and $x_{U_1}$ contained 0.1 ml of the adrenal extract, we would conclude that each ml of extract contained the equivalent of 50 μg of the standard pressor amine. This estimate, of course, would be subject to some uncertainty, which we would want to quantify by giving a confidence interval for the potency ratio.

Now when (as in Fig. 4-14) the matching is not perfect, the estimate of the potency ratio will contain two components, one due to the horizontal separation of the two regression lines on the coded log-dose scale, the other due to the coding itself. Let $\mathbf{M}$ be the log of the true potency ratio, $M$ its estimate from the sample data. Let $M_c$ be the coded estimate. Since the dose scale is logarithmic, $M_c$ will be coded log dose S minus coded log dose U for identical response. The decoded estimate $M$ will be given by $M_c + (\bar{x}_S - \bar{x}_U)$. Antilog $M$ will then estimate the true potency ratio.

Since the regression lines for S and U are assumed to have the same slope17 but may have different mean responses, we write two regression equations, as follows:

$$ \hat{y}_S = \bar{y}_S + b(x_S - \bar{x}_S) $$

$$ \hat{y}_U = \bar{y}_U + b(x_U - \bar{x}_U) $$

For the same response, $\hat{y}_S = \hat{y}_U$, and

$$ \bar{y}_S + b(x_S - \bar{x}_S) = \bar{y}_U + b(x_U - \bar{x}_U) $$

Since $M = x_S - x_U$ for the same response,

$$ \bar{y}_U - \bar{y}_S = b\left[M - (\bar{x}_S - \bar{x}_U)\right] $$

$$ M = \frac{\bar{y}_U - \bar{y}_S}{b} + (\bar{x}_S - \bar{x}_U) $$

17 We shall test this assumption below in an analysis of variance (p. 169). Alternatively, it could be tested by the significance of the difference between the slope estimates ($b_S - b_U$) from the S and U data, by the method indicated on p. 144.
This equation would apply to the general (uncoded) case. In a symmetrical design with coding, such as represented in Fig. 4-14, $\bar{x}_S = \bar{x}_U$ on the coded scale, so in coded units,

$$ M_c = \frac{\bar{y}_U - \bar{y}_S}{b} $$

and the actual value of M will be obtained after decoding. The common slope estimate is given by

$$ b = \frac{\sum (x - \bar{x})_U (y - \bar{y})_U + \sum (x - \bar{x})_S (y - \bar{y})_S}{\sum (x - \bar{x})_U^2 + \sum (x - \bar{x})_S^2} $$

in which the S and U data are pooled to yield the single slope best fitted by all the points.

The error variance is obtained by an equation strictly analogous to the usual one for data of a single regression line, except that there is now one less DF, and the symbols (as above) distinguish between data in the S and U sets.

$$ s^2_{y \cdot x} = \frac{1}{N-3}\left\{ \sum y_U^2 - \frac{(\sum y_U)^2}{N_U} + \sum y_S^2 - \frac{(\sum y_S)^2}{N_S} - b\left[\sum (x - \bar{x}_U)(y - \bar{y}_U) + \sum (x - \bar{x}_S)(y - \bar{y}_S)\right] \right\} $$

These equations for slope and for the error variance may also be used with unequal numbers of observations in S and U. For $N_S = N_U$ the various terms may simply be pooled, as illustrated below in Example 4-11.

The estimation of an exact variance of $M - (\bar{x}_S - \bar{x}_U)$ is beset with the same difficulties already pointed out in the case of a single potency estimate (p. 157). An approximate variance is given by an expression of the same form as for (m − $\bar{x}$). Naturally, the error variance, slope, and $SS_x$ are based upon pooled data from both sets, S and U.

$$ s^2_{[M - (\bar{x}_S - \bar{x}_U)]} \approx \frac{s^2_{y \cdot x}}{b^2}\left\{ \frac{1}{N_S} + \frac{1}{N_U} + \frac{[M - (\bar{x}_S - \bar{x}_U)]^2}{SS_x} \right\} $$

whence approximate confidence limits are given by

$$ M - (\bar{x}_S - \bar{x}_U) \pm \frac{t\,(s_{y \cdot x})}{b}\sqrt{\frac{1}{N_S} + \frac{1}{N_U} + \frac{[M - (\bar{x}_S - \bar{x}_U)]^2}{SS_x}} $$
Exact confidence limits are again given by Fieller's Theorem:

$$ [M - (\bar{x}_S - \bar{x}_U)]_{L,U} = \frac{1}{1-g}\left\{ [M - (\bar{x}_S - \bar{x}_U)] \pm \frac{t\,(s_{y \cdot x})}{b}\sqrt{(1-g)\left(\frac{1}{N_S} + \frac{1}{N_U}\right) + \frac{[M - (\bar{x}_S - \bar{x}_U)]^2}{SS_x}} \right\} $$

where

$$ g = \frac{t^2 (s^2_{y \cdot x})}{b^2 (SS_x)} $$

and may be neglected if it is smaller than about 0.1. In that case the expression reduces to that based on the approximate variance. In the symmetrical assay, where (in coded units) $\bar{x}_S = \bar{x}_U$, the term $(\bar{x}_S - \bar{x}_U)$ disappears. Moreover, if in a symmetrical 2 × 2 assay the lower dose is coded as −1, and the higher as +1, then $\bar{x}$ = 0, each deviation $(x - \bar{x})^2 = 1$ and $SS_x = \sum (x - \bar{x})^2 = N$ (where N is the total number of observations that are divided equally between $N_S$ and $N_U$). Then the exact confidence limits become

$$ M_L, M_U = \frac{1}{1-g}\left[ M_c \pm \frac{t\,(s_{y \cdot x})}{b}\sqrt{\frac{4(1-g) + M_c^2}{N}} \right] $$

and if g is small,

$$ M_L, M_U = M_c \pm \frac{t\,(s_{y \cdot x})}{b}\sqrt{\frac{4 + M_c^2}{N}} $$

In all problems involving two dose-response lines, t has (N − 3) DF.18

Example 4-11. An extract of adrenal tissue, of unknown pressor amine content, was assayed against a standard pressor amine by injecting into a cat and recording the transient rise of blood pressure. A 2 × 2 assay was used, with a high/low dose ratio of 3 for both standard and unknown. Five observations were made at each dose of S and U, and the order of injections was randomized. From the data given below, calculate the potency of the extract (and its 95% confidence limits) in terms of the standard. Data are peak blood pressure increases in mm of mercury. Figure 4-14 is the working graph for this example.

18 One DF is lost in each ED50 estimate, and another in the estimate of the common slope.
                     x_S1       x_S2       x_U1       x_U2
Dose                 5 μg       15 μg      0.1 ml     0.3 ml
Coded x              −1         +1         −1         +1

                     15         41         19         51
                     17         47         25         42
                     17         35         20         47
                     13         50         16         56
                     18         38         23         54

N                    5          5          5          5
Σx                   −5         +5         −5         +5
Σx²                  5          5          5          5
Σy                   80         211        103        250
Σy²                  1,296      9,059      2,171      12,626
(Σy)²/N              1,280      8,904.2    2,121.8    12,500
Σ(y − ȳ)²            16.0       154.8      49.2       126.0
ȳ                    16.0       42.2       20.6       50.0
Σxy                  −80        211        −103       250
(Σx)(Σy)/N           −80        211        −103       250
Pooled Data

                     Pooled       Pooled       Pooled
                     S1 + S2      S + U        U1 + U2
N                    10           20           10
Σx                   0            0            0
Σx²                  10           20           10
(Σx)²/N              0            0            0
Σ(x − x̄)²            10           20           10
Σy                   291          644          353
ȳ                    29.1         32.2         35.3
Σy²                  10,355.0     25,152.0     14,797.0
(Σy)²/N              8,468.1      20,736.8     12,460.9
Σ(y − ȳ)²            1,886.9      4,415.2      2,336.1
Σxy                  131          278          147
(Σx)(Σy)/N           0            0            0
Σ(x − x̄)(y − ȳ)      131          278          147

Some of the terms found above will not be required here, but will be used later. We may now compute b and $M_c$ from the pooled terms above.

$$ b = \frac{\text{pooled } \sum (x - \bar{x})(y - \bar{y})}{\text{pooled } \sum (x - \bar{x})^2} = \frac{278}{20} = 13.9 $$

$$ M_c = \frac{\bar{y}_U - \bar{y}_S}{b} = \frac{35.3 - 29.1}{13.9} = 0.446 $$

$$ s^2_{y \cdot x} = \frac{1}{N-3}\left\{ \sum (y - \bar{y})^2 - b\left[\sum (x - \bar{x})(y - \bar{y})\right] \right\} = \tfrac{1}{17}\left[4{,}415.2 - 13.9(278)\right] = 32.41 $$

$$ s_{y \cdot x} = \sqrt{32.41} = 5.69 $$

t at 17 DF = 2.11, t² = 4.452

$$ g = \frac{t^2 (s^2_{y \cdot x})}{b^2 \sum (x - \bar{x})^2} = \frac{(4.452)(32.41)}{(193.2)(20)} = 0.04 $$

Since g is very small, and the assay is symmetrical 2 × 2 ($\bar{x}_S = \bar{x}_U$), the simplest expression may be used for the 95% confidence limits of $M_c$,

$$ M_c \pm \frac{t\,(s_{y \cdot x})}{b}\sqrt{\frac{4 + M_c^2}{N}} = 0.446 \pm \frac{(2.11)(5.69)}{13.9}\sqrt{\frac{4 + 0.199}{20}} $$

$$ M_c = 0.446 \pm 0.396 $$

Now the actual ratio of high-dose/low-dose was 3, corresponding to two coded units; thus, our coded unit corresponds to ½ log 3 = 0.239. Then

M = (0.239)(0.446) ± (0.239)(0.396) = 0.107 ± 0.095 = 0.012 to 0.202

The actual potency ratio is given by the antilogs of M and of its upper and lower confidence limits. The result is 1.28, with limits 1.03 to 1.59. Then 0.1 ml U is estimated to be 1.28 times more potent than 5 μg S; so 1 ml U contains the equivalent of 64 μg S, with 95% confidence limits 52 to 80 μg.
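The point estimates of Example 4-11 can be retraced from the raw observations. The script below is a sketch under stated assumptions: the helper name `deviations` and the variable names are hypothetical, and t = 2.11 is entered by hand for 17 DF. The error variance here follows the general equation given earlier, pooling sums of squares within S and within U (about 21.1); the worked example enters the combined S + U total of 4,415.2 instead, so its 32.41 and its ±0.396 are somewhat larger than what this sketch returns. The slope, $M_c$, and potency ratio are unaffected by that choice.

```python
import math

# Peak blood pressure increases (mm Hg), keyed by coded log dose.
standard = {-1: [15, 17, 17, 13, 18], +1: [41, 47, 35, 50, 38]}
unknown = {-1: [19, 25, 20, 16, 23], +1: [51, 42, 47, 56, 54]}


def deviations(prep):
    """Return N, means, SSx, SSy and the sum of cross-products for one preparation."""
    pairs = [(x, y) for x, ys in prep.items() for y in ys]
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    ss_x = sum((x - x_bar) ** 2 for x, _ in pairs)
    ss_y = sum((y - y_bar) ** 2 for _, y in pairs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    return n, x_bar, y_bar, ss_x, ss_y, s_xy


n_s, xs_bar, ys_bar, ssx_s, ssy_s, sxy_s = deviations(standard)
n_u, xu_bar, yu_bar, ssx_u, ssy_u, sxy_u = deviations(unknown)
n_total = n_s + n_u
ss_x = ssx_s + ssx_u

b = (sxy_s + sxy_u) / ss_x                               # common slope
m_coded = (yu_bar - ys_bar) / b + (xs_bar - xu_bar)      # coded log potency ratio
s2_yx = (ssy_s + ssy_u - b * (sxy_s + sxy_u)) / (n_total - 3)

t = 2.11                                                 # t for 17 DF, 95% (from a t table)
half_width = (t * math.sqrt(s2_yx) / b) * math.sqrt(
    1 / n_s + 1 / n_u + m_coded ** 2 / ss_x
)

coded_unit = 0.5 * math.log10(3)                         # dose ratio 3 spans two coded units
potency_ratio = 10 ** (coded_unit * m_coded)             # 0.1 ml U relative to 5 ug S
ug_per_ml = potency_ratio * 5 / 0.1

if __name__ == "__main__":
    print(round(b, 1), round(m_coded, 3), round(potency_ratio, 2), round(ug_per_ml))
```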
ANALYSIS OF VARIANCE OF BIOASSAY DATA

As discussed earlier (p. 64), analysis of variance permits one to segregate and examine separately the several components that contribute to the total variability in a system. In parallel-line bioassays the total variance is made up of two components, that due to differences between doses, and that due to error (i.e., the residual within-doses variance). The between-doses variance can be broken down further into four components arising from (1) difference between preparations, (2) regression (i.e., dose-related differences in response), (3) departure from parallelism of the two regression lines, and (4) departure from linearity. Since two points determine a line, departure from linearity can only contribute in assays with more than two points per preparation.19 Analysis of variance for the data of Example 4-11 is presented below. The respective sums of squares are formed in a manner analogous to that explained on p. 67. It should be recalled that coding in a symmetrical design does not affect the analysis of variance in any way since all the data are changed by addition or subtraction of the same amount.
Calculation of Sums of Squares for Data of Example 4-11.

Grand Total per Observation for y:

$$ \frac{(\sum y)^2}{N} = \frac{(80 + 211 + 103 + 250)^2}{20} = 20{,}736.8 $$

Total SS: (19 DF)

$$ \sum y^2 - \frac{(\sum y)^2}{N} = 1{,}296 + 9{,}059 + 2{,}171 + 12{,}626 - 20{,}736.8 = 4{,}415.2 $$

Between Doses: (3 DF)20

$$ \sum \frac{(\sum y_m)^2}{N_m} - \frac{(\sum y)^2}{N} = \frac{(80)^2 + (211)^2 + (103)^2 + (250)^2}{5} - 20{,}736.8 = 4{,}069.2 $$

Preparations: (1 DF)21

$$ \frac{(\sum y_S)^2}{N_S} + \frac{(\sum y_U)^2}{N_U} - \frac{(\sum y)^2}{N} = \frac{(291)^2 + (353)^2}{10} - 20{,}736.8 = 192.2 $$

Regression: (1 DF) (Pooled slope SS)

$$ \frac{\left[\sum (x - \bar{x})(y - \bar{y})\right]^2_{\text{pooled}}}{\sum (x - \bar{x})^2_{\text{pooled}}} = \frac{(-80 + 211 - 103 + 250)^2}{20} = 3{,}864.2 $$

Parallelism: (1 DF) (Difference between separate slopes and pooled slope SS)

$$ \frac{\left[\sum (x - \bar{x})(y - \bar{y})\right]^2_S}{\sum (x - \bar{x})^2_S} + \frac{\left[\sum (x - \bar{x})(y - \bar{y})\right]^2_U}{\sum (x - \bar{x})^2_U} - \frac{\left[\sum (x - \bar{x})(y - \bar{y})\right]^2_{\text{pooled}}}{\sum (x - \bar{x})^2_{\text{pooled}}} = \frac{(131)^2}{10} + \frac{(147)^2}{10} - 3{,}864.2 = 12.8 $$

Linearity: no DF are available in a 2 × 2 assay.

Error (Within Doses): (16 DF)

Total SS − between-doses SS = 4,415.2 − 4,069.2 = 346.0

19 A defect of the 2 × 2 assay is that no information about linearity can be obtained from the data, and serious error may arise if the pairs of points are not in corresponding positions on the two log dose-response curves.
20 $\sum y_m$ and $N_m$ refer to the individual dose-groups.
21 $\sum y_S$, $N_S$ and $\sum y_U$, $N_U$ refer to pooled data of both standard groups and both unknown groups, respectively.
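The sums of squares above can be checked mechanically. The sketch below assumes the coded doses −1 and +1 of the symmetrical 2 × 2 design and uses hypothetical variable names; it is meant only as an arithmetic check on the hand calculation, not as a general analysis-of-variance routine.

```python
# Responses by dose group (Example 4-11), with coded log doses -1 and +1.
groups = {
    "S1": [15, 17, 17, 13, 18],
    "S2": [41, 47, 35, 50, 38],
    "U1": [19, 25, 20, 16, 23],
    "U2": [51, 42, 47, 56, 54],
}

all_y = [y for ys in groups.values() for y in ys]
n = len(all_y)
correction = sum(all_y) ** 2 / n                       # (sum y)^2 / N

total_ss = sum(y * y for y in all_y) - correction
between_doses_ss = sum(sum(ys) ** 2 / len(ys) for ys in groups.values()) - correction

# Preparation totals and numbers of observations.
tot_s = sum(groups["S1"]) + sum(groups["S2"])
tot_u = sum(groups["U1"]) + sum(groups["U2"])
n_s = len(groups["S1"]) + len(groups["S2"])
n_u = len(groups["U1"]) + len(groups["U2"])
preparations_ss = tot_s ** 2 / n_s + tot_u ** 2 / n_u - correction

# With x coded as -1 / +1 and x_bar = 0, the cross-product sum for each
# preparation is (high-dose total) - (low-dose total), and SSx equals N.
sxy_s = sum(groups["S2"]) - sum(groups["S1"])
sxy_u = sum(groups["U2"]) - sum(groups["U1"])
regression_ss = (sxy_s + sxy_u) ** 2 / (n_s + n_u)
parallelism_ss = sxy_s ** 2 / n_s + sxy_u ** 2 / n_u - regression_ss

error_ss = total_ss - between_doses_ss
error_df = n - len(groups)                             # 20 - 4 = 16
f_regression = regression_ss / (error_ss / error_df)

if __name__ == "__main__":
    print(total_ss, between_doses_ss, preparations_ss)
    print(regression_ss, parallelism_ss, error_ss, round(f_regression))
```

As a consistency check, the preparations, regression, and parallelism components together account for the whole of the between-doses sum of squares.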
ANALYSIS OF VARIANCE (DATA OF EXAMPLE 4-11)

Source                   DF      SS          Mean square
Preparations             1       192.2       192.2
Regression               1       3,864.2     3,864.2
Parallelism              1       12.8        12.8
Between doses            3       4,069.2
Error (within doses)     16      346.0       21.6
Total                    19      4,415.2

Regression : Error = 3,864.2/21.6 = 179 (1, 16 DF)
Preparations : Error = 192.2/21.6 = 8.9 (1, 16 DF)

The analysis is due
/>> ma x = 6.31 and ^ min = 3.70. Then the working probit (y) is 0.14(6.31) + 0.86(3.70) = 4.07, which is an adjustment upward toward
the line.

24 J. T. Litchfield and F. Wilcoxon, J. Pharmacol. Exptl. Therap. 96: 99 (1949).
Dose       Log Dose   Alive/    Proportion    Empirical   Expected
μg/kg      (x)        Total     Alive (p)     Probit      Probit
1,000      3.0        8/8       1.00          —           —
500        2.7        7/8       0.88          6.18        5.84
250        2.4        4/8       0.50          5.00        5.25
125        2.1        4/8       0.50          5.00        4.67
62.5       1.8        1/8       0.12          3.82        4.09
The empirical probits (from Table 17) are shown plotted against log dose in Figure 4-15, where a provisional regression line has been drawn by eye. Expected probits, read off from the provisional line, have been entered in the last column above. Weighting coefficients (w) are found by interpolation in Table 18, for each expected probit. Nw is obtained for each dose group, then successive multiplications give Nwx and Nwx².

Figure 4-15 Working graph for analysis of Example 4-12.
[Plot: probit against Log Dose (1.0 to 3.0), showing the provisional line and the adjusted line.]
Expected
Probit     w        x      N     Nw      Nwx      Nwx²
5.84       0.490    2.7    8     3.92    10.58    28.57
5.25       0.622    2.4    8     4.98    11.95    28.68
4.67       0.611    2.1    8     4.89    10.27    21.57
4.09       0.468    1.8    8     3.74     6.73    12.11

ΣNw = 17.53     ΣNwx = 39.53     ΣNwx² = 90.93

$$ \text{Weighted } \bar{x} = \frac{\sum Nwx}{\sum Nw} = \frac{39.53}{17.53} = 2.25 $$

Working probits (y) are then obtained by entering Table 18 with each expected probit and using the observed sample proportions (not percents) as indicated. Then Nwy, Nwxy are also computed:25

Expected   Observed
Probit     Proportion (p)    y       Nwy      Nwxy
5.84       0.88              6.13    24.03    64.88
5.25       0.50              4.99    24.88    59.64
4.67       0.50              5.01    25.00    52.50
4.09       0.12              3.86    14.44    25.99

ΣNwy = 88.35     ΣNwxy = 203.01

$$ \text{Weighted } \bar{y} = \frac{\sum Nwy}{\sum Nw} = \frac{88.35}{17.53} = 5.04 $$

Finally, the slope of the adjusted regression line is given by a weighted equation analogous to that on p. 136:

$$ b = \frac{\sum Nwxy - \dfrac{(\sum Nwx)(\sum Nwy)}{\sum Nw}}{\sum Nwx^2 - \dfrac{(\sum Nwx)^2}{\sum Nw}} = \frac{203.01 - \dfrac{(39.53)(88.35)}{17.53}}{90.93 - \dfrac{(39.53)^2}{17.53}} = \frac{3.85}{1.79} = 2.15 $$

25 For some purposes Nwy² may also be required.
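One cycle of the weighted computation can be sketched as follows. Table 18 is not reproduced here, so the sketch substitutes the standard probit-analysis definitions as an assumption: weighting coefficient w = Z²/(PQ) and working probit y = Y + (p − P)/Z, where Y is the expected probit, P the corresponding proportion, Q = 1 − P, and Z the ordinate of the normal curve at Y − 5. These expressions give values matching the tabled w and y above to the precision shown; all function and variable names are hypothetical. Results computed at full precision differ from the hand calculation in the second decimal place.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def weight_and_working_probit(expected_probit, p_observed):
    """Weighting coefficient and working probit for one dose group.

    Assumes w = Z^2 / (P * Q) and y = Y + (p - P) / Z, used here
    in place of interpolation in a printed table.
    """
    ned = expected_probit - 5.0
    big_p = STD_NORMAL.cdf(ned)          # expected proportion responding
    z = STD_NORMAL.pdf(ned)              # ordinate of the normal curve
    w = z * z / (big_p * (1.0 - big_p))
    y = expected_probit + (p_observed - big_p) / z
    return w, y


# (log dose x, group size N, expected probit, observed proportion p)
rows = [
    (2.7, 8, 5.84, 0.88),
    (2.4, 8, 5.25, 0.50),
    (2.1, 8, 4.67, 0.50),
    (1.8, 8, 4.09, 0.12),
]

sum_nw = sum_nwx = sum_nwy = sum_nwxx = sum_nwxy = 0.0
for x, n, expected, p in rows:
    w, y = weight_and_working_probit(expected, p)
    nw = n * w
    sum_nw += nw
    sum_nwx += nw * x
    sum_nwy += nw * y
    sum_nwxx += nw * x * x
    sum_nwxy += nw * x * y

x_bar_w = sum_nwx / sum_nw               # weighted mean log dose
y_bar_w = sum_nwy / sum_nw               # weighted mean working probit
slope = (sum_nwxy - sum_nwx * sum_nwy / sum_nw) / (sum_nwxx - sum_nwx ** 2 / sum_nw)

if __name__ == "__main__":
    print(round(x_bar_w, 2), round(y_bar_w, 2), round(slope, 2))
```

A further cycle would read new expected probits from the adjusted line, `y_bar_w + slope * (x - x_bar_w)`, and repeat the loop.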
The adjusted regression line, with slope 2.15 and passing through the point ($\bar{x}$, $\bar{y}$) = (2.25, 5.04) is shown in Fig. 4-15, together with the provisional line. One cycle of computation has clearly made some small difference in the line. Another cycle would begin with the adjusted line and proceed exactly as before, to obtain a still better estimate, based on the expected probits given by the new line. In the present case (and quite often with real data) another cycle would alter the adjusted line only by an insignificant amount.

Before accepting the adjusted line as a reasonable fit to the data, a χ² test should be made, in order to make sure that the various sample responses are sufficiently homogeneous to be accepted as randomly drawn from a population represented by the adjusted regression line. This goodness-of-fit test is carried out by converting each expected probit (from the new line) back to proportion responding, and then (multiplying by N), to the expected number of subjects responding in each dose group. The difference between expected and observed numbers in each dose group (E − O)

[Figure 4-16 plot: horizontal axis Log Dose (Coded Units).]
The adjusted lines with slope 0.89 and passing through ($\bar{x}$, $\bar{y}$) in each case, are shown in Fig. 4-16. They differ but slightly from the provisional lines and although a high degree of accuracy would require that another cycle of approximation be performed, we shall ordinarily be content with a single adjustment. The test of homogeneity by χ² is performed with each set of data and the adjusted lines, exactly as on p. 177; the result is an acceptably low value of χ². Here, if there seems to be any doubt that the data describe two parallel lines, a test for parallelism should also be performed.26

Once we have satisfied ourselves that the observations are reasonable samples from the populations represented by the two adjusted lines, implying also that a linear regression is a reasonable representation, and that parallelism is a reasonable assumption, we may proceed to the actual estimates required.
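$$ m_S = \frac{5.00 - 5.07}{0.89} + 0.362 = 0.283 $$

$$ m_U = \frac{5.00 - 4.98}{0.89} + 0.601 = 0.623 $$

$$ M_c = m_S - m_U = 0.283 - 0.623 = \bar{1}.660 $$

Alternatively,

$$ M_c = \frac{\bar{y}_U - \bar{y}_S}{b} + (\bar{x}_S - \bar{x}_U) = \frac{4.98 - 5.07}{0.89} + (0.362 - 0.601) = \bar{1}.660 $$

First we compute the confidence interval for each LD50 separately.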
For S (Drug A): (1.96)
2
0.164
2
(0.89) (29.6)
m  0.362 =
——
0.079
0.936
=
+
m
is c
B):
^.J.
1
^
(0.079)
2
< m < (+0.309 + 0.362)
0.362)
e
estimated to be 0.283, with
For C/(Drug
/0.936
+0.309
to
0.116
So
1.96 0.8
[
0.478
(0.478
±
2 = 1,247, SS*, 11.43, s y * = 1.905, jc = 5.10, ^ = 13.28, y/x =
results
2
y
same system
chosen for
5,
N
smaller.
187
were
assays and 10 others for protein determinations.
12, 15, 13, 14, 15, 12, 12
2.60. r
= 2.45
/2
Then
at
P = 0.05
with 6
DF
= 6.00
the limits are
(5.10)03.28)
±
(5.10)(13.28)
(5.10)
2
6.00/
2 54 j
(5.10)
2

= 2.77 ±0.71 So the estimate of DNA/protein
29 As
met
is
is
)
(13.28)
2
6.00^j
m
6.00
=2.06
2.60, with 95
shown by W. G. Cochran (Biometrics
% confidence limits 2.06 to 3.48.
7:17, 195!), a requirement that must be
that
x 2 (Nx )
). A misleading experiment than useless. Randomization would have avoided the difficulties. in case («),
is
worse
P1-3.
Draw up a sequential list of 50 numbers that will represent the order in which mice are going to be drawn out of stock for assignment to cage A or B. To avoid having to discard all the numbers greater than 50 as meaningless, designate the first mouse to be drawn by 01 or 51, the second mouse by 02 or 52, and so on. Then find the 25 random assignments to cage A by entering Table 1 and reading out a set of 25 digit pairs. Suppose the numbers 02, 54, and 05 occurred among this set but 01 (or 51), 03 (or 53), and 06 (or 56) did not. Then the beginning of the list might look like this:

Mouse order   1st   2nd   3rd   4th   5th   6th
Number        01    02    03    04    05    06
   or         51    52    53    54    55    56
Cage                A           A     A

When 25 assignments to A have been entered on the list, as above, B is entered for the remaining 25. Then the first mouse is removed from stock, the list is consulted, and the animal is placed in the appropriate cage (in this instance, B). The distribution of mice continues in the same way until completed; this part of the procedure is a mere physical execution of the specified plan of randomization in which no characteristic of the mice plays any role whatsoever.
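The assignment plan described above can be sketched in a few lines of Python. This is only an illustration: a pseudorandom generator stands in for the printed table of random digits, and the function name `assign_cages` and the seed value are hypothetical choices, not part of the original answer.

```python
import random


def assign_cages(n_mice=50, n_in_a=25, seed=None):
    """Return one cage label per mouse, in the order mice are drawn from stock."""
    rng = random.Random(seed)
    # Pick the draw positions (1-based) that go to cage A. This stands in for
    # reading 25 distinct digit pairs out of a random-number table.
    a_positions = set(rng.sample(range(1, n_mice + 1), n_in_a))
    # Every remaining position is entered as B.
    return ["A" if i in a_positions else "B" for i in range(1, n_mice + 1)]


plan = assign_cages(seed=1964)  # the seed is arbitrary; it only makes the plan repeatable
for order, cage in enumerate(plan[:6], start=1):
    print(order, cage)
```

The plan is fixed before any mouse is handled, so executing it involves no judgment about the animals.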
P1-4.
The law of randomization was violated. The letter with which his last name begins is certainly a "characteristic of a subject," and it has in this case determined into which group he is placed. This might seem a ridiculous objection, but deeper thought will reveal that many experiences in life, which could influence one's response to a psychological test, may be related to one's name and its rank in the alphabet. It is not necessary to cite any particular respect in which a name is likely to influence an experimental outcome. The point is that as long as some characteristic of a person influences his assignment to a group, a suspicion is introduced that the groups may not be as equivalent as they should be.
P1-5.

Experimental                  Day
Location        1     2     3     4     5
    1           A     B     C     D     E
    2           B     C     D     E     A
    3           C     D     E     A     B
    4           D     E     A     B     C
    5           E     A     B     C     D

Here each letter represents a different attractant. Since there are five positions, the experiment must be conducted on five different occasions, if it is to be balanced so each position can be occupied by every attractant at some time. Many other Latin squares could have been chosen. The results obtained will be numbers of flies trapped, and there will be a number in each of the 25 boxes of the 5 x 5 table. The attractants will be compared by computing the mean numbers for all A, for all B, and so on. Any difference between locations will be given by a comparison between the row means, any difference between days by a comparison between column means.
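A square of the kind shown can be generated by cyclic shifting, and the balance property (each attractant once per location and once per day) can be checked mechanically. The sketch below assumes the cyclic construction; the helper names are hypothetical, and as the answer notes, many other Latin squares would serve equally well.

```python
def cyclic_latin_square(symbols):
    """Build an n x n square in which row r is the symbol list shifted left by r."""
    n = len(symbols)
    return [[symbols[(r + c) % n] for c in range(n)] for r in range(n)]


def is_latin_square(square):
    """True if every symbol occurs exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok


square = cyclic_latin_square("ABCDE")  # rows = locations, columns = days
for row in square:
    print(" ".join(row))
```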
P1-6.
Hypothesis: The systolic blood pressure is not abnormally elevated. Decision rule: Measure systolic blood pressure. If it does not exceed 130, accept; otherwise reject. Many definitions of the normal state of health are necessarily of a statistical nature. Large deviations from the norm imply underlying disease, but there is often a considerable region of uncertainty, where health and disease are both to be found. Thus in the present problem, some rule has to be adopted. Among all recruits who are rejected because of blood pressure above 130 there will be some who are really quite healthy. The decision rule may certainly reject all recruits with disease manifested by high blood pressure (if so, then $\beta = 0$) and it will reject 1% of all healthy recruits (since $\alpha = 0.01$).
P1-7.
If you see X, the chances are 2 to 1 that the other side is also X. This is because there are three X symbols in all; on the obverse side of two of them is an X. The decision rule is therefore: If X is seen, accept; if O is seen, reject. Now if O is seen, and we reject, we will be wrong once in three trials, because one of these is indeed X, so $\alpha = 0.33$. But if X is seen, and we accept, we will also be wrong once in three trials, because with that frequency the other side is O, so $\beta = 0.33$.
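The 2-to-1 odds can be verified by listing every face that might be showing. The sketch below assumes three cards marked X/X, X/O, and O/O (the X/X and X/O cards follow from the "three X symbols" in the answer; the O/O card is an assumption about the original problem), each equally likely to be drawn with either side up.

```python
from fractions import Fraction

# Assumed deck: one card with X on both sides, one mixed, one with O on both sides.
cards = [("X", "X"), ("X", "O"), ("O", "O")]

# Every (seen side, hidden side) pair, each equally likely.
faces = [pair for a, b in cards for pair in ((a, b), (b, a))]


def prob_hidden(seen, hidden):
    """Probability that the hidden side is `hidden`, given that `seen` is showing."""
    showing = [f for f in faces if f[0] == seen]
    matches = [f for f in showing if f[1] == hidden]
    return Fraction(len(matches), len(showing))


p_x_given_x = prob_hidden("X", "X")  # hidden side also X when X is seen
p_o_given_x = prob_hidden("X", "O")  # error rate when we accept on seeing X
p_x_given_o = prob_hidden("O", "X")  # error rate when we reject on seeing O
print(p_x_given_x, p_o_given_x, p_x_given_o)
```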
P1-8.
The null hypothesis states that the samples of data from the alcohol-treated group and from the control group were drawn from the same population, i.e., that alcohol had no effect. Since we are dealing with score differences, the null hypothesis is that the true score difference is zero, i.e., that the sample of score differences was drawn from a population with mean difference equal to zero. The null hypothesis will be tested by comparing the observed mean difference with its sampling distribution (about which we will need some information) to see how rare it is. If the probability of its being drawn by chance from the hypothetical null population is less than our chosen value of P, we will reject the null hypothesis and conclude that alcohol was detrimental to driving ability. Otherwise we shall be unable to conclude that alcohol is detrimental.
P1-9.
When we reject a null hypothesis, we conclude that the two samples were drawn from different populations, i.e., that treated and control groups really differed. This is the same as finding that the confidence interval of a difference does not include zero. Accepting a null hypothesis, however, may merely mean we had insufficient data to reject it. It certainly does not mean it is true. While $\beta$ may be reasonably small with respect to some specified alternative hypothesis, other alternatives can usually be postulated for which $\beta$ is very large. A real but small difference between two parameters might only be detectable by samples too large to be practical. The confidence interval for the difference would, in this case, include zero, but the limits would specify how large a real difference might exist without being detected. The confidence interval approach is therefore much more informative.
P1-10.
Not necessarily. The level of significance should not be confused with an estimate of the true magnitude of an effect. The results mean that it is more certain that B is effective, in the sense that there is one chance in 20 that A may really be inert, but only one chance in 100 that B is inert. It is entirely possible, however, that A is the more effective vaccine. For example, it might have been tried on a smaller sample than B, so that the results were less decisively different from the control. Again a confidence interval for the potency difference would be more informative than the mere statement of effectiveness.
P1-11.
The defect in this experimental design is that each medication was labeled distinctively. It is true that the investigators, nursing staff, and patients could not know which medications were represented by A, B, and C. However, opinions soon form as to which is which, and these in turn bias further observations. The merits of A, B, and C are sure to be discussed among the patients and among the staff. Any observed effect (favorable, unfavorable, or neutral) of one drug, though it may be manifested in but a single patient, becomes common knowledge, and that drug (be it A, B, or C) acquires special properties throughout the experimental group and for the remainder of the experimental period.
For these reasons the drugs to be administered to each subject must be coded individually and without clue as to their relationship to any other subject's medications. Random serial numbers may be employed; or the entire sequence of medications for a given subject may be labeled with that subject's name, and numbered serially.

CHAPTER 2
P2-1.
Since the data are paired for each subject, the procedure should make use of the pairing. We therefore calculate the score difference, placebo minus drug, for each of the 10 subjects. This gives the following set of differences:

+14, +26, +19, -4, -26, -6, -29, +17, +5, +10

Inspection of the signs of these differences makes it seem unlikely that the drug causes any significant improvement. Certainly six positive and four negative signs in a group of 10 is entirely compatible with the 5+, 5- expected on a chance basis in a population with true difference zero.
The signed-ranks test would require a preliminary rearrangement in rank order without regard to sign, as follows:

d       -4    +5    -6    +10   +14   +17   +19   (+26, -26)   -29
Rank     1     2     3     4     5     6     7     8.5   8.5    10

Then the sum of the negative ranks is 22.5, the sum of the positive ranks is 32.5. Table 16 shows that at P = 0.05 for one tail and N = 10, the smaller sum may not exceed 10.8. We could not, therefore, assert a significant drug effect from this test either.
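The ranking step can be checked with a short sketch. It assumes the ten differences quoted in this answer (their order among subjects does not matter here) and gives tied absolute values the average of the ranks they occupy; the function name is hypothetical.

```python
def signed_rank_sums(diffs):
    """Return (sum of positive ranks, sum of negative ranks), ignoring zeros."""
    nonzero = [d for d in diffs if d != 0]
    ordered = sorted(nonzero, key=abs)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        # Advance j past every value tied with ordered[i] in absolute size.
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        # Positions i..j-1 hold ranks i+1..j; ties share their average.
        rank_of[abs(ordered[i])] = (i + 1 + j) / 2
        i = j
    positive = sum(rank_of[abs(d)] for d in nonzero if d > 0)
    negative = sum(rank_of[abs(d)] for d in nonzero if d < 0)
    return positive, negative


diffs = [14, 26, 19, -4, -26, -6, -29, 17, 5, 10]
pos_sum, neg_sum = signed_rank_sums(diffs)
print(pos_sum, neg_sum)
```

The smaller of the two sums is the one compared with the tabulated critical value.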
Had we chosen to employ a t test, we would find, for the set of 10 differences,

$$\sum x = +26, \qquad \sum x^2 = 3{,}216, \qquad \frac{(\sum x)^2}{N} = 67.6, \qquad SS = 3{,}148.4$$

$$\bar{x} = +2.6, \qquad s^2 = \frac{3{,}148.4}{9} = 349.8, \qquad s_{\bar{x}}^2 = \frac{349.8}{10} = 34.98, \qquad s_{\bar{x}} = \sqrt{34.98} = 5.92$$

$$t = \frac{\bar{x}}{s_{\bar{x}}} = \frac{+2.60}{5.92} = 0.439$$

which Table 5 shows to be not significant. To answer how great an improvement or decrement might be attributable to the drug yet not discernible here (Type II error), we establish 95% confidence limits for $\mu$, the true difference. Table 5 shows t = 2.26 for P = 0.05 at 9 DF.

$$\mu = +2.6 \pm 2.26(5.92) = +2.6 \pm 13.4 = -10.8 \text{ to } +16.0$$

These are the limits within which the true drug effect probably lies. Since zero is included in them, the drug may be entirely ineffectual as already shown. It is also clear that if the drug does have a beneficial effect, this is on the average no greater than +16 points on a base score of about 300, in other words, no greater than about 6% improvement. Moreover, although the drug could also be having a detrimental effect, that effect would be no worse than about a 4% decrement from the original score.
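The arithmetic of the paired t test and its confidence limits can be reproduced as follows. The sketch assumes the ten differences listed for this problem and takes the critical value 2.26 (P = 0.05, 9 DF) from the text's Table 5 rather than computing it.

```python
import math

diffs = [14, 26, 19, -4, -26, -6, -29, 17, 5, 10]  # placebo minus drug
n = len(diffs)

mean = sum(diffs) / n
ss = sum(d * d for d in diffs) - sum(diffs) ** 2 / n  # sum of squares about the mean
variance = ss / (n - 1)
std_err = math.sqrt(variance / n)                      # standard error of the mean
t_value = mean / std_err

t_crit = 2.26  # tabulated t for P = 0.05 at 9 DF, as quoted in the text
lower = mean - t_crit * std_err
upper = mean + t_crit * std_err

print(f"mean={mean:.2f} SS={ss:.1f} s2={variance:.1f} se={std_err:.2f} t={t_value:.3f}")
print(f"95% limits: {lower:.1f} to {upper:.1f}")
```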
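P2-2.
Here C.V. $= 100s/\bar{x} = 15$ and $\bar{x} = 282$, so

$$s = \frac{15(282)}{100} = 42.3, \qquad s^2 = 1{,}789$$

$s_{\bar{x}}^2 = 89.45$ for the group initially, and $s_{\bar{x}'}^2 = 89.45$ since we were told the C.V. did not change. Then

$$s^2_{(\bar{x} - \bar{x}')} = 178.9, \qquad s_{(\bar{x} - \bar{x}')} = 13.4$$

$$\bar{x} - \bar{x}' = 282 - 240 = 42, \qquad t = \frac{(\bar{x} - \bar{x}')}{s_{(\bar{x} - \bar{x}')}} = \frac{42}{13.4} = 3.13$$

This exceeds the critical value 2.55 for one tail at P = 0.01 with 18 DF (Table 5), so the fall in heart rate is highly significant. For the 95% confidence limits of the true change in heart rate, $t_{.05}$ with 18 DF is 2.10:

$$(\bar{x} - \bar{x}') \pm t_{.05}\, s_{(\bar{x} - \bar{x}')} = 42 \pm 2.10(13.4) = 14 \text{ to } 70$$

P2-3.
Code by subtracting 40. Then

$$\sum x_c = +3.6, \qquad \frac{(\sum x_c)^2}{N} = 2.16, \qquad SS = 12.98, \qquad \bar{x}_c = +0.6$$

$$s^2 = \frac{12.98}{5} = 2.60, \qquad s_{\bar{x}}^2 = \frac{2.60}{6} = 0.433, \qquad s_{\bar{x}} = 0.658$$

Decoding, $\bar{x} = +0.6 + 40 = 40.6$, and

$$\text{C.V.} = \frac{100\sqrt{2.60}}{40.6} = 4.0$$

For the 99% confidence interval, $t_{.01}$ (5 DF) = 4.03, so

$$\mu = 40.6 \pm 4.03(0.658) = 37.9 \text{ to } 43.3$$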
P2-4.
The obvious question in this problem is whether or not caffeine affected the scores on each criterion. We begin by inspection of the data. It appears that the "alertness" scores are improved, and that the "relaxation" scores are decreased; but it is obvious that "nervousness" scores are not consistently changed and nothing would be gained by analyzing them. The first step, then, is to perform an analysis of variance on the "alertness" scores.

              Placebo        150 mg         300 mg
              x     x²       x     x²       x     x²
              0     0        1     1        2     4
              0     0        2     4        4     16
              1     1        4     16       4     16
              1     1        2     4        3     9
              0     0        1     1        3     9
Total         2              10             16           T = 28
(Total)²      4              100            256          T² = 784

Preliminary Calculations (total of squares per observation)

Grand total:     784/15 = 52.3
Between doses:   360/5  = 72.0
Observations:    82/1   = 82.0

ANALYSIS OF VARIANCE

Source                  SS                    DF    Variance Estimate    F
Between doses           72.0 - 52.3 = 19.7    2     9.85                 11.8**
Within doses (error)    By diff. = 10.0       12    0.833
Total                   82.0 - 52.3 = 29.7    14

Table 7 gives $F_{.01} = 6.93$ at 2,12 DF, so the between-doses effect is highly significant. We are then justified in going further.
A summary of "alertness" score differences for each subject provides the following:

caffeine (150 mg)    caffeine (300 mg)    caffeine (300 mg)
minus placebo        minus placebo        minus caffeine (150 mg)
      1                    2                    1
      2                    4                    2
      3                    3                    0
      1                    2                    1
      1                    3                    2

Since all the differences are positive, there is no difficulty in observing that both doses differ from placebo, since Table 10 shows that if there are no signs of one kind in a sample of 5, P < 0.05. In the last set of data, however, elimination of the zero leaves N = 4, too small for the sign test or the signed-ranks test. However, a t test is appropriate, and yields

$$\sum x = 6, \quad \bar{x} = 1.2, \quad \sum x^2 = 10, \quad \frac{(\sum x)^2}{N} = 7.2, \quad SS = 2.8$$

$$s^2 = 0.700, \quad s_{\bar{x}}^2 = 0.140, \quad s_{\bar{x}} = 0.374, \quad t = \frac{1.2}{0.374} = 3.21$$

and $t_{.05}$ at 4 DF (Table 5) = 2.13 in a one-tail test. Thus we may reject the null hypothesis of zero difference between the two doses.
Proceeding similarly, we find that analysis of variance on the "relaxation" scores provides no evidence of heterogeneity, since variance estimate between doses = 2.45, error = 0.833, F = 2.94 and $F_{.05}$ (2,12 DF) = 3.89, which is not exceeded. We cannot, therefore, properly go further with comparison of the "relaxation" scores. There is, however, some clear indication that caffeine does reduce "relaxation" scores, and it is not unlikely that this effect would prove significant if a larger number of subjects were used.
We therefore conclude that caffeine increases "alertness" scores as compared to placebo, and that there is a real dose effect. Caffeine may reduce "relaxation" scores, but this could not be established in the present experiment. There appears to be no effect on "nervousness" scores.
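The between-doses analysis of variance can be reproduced with a short sketch. The three score lists below are an assumption: they are "alertness" values consistent with the column totals (2, 10, 16), the grand total of 28, and the sum of squares of 82 quoted for this problem. The function name is hypothetical.

```python
def one_way_anova(groups):
    """Return (SS between, SS within, F) using the correction-term method."""
    values = [x for g in groups for x in g]
    n_total = len(values)
    correction = sum(values) ** 2 / n_total          # T^2 / N
    ss_total = sum(x * x for x in values) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between                # error, by difference
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_ratio


# Assumed scores, chosen to match the totals given in the answer.
placebo = [0, 0, 1, 1, 0]
mg150 = [1, 2, 4, 2, 1]
mg300 = [2, 4, 4, 3, 3]

ss_between, ss_within, f_ratio = one_way_anova([placebo, mg150, mg300])
print(f"SS between={ss_between:.1f} SS within={ss_within:.1f} F={f_ratio:.1f}")
```

The computed F is then compared with the tabulated 6.93 for 2 and 12 DF.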
P2-5.
The problem is solved by constructing the analysis of variance. It is convenient to code the data by subtracting 50. Decoding is not required.

              Obs  (Obs)²      Obs  (Obs)²      Obs  (Obs)²      Total   (Total)²
            R -12    144     G  35  1,225     Y  10    100        33     1,089
            G  29    841     Y  12    144     R -20    400        21       441
            Y  -1      1     R  -8     64     G  21    441        12       144
Total          16               39               11            T = 66
(Total)²      256            1,521              121            T² = 4,356

Colors      Total   (Total)²
  Y           21       441
  R          -40     1,600
  G           85     7,225
                     9,266

PRELIMINARY CALCULATIONS

Type of         Total of    Number of        Number of Observations    Total of Squares
Total           Squares     Items Squared    per Squared Item          per Observation
Grand           4,356       1                9                         484
Rows            1,674       3                3                         558
Columns         1,898       3                3                         632.7
Colors          9,266       3                3                         3,089
Observations    3,360       9                1                         3,360

ANALYSIS OF VARIANCE

Source      SS                        DF    Variance Estimate    F
Rows        558 - 484 = 74            2     37                   1.52 N.S.
Columns     632.7 - 484 = 148.7       2     74.35                3.06 N.S.
Colors      3,089 - 484 = 2,605       2     1,302                53.6 *
Error       by difference = 48.6      2     24.3
Total       3,360 - 484 = 2,876       8

Table 7 shows at 2, 2 DF that $F_{.05} = 19.0$ and $F_{.01} = 99.0$. Therefore the colors differ significantly in their attractiveness to the birds and there are no significant position effects in the experiment.
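The Latin-square partition of the sum of squares can be sketched as below. The nine coded observations and their row, column, and color positions are an assumption: they are the arrangement consistent with the row totals (33, 21, 12), column totals (16, 39, 11), and color totals (21, -40, 85) given in this answer.

```python
# Each record is (row, column, color, coded observation).
obs = [
    (0, 0, "R", -12), (0, 1, "G", 35), (0, 2, "Y", 10),
    (1, 0, "G", 29),  (1, 1, "Y", 12), (1, 2, "R", -20),
    (2, 0, "Y", -1),  (2, 1, "R", -8), (2, 2, "G", 21),
]
n = 3                                      # side of the square
values = [rec[3] for rec in obs]
correction = sum(values) ** 2 / len(values)  # T^2 / N


def ss_for(index):
    """Sum of squares for the factor stored at position `index` of each record."""
    totals = {}
    for rec in obs:
        totals[rec[index]] = totals.get(rec[index], 0) + rec[3]
    return sum(t * t for t in totals.values()) / n - correction


ss_rows, ss_cols, ss_colors = ss_for(0), ss_for(1), ss_for(2)
ss_total = sum(v * v for v in values) - correction
ss_error = ss_total - ss_rows - ss_cols - ss_colors   # by difference

df_effect = n - 1
df_error = (n - 1) * (n - 2)
ms_error = ss_error / df_error
f_rows = (ss_rows / df_effect) / ms_error
f_cols = (ss_cols / df_effect) / ms_error
f_colors = (ss_colors / df_effect) / ms_error
print(f"F rows={f_rows:.2f} F columns={f_cols:.2f} F colors={f_colors:.1f}")
```

Small differences from the printed table in the last digit come from the rounding of intermediate values in the hand calculation.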
P2-6.
The problem really asks for an upper tolerance limit for observations on individual body temperatures in this illness. We are required to find a temperature so high that if it were exceeded, a "serious doubt" would be raised that the observation came from the same population. The definition of "serious doubt" is somewhat arbitrary. Let us find that temperature below which 99% of the observations would be expected to lie, and let us make our assertion at P = 0.05. For N = 15, Table 6 indicates K = 3.52. The data themselves are readily coded by subtracting 100:

1.2, 0.6, 0.6, 1.8, 1.0, 1.2, 1.6, 1.0, 0.8, 2.0, 1.6, 1.6, 2.2, 0.4, 0.8

$$\sum x_c = 18.4, \qquad \sum x_c^2 = 26.80, \qquad \frac{(\sum x_c)^2}{N} = 22.57, \qquad SS = 4.23$$

$$\bar{x}_c = 1.23, \qquad \bar{x} = 101.23, \qquad s^2 = \frac{4.23}{14} = 0.302, \qquad s = 0.550$$

$$\bar{x} + K(s) = 101.2 + 3.52(0.550) = 103.2, \text{ the required upper limit.}$$
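The tolerance-limit arithmetic can be checked with the sketch below. It uses the fifteen coded temperatures listed in this answer and takes K = 3.52 from the text's Table 6 as a given constant; it does not derive the tolerance factor.

```python
import math

# Temperatures coded by subtracting 100, as listed in the answer.
coded = [1.2, 0.6, 0.6, 1.8, 1.0, 1.2, 1.6, 1.0, 0.8, 2.0, 1.6, 1.6, 2.2, 0.4, 0.8]
n = len(coded)

mean_coded = sum(coded) / n
ss = sum(x * x for x in coded) - sum(coded) ** 2 / n  # sum of squares about the mean
std_dev = math.sqrt(ss / (n - 1))

k_factor = 3.52                  # tabulated K for N = 15 (99% of population, P = 0.05)
mean = mean_coded + 100          # decode the mean; s is unaffected by the coding
upper_limit = mean + k_factor * std_dev

print(f"mean={mean:.2f} s={std_dev:.3f} upper limit={upper_limit:.1f}")
```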
P2-7.
This problem is solved directly by the two-sample rank test. The number of items is small enough to make it practical to obtain U directly by counting. Evidently there is a preponderance of X preceding C, so the smaller U will probably be found by counting C preceding X. In the following diagram the number of C preceding each X is indicated.
C c
c c X X X C C X C X C X C X X
).5
3
c X c
xxxcxxccxcxcxcxcccc 3 3
5
6.5
9
6.5
9
1
The sum of C preceding X is therefore 56.5. We next confirm that this is the smaller value of U by

$$U' = NN' - U = 12(15) - 56.5 = 123.5,$$

which is larger than the value obtained. Then consulting Table 15, we find that at P = 0.05 the value of U must be 55 or less for these sample sizes. Therefore we cannot quite assert, at the 5% significance level, that the supplemented diet improved pelt quality. The nearly significant result, however, suggests it might be unwise to accept the null hypothesis unequivocally. The farmer would have to decide whether or not the matter was worth further (and perhaps more refined) experimentation.
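Counting U directly from a ranked sequence can be sketched as follows. The sequence in the example is hypothetical and much shorter than the problem's data, and the sketch does not handle tied ranks (which account for the half-counts in the answer above); it only illustrates the counting rule and the check U + U' = NN'.

```python
def u_by_counting(sequence, preceding, target):
    """For each `target` item, count the `preceding` items ranked before it; return the sum."""
    total = 0
    seen = 0
    for label in sequence:
        if label == preceding:
            seen += 1
        elif label == target:
            total += seen
    return total


# Hypothetical ranked sequence of supplemented (X) and control (C) animals.
ranked = "XXCXCCX"
u_c_before_x = u_by_counting(ranked, preceding="C", target="X")
u_x_before_c = u_by_counting(ranked, preceding="X", target="C")
n_x, n_c = ranked.count("X"), ranked.count("C")

print(u_c_before_x, u_x_before_c, n_x * n_c)
```

The smaller of the two counts is the U compared with the tabulated critical value.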
P2-8.
Code the data by subtracting 200, then dividing by 10. No decoding will be necessary because addition and subtraction do not affect variances; and although multiplication and division do affect variances, they will not change F, which is a ratio of two variances.
20°
10°
II
5, 3,
III
0,
L
Total (Total)
2
4 5
30°
(9) (8)
5,
3
(8)
0,
6,
8 (14)
(1)
7,
8 (15)
1, 9,
1
5 6
16
37
21
256
1,369
441
Total
(Total)
(0)
1
1
(6) (15)
1
1
2
r=o T2 =
PRELIMINARY CALCULATIONS

Type of         Total of Squares                       Number of        Number of Observations    Total of Squares
Total                                                  Items Squared    per Squared Item          per Observation
Grand           0                                      1                18                        0
Light           1 + 0 + 1 = 2                          3                6                         0.333
Temperature     256 + 1,369 + 441 = 2,066              3                6                         344.3
Combinations    (9)² + (8)² + ... + (15)² = 892        9                2                         446
Observations    (5)² + (4)² + ... + (6)² = 466         18               1                         466

ANALYSIS OF VARIANCE

Source                 SS                       DF    Variance Estimate    F
Light                  0.333 - 0 = 0.333        2     0.167
Temperature            344.3 - 0 = 344.3        2
Light x temperature    by diff. = 101.4         4
Error                  466 - 446 = 20.0         9
Total                  466 - 0 = 466.0          17
content
may
be calculated.
P3-6.

Counts: 153, 250, 220, 81, 306, 244
Total = 1,254; mean = 209

We may consider 209 to be a reasonably good estimate of $\lambda$, and examine the deviations from this expectation by $\chi^2$.

O        O - E      (O - E)²
153      -56         3,136
250      +41         1,681
220      +11           121
 81      -128       16,384
306      +97         9,409
244      +35         1,225
                    31,956

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{\sum (O - E)^2}{E} = \frac{31{,}956}{209} = 153$$

At P = 0.01 and 5 DF, Table 9 gives 15.1 as the critical value of $\chi^2$. So it may be concluded that there is a marked heterogeneity in the distribution of particles over the electron micrograph field. In other words there has been local aggregation or dispersion so that the overall distribution is not at all what would be expected if the same number of particles had been uniformly sprayed over the same area.
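The computation of this statistic is simple enough to verify directly. The sketch below uses the six counts from the answer, takes their mean as the common expectation, and compares the result with the tabulated critical value 15.1 quoted in the text (not computed here).

```python
counts = [153, 250, 220, 81, 306, 244]  # particles per area, from the answer

expected = sum(counts) / len(counts)     # common expectation for every area
deviations_sq = [(o - expected) ** 2 for o in counts]
chi_square = sum(deviations_sq) / expected
df = len(counts) - 1

critical_01 = 15.1  # tabulated chi-square for P = 0.01 at 5 DF, as quoted in the text
print(f"E={expected:.0f} sum (O-E)^2={sum(deviations_sq):.0f} chi2={chi_square:.1f} DF={df}")
print("heterogeneous" if chi_square > critical_01 else "no evidence of heterogeneity")
```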
P3-7.

             Hallucinations    No Hallucinations    Total
Placebo            1                  6               7
Treated            5                  1               6
Total              6                  7              13

Since this 2 x 2 table has boxes containing expectations smaller than 5, the special method must be used. We first write the most extreme table with the same marginal totals, then the next most extreme. Since the second table is identical to the one observed, we find the probabilities for both these tables, if the null hypothesis were true. For the observed table we have

$$\frac{7!\,6!\,6!\,7!}{13!\,1!\,6!\,5!\,1!} = \frac{7}{22(13)} = 0.0245$$

and for the more extreme one,

$$\frac{7!\,6!\,6!\,7!}{13!\,0!\,7!\,6!\,0!} = \frac{1}{(12)(11)(13)} = 0.0006$$

The sum of these, P = 0.025, is the probability we wish. So we reject the null hypothesis and conclude that the extract did have hallucinogenic properties.
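The two table probabilities can be checked with binomial coefficients, which give the same result as the factorial formula. The sketch assumes the observed cells 1, 6, 5, 1 (placebo and treated rows; hallucinations and no hallucinations columns), as implied by the factorials in the answer; the function name is hypothetical.

```python
from math import comb


def table_probability(a, b, c, d):
    """Probability of the 2 x 2 table [[a, b], [c, d]] with all margins fixed.

    Equivalent to (a+b)!(c+d)!(a+c)!(b+d)! / (N! a! b! c! d!).
    """
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)


p_observed = table_probability(1, 6, 5, 1)      # the table actually observed
p_more_extreme = table_probability(0, 7, 6, 0)  # the most extreme table, same margins
p_total = p_observed + p_more_extreme

print(f"{p_observed:.4f} + {p_more_extreme:.4f} = {p_total:.4f}")
```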
P3-8.
Let us rewrite the data as a standard 2 x 5 contingency table and apply the $\chi^2$ test.

                     A       B       C       D       E      Total
Acceptable          14      12      20      13      17        76
Not acceptable      22      26      16      24      17       105
Total               36      38      36      37      34       181

Expected Frequencies:

                     A       B       C       D       E      Total
Acceptable          15.1    16.0    15.1    15.5    14.3      76
Not acceptable      20.9    22.0    20.9    21.5    19.7     105
Total               36      38      36      37      34       181

E - O:

                     A       B       C       D       E
Acceptable           1.1     4.0    -4.9     2.5    -2.7
Not acceptable      -1.1    -4.0     4.9    -2.5     2.7

(E - O)²:

                     A       B       C       D       E
Acceptable           1.21   16.0    24.0     6.25    7.29
Not acceptable       1.21   16.0    24.0     6.25    7.29

(E - O)²/E:

                     A       B       C       D       E
Acceptable           0.08    1.00    1.59    0.40    0.51
Not acceptable       0.06    0.73    1.15    0.29    0.37

$$\chi^2 = \sum \frac{(E - O)^2}{E} = 6.18$$

At P = 0.05 with 4 DF, Table 9 gives 9.49 as the critical value of $\chi^2$. The null hypothesis may therefore not be rejected, and we cannot conclude that there is any real difference between the flavoring agents. In other words, the apparent superiority of C and inferiority of B may well be the result of the chances of random sampling from a population in which all the agents are equally acceptable.
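The expected frequencies and the statistic for the flavoring-agent table can be recomputed as a check. The sketch uses the observed counts from the answer and the tabulated critical value 9.49 quoted in the text; because it carries unrounded expectations, its result differs slightly from the hand-calculated 6.18.

```python
observed = [
    [14, 12, 20, 13, 17],  # acceptable, agents A to E
    [22, 26, 16, 24, 17],  # not acceptable, agents A to E
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expectation under the null hypothesis: row total x column total / grand total.
        e = row_totals[i] * col_totals[j] / grand_total
        chi_square += (e - o) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_05 = 9.49  # tabulated chi-square for P = 0.05 at 4 DF, as quoted in the text

print(f"chi2={chi_square:.2f} DF={df}")
print("reject" if chi_square > critical_05 else "cannot reject the null hypothesis")
```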
P3-9.
The important thing to recognize is that this problem calls for the two-sample rank test, and not a $\chi^2$ test, because the data form an ordered contingency table.
Data
4r,6