On the Distribution of Wald’s Classification Statistic


PURDUE UNIVERSITY

THIS IS TO CERTIFY THAT THE THESIS PREPARED UNDER MY SUPERVISION

BY Harman Leon Harter

ENTITLED ON THE DISTRIBUTION OF WALD'S CLASSIFICATION STATISTIC

COMPLIES WITH THE UNIVERSITY REGULATIONS ON GRADUATION THESES

AND IS APPROVED BY ME AS FULFILLING THIS PART OF THE REQUIREMENTS

FOR THE DEGREE OF

Doctor of Philosophy

Professor in Charge of Thesis

Head of School or Department

TO THE LIBRARIAN: THIS THESIS IS NOT TO BE REGARDED AS CONFIDENTIAL.

PROFESSOR IN CHARGE


ON THE DISTRIBUTION OF WALD’S CLASSIFICATION STATISTIC

A Thesis

Submitted to the Faculty

of

Purdue University

by Harman Leon Harter

In Partial Fulfillment of the

Requirements for the Degree

of

Doctor of Philosophy

August, 1949

ProQuest Number: 27712254

All rights reserved. INFORMATION TO ALL USERS: The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.

ProQuest 27712254. Published by ProQuest LLC (2019). Copyright of the Dissertation is held by the Author. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code. Microform Edition © ProQuest LLC. ProQuest LLC, 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106-1346

ACKNOWLEDGEMENTS

The writer wishes to thank Professor Carl F. Kossack for his helpful suggestions and patient guidance in making this work possible. He also extends sincere thanks to the Office of Naval Research, whose research grant through the Purdue Research Foundation enabled him to carry on the research. Thanks are also due to Dr. S. E. Wirt, who spent many hours calculating the experimental results, using IBM equipment, and to Professor Irving W. Burr, who offered valuable suggestions in regard to the sampling experiment.

TABLE OF CONTENTS

Page

ABSTRACT ... i

I. INTRODUCTION ... 1

A. The Problem of Classification ... 1

B. The Wald Statistic ... 4

II. THE EXACT DISTRIBUTION OF V FOR p = 1 ... 7

A. The Degenerate Case: $\rho_1 = 0$, $\zeta_1 = 0$ ... 7

B. The Non-degenerate Case ... 16

1. Case 1: n even ... 21

2. Case 2: n odd ... 23

C. The Use of the Statistic V in Classification ... 24

III. APPROXIMATE DISTRIBUTIONS OF V IN VARIOUS CASES ... 30

A. Wald's Approximation $n m_3$ ... 30

B. The Degenerate Case: $\rho_i = 0$, $\zeta_i = 0$ ... 33

1. Case 1a: n even, p odd ... 39

2. Case 1b: n even, p even ... 50

3. Case 2a: n odd, p even ... 61

4. Case 2b: n odd, p odd ... 67

5. Numerical Illustrations of the Distribution of $m_3$ ... 67

C. The Non-degenerate Case: $\rho_i = 0$, $\zeta_i \neq 0$ ... 82

D. A Better Approximation for Small Values of n ... 102

IV. AN EMPIRICAL DISTRIBUTION OF V FOR n = 10, p = 3, $\rho_i = \zeta_i = 0$ ... 104

A. The Sampling Experiment ... 104

B. Tests of Randomness of Sampling ... 108

C. The Calculation of the Statistic V ... 110

D. The Empirical Distribution ... 117

E. Comparison With the Theoretical Approximation ... 117

V. CONCLUSIONS AND RECOMMENDATIONS ... 124

VI. BIBLIOGRAPHY ... 126

TABLES

Table ... Page

1. Values and Parameters of the Function $G_{n,3}(m_3)$ for n = 4, 6, 8, 10; $\rho_i = \zeta_i = 0$ (a. Values; b. Parameters) ... 69

2. Values and Parameters of the Function $G_{n,2}(m_3)$ for n = 4, 6, 8, 10; $\rho_i = \zeta_i = 0$ (a. Values; b. Parameters) ... 72

3. Probabilities that $n m_3$ Lies in Certain Intervals for n = 10, p = 3 and 2, $\rho_i = \zeta_i = 0$ ... 75

4. Values of the Functions $G_{n,p}(m_3)$ for n = 4, 5, 6, p = 2 and 3, $\rho_i = \zeta_i = 0$ ... 79

5. Values of the Functions $G_{10,3}(m_3)$ for $\rho_i = \zeta_i = 0$ and for [...] ... 100

6. Frequencies for Normal Population of 10,000 Numbers Stamped on Beads ... 106

7. Observed and Theoretical Frequencies for 30,000 Sample Values from 10,000 Bead Normal Population ... 109

8. $\chi^2$-test of Observed vs. Theoretical Frequencies for 30,000 Sample Values from 10,000 Bead Normal Population ... 111

9. Frequency Distribution of 1000 Empirical Values of V for n = 10, p = 3 (Class Limits Integers) ... 118

10. Frequency Distribution of 1000 Empirical Values of V for n = 10, p = 3 (Class Marks Integers) ... 119

11. Observed Frequencies of V and Theoretical Frequencies Calculated from Revised Approximation V = [...] ... 122

FIGURES

Figure ... Page

1. Domain of the Variables $m_1$, $m_2$, and $m_3$ ... 36

2. Graph of the Function $G_{n,p}(m_3)$ for p = 3 and n = 4, 6, 8, 10; $\rho_i = \zeta_i = 0$ ... 70

3. Graph of the Function $G_{n,p}(m_3)$ for p = 2 and n = 4, 6, 8, 10; $\rho_i = \zeta_i = 0$ ... 73

4. Graph of the Function $G_{n,2}(m_3)$ for n = 4, 5, 6; $\rho_i = \zeta_i = 0$ ... 80

5. Graph of the Function $G_{n,3}(m_3)$ for n = 4, 5, 6; $\rho_i = \zeta_i = 0$ ... 81

6. Graph of the Function $G_{10,3}(m_3)$ for $\rho_i = 0$, $\zeta_i = 0$ and for [...] ... 101

7. Histogram Showing the Empirical Distribution of 1000 Values of V for n = 10, p = 3 ... 120

ABSTRACT

In many educational and industrial problems it is necessary to classify persons or objects into one of two categories: those fit and those unfit for a particular purpose. In formulating this problem of classification, Wald assumed that for p tests we know the scores of $N_1$ individuals known to belong to population $\Pi_1$ and of $N_2$ individuals known to belong to population $\Pi_2$, along with those of the individual under consideration, a member of the population $\Pi$, where it is known a priori that $\Pi$ is identical with either $\Pi_1$ or $\Pi_2$. In order to classify this individual into either $\Pi_1$ or $\Pi_2$, Wald introduced the statistic V defined by the relation

(1)    $V = \sum_{i=1}^{p} \sum_{j=1}^{p} s^{ij}\, t_{i,n+1}\, t_{j,n+2}$    $(n = N_1 + N_2 - 2)$

where

(2)    $\|s^{ij}\| = \|s_{ij}\|^{-1}$,    $s_{ij} = \sum_{\alpha=1}^{n} t_{i\alpha} t_{j\alpha} / n$

and where the variates $t_{i\alpha}$ $(i = 1, \ldots, p;\ \alpha = 1, \ldots, n+2)$ are normally and independently distributed with unit variance and with expected values

(3)    $E(t_{i\alpha}) = 0 \ (\alpha = 1, \ldots, n)$;    $E(t_{i,n+1}) = \rho_i$;    $E(t_{i,n+2}) = \zeta_i$

where $\rho_i$ and $\zeta_i$ are constants.

The exact distribution of V for p = 1 has been obtained by the use of characteristic functions, the Levy inversion formula, and contour integration, for both the degenerate case $\rho_1 = \zeta_1 = 0$ and the non-degenerate case $\rho_1 = 0$, $\zeta_1 \neq 0$. In each case integration around the right half-plane gives the probability law $P(|V|) = 2P(V)$ as an infinite series which converges for $|V| > n/2$. For $\rho_1 = \zeta_1 = 0$, we have

(4)    $P(|V|) = \dfrac{1}{\pi\,\Gamma(\frac{n}{2})} \sum_{v=n/2}^{\infty} (-1)^{v-n/2}\, \dfrac{n^{v}\,|V|^{-v-1}\,\left[\Gamma\!\left(\frac{v+1}{2}\right)\right]^{2}}{\Gamma\!\left(v - \frac{n}{2} + 1\right)}$

where v takes integral values for n even, half-integral values for n odd. For $\rho_1 = 0$, $\zeta_1 \neq 0$, we have, for even values of n

(5) P( ivl ) - — I

[ £

|v|-V

nr(S)[v=n/2

CV£2/ v) 2"3^13 r(v - ’*[ + 1)

r (▼ - 23 ♦ l\ +

2

V

®

s=0 \2s'

Wl~^2u +

- iH--U j *.

r(2u * 2 - 1)

u-Cn/4]

2u

2* 1(-i)n*r =-r/2( r ) ? ' ^

M

r=0 where [v/2] represents the largest integer contained in v/2 and [n/4] is similarly defined. Let us represent the known test scores on p tests of a random sample of

individuals from population

those of a random sample of

by

(1=1, "

individuals from

p; a- i, **•, M^), by y ^ (i=l, *••,p;

(3=1,***, N^), and those of the individual under consideration, a member of population 11, by hypothesis

II 5

(i=l,•••, p).

We have available for testing the

against the alternative hypothesis

two values of V, as follows: (6)

V1 = VlH,

- J

J

Slj (Zl

-

(ÿj -

11 =

iii

(7)

V2 “ V|H2 = iSl j£l slJ (Zl " yi) (5Fj " ^ V n 1

W1 _ wbere x. = ag1 x ^ / N ^ y.

n2

n

y ^ / N ^ and

♦ N 2

it

=

||s..|| -1 where

Si- are the pooled sample covariances

(8)

Np ^2 2 Ufc ' x j (xja - X.) + 2 (y - ÿ1) (yj!

(M>

that is, the variates are normally and independently distributed with unit variance, and the mean of $t_{i\alpha}$ $(\alpha = 1, \ldots, n)$ is zero, the mean of $t_{i,n+1}$ is $\rho_i$, and the mean of $t_{i,n+2}$ is $\zeta_i$. Thus, in the case p = 1,

(2.4)    $V = \dfrac{t_{1,n+1}\, t_{1,n+2}}{\sum_{\alpha=1}^{n} t_{1\alpha}^{2}/n}$

is the product of two normally and independently distributed variates with unit variance and means $\rho_1$ and $\zeta_1$ respectively, divided by the mean square of n normally and independently distributed variates with zero mean and unit variance. In order to simplify our notation, let us set $x = t_{1,n+1}$, $y = t_{1,n+2}$, and $Z = \sum_{\alpha=1}^{n} t_{1\alpha}^{2}/n$. Then we have

(2.5)    $V = \dfrac{xy}{Z}$

5 Wald, op. cit., p. 150, equation (25)

6 Wald, op. cit., p. 152, equation (34)

In the case under consideration (that is, $\rho_1 = \zeta_1 = 0$), x and y are normally distributed with zero means, so that their probability laws are

(2.6)    $P(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$;    $P(y) = \dfrac{1}{\sqrt{2\pi}}\, e^{-y^2/2}$.

Because of symmetry we have then

(2.7)    $P(|x|) = \dfrac{2}{\sqrt{2\pi}}\, e^{-x^2/2}$;    $P(|y|) = \dfrac{2}{\sqrt{2\pi}}\, e^{-y^2/2}$.

It is well known that $Z = \sum_{\alpha=1}^{n} t_{1\alpha}^{2}/n$ is distributed as $\chi^2/n$ with n degrees of freedom, that is, the probability law for Z is

(2.8)    $P(Z) = \dfrac{(n/2)^{n/2}}{\Gamma(\frac{n}{2})}\, Z^{\frac{n-2}{2}}\, e^{-nZ/2}$.
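The construction of V for p = 1 can be illustrated by simulation. The following is only a sketch under stated assumptions: the function names, the seed, and the sample size are illustrative choices and are not taken from the thesis. It draws x and y as unit-variance normal variates and Z as a chi-square variate with n degrees of freedom divided by n, then forms V = xy/Z. In the degenerate case the share of positive values should be close to one half, which reflects the symmetry of V about zero.

```python
import random


def simulate_v(n, rng, rho=0.0, zeta=0.0):
    """Draw one value of V = x*y/Z for the univariate case p = 1."""
    x = rng.gauss(rho, 1.0)    # plays the role of t_{1,n+1}, mean rho
    y = rng.gauss(zeta, 1.0)   # plays the role of t_{1,n+2}, mean zeta
    # A chi-square variate with n degrees of freedom is Gamma(n/2, scale 2).
    chi_square = rng.gammavariate(n / 2.0, 2.0)
    z = chi_square / n         # mean square of n standard normal variates
    return x * y / z


def sample_v(n, size, seed=0, rho=0.0, zeta=0.0):
    """Return `size` simulated values of V (the seed is an arbitrary choice)."""
    rng = random.Random(seed)
    return [simulate_v(n, rng, rho, zeta) for _ in range(size)]


values = sample_v(n=10, size=20000, seed=1)
share_positive = sum(v > 0 for v in values) / len(values)
print(f"share of positive V values: {share_positive:.3f}")
```

A simulation of this kind cannot replace the exact series, but it gives a quick plausibility check on any tail probability computed from it.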

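Now we proceed to find the probability law of $|V| = |x|\,|y|/Z$ in a manner similar to that used by Shrivastava⁷. Let $w = \ln|V| = \ln|x| + \ln|y| - \ln Z$. Then the characteristic function of w is given by

$\phi(t) = \int_0^\infty\!\int_0^\infty\!\int_0^\infty e^{itw}\, P(|x|)\, P(|y|)\, P(Z)\; dx\, dy\, dZ .$

Substituting the probability laws (2.7) and (2.8), and noting that x, y, and Z are independent, we obtain

(2.11)    $\phi(t) = \dfrac{2}{\pi}\,\dfrac{(n/2)^{n/2}}{\Gamma(\frac{n}{2})} \int_0^\infty x^{it} e^{-x^2/2}\,dx \int_0^\infty y^{it} e^{-y^2/2}\,dy \int_0^\infty Z^{\frac{n-2}{2} - it} e^{-nZ/2}\,dZ .$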

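Next we transform the integrands so as to express the integrals in terms of Gamma functions, as follows:

(2.12)    $\int_0^\infty y^{it} e^{-y^2/2}\,dy = \int_0^\infty x^{it} e^{-x^2/2}\,dx = \int_0^\infty x^{it-1} e^{-x^2/2}\,(x\,dx) .$

Expressing the integrand as a function of $x^2/2$, we find

(2.13)    $\int_0^\infty y^{it} e^{-y^2/2}\,dy = \int_0^\infty x^{it} e^{-x^2/2}\,dx = 2^{\frac{it-1}{2}} \int_0^\infty \left(\dfrac{x^2}{2}\right)^{\frac{it-1}{2}} e^{-x^2/2}\; d\!\left(\dfrac{x^2}{2}\right) .$

The integral can be expressed as a Gamma function, so that we have

(2.14)    $\int_0^\infty y^{it} e^{-y^2/2}\,dy = \int_0^\infty x^{it} e^{-x^2/2}\,dx = 2^{\frac{it-1}{2}}\, \Gamma\!\left(\dfrac{it+1}{2}\right) .$

A similar transformation of the third integrand in (2.11) gives

(2.15)    $\int_0^\infty Z^{\frac{n-2}{2} - it} e^{-nZ/2}\,dZ = \left(\dfrac{2}{n}\right)^{\frac{n}{2} - it} \int_0^\infty \left(\dfrac{nZ}{2}\right)^{\frac{n-2}{2} - it} e^{-nZ/2}\; d\!\left(\dfrac{nZ}{2}\right) .$

This integral can also be expressed as a Gamma function, yielding

(2.16)    $\int_0^\infty Z^{\frac{n-2}{2} - it} e^{-nZ/2}\,dZ = \left(\dfrac{2}{n}\right)^{\frac{n}{2} - it}\, \Gamma\!\left(\dfrac{n}{2} - it\right) .$

Substituting the values obtained in (2.14) and (2.16) for the integrals in equation (2.11), we obtain

(2.17)    $\phi(t) = \dfrac{2}{\pi}\,\dfrac{(n/2)^{n/2}}{\Gamma(\frac{n}{2})} \left[ 2^{\frac{it-1}{2}}\, \Gamma\!\left(\dfrac{it+1}{2}\right) \right]^{2} \left(\dfrac{2}{n}\right)^{\frac{n}{2} - it} \Gamma\!\left(\dfrac{n}{2} - it\right) .$

Simplification gives

(2.18)    $\phi(t) = \dfrac{n^{it}}{\pi\,\Gamma(\frac{n}{2})}\; \Gamma\!\left(\dfrac{n}{2} - it\right) \left[ \Gamma\!\left(\dfrac{it+1}{2}\right) \right]^{2} .$

7 Shrivastava, M. P., "On the D²-Statistic," Bulletin of the Calcutta Mathematical Society, Vol. 33 (1941), pp. 71-86.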
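The Gamma-function reduction used for the x and y integrals can be checked numerically. The sketch below is an illustration only and makes two assumptions that are not in the thesis: the exponent `s` is real (the thesis works with the imaginary exponent it, for which the same identity holds by analytic continuation), and a plain midpoint rule on a truncated interval is adequate. The function names are hypothetical. It compares the integral of x^s e^{-x²/2} over the positive half-line with the closed form 2^{(s-1)/2} Γ((s+1)/2).

```python
import math


def half_line_moment(s, upper=12.0, steps=120000):
    """Midpoint-rule estimate of the integral of x**s * exp(-x*x/2) on (0, upper).

    The interval is truncated at `upper`; beyond 12 the integrand is negligible.
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += x ** s * math.exp(-0.5 * x * x)
    return total * h


def gamma_form(s):
    """Closed form 2**((s - 1)/2) * Gamma((s + 1)/2) for a real exponent s."""
    return 2.0 ** ((s - 1.0) / 2.0) * math.gamma((s + 1.0) / 2.0)


for s in (0.0, 1.0, 2.5):
    print(s, half_line_moment(s), gamma_form(s))
```

For s = 0 both columns should be close to the square root of π/2, and for s = 1 both should be close to 1.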

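By the use of the Levy inversion theorem

(2.19)    $P(w) = \dfrac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iwt}\, \phi(t)\, dt$

we obtain, by substituting the result found in (2.18),

(2.20)    $P(w) = \dfrac{1}{2\pi^{2}\,\Gamma(\frac{n}{2})} \int_{-\infty}^{\infty} e^{-iwt}\, n^{it}\, \Gamma\!\left(\dfrac{n}{2} - it\right) \left[ \Gamma\!\left(\dfrac{it+1}{2}\right) \right]^{2} dt .$

Making the substitution v = it, we have dt = -i dv, and

(2.21)    $P(w) = \dfrac{1}{2\pi^{2} i\,\Gamma(\frac{n}{2})} \int_{-i\infty}^{i\infty} e^{-vw}\, n^{v}\, \Gamma\!\left(\dfrac{n}{2} - v\right) \left[ \Gamma\!\left(\dfrac{v+1}{2}\right) \right]^{2} dv .$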

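In order to evaluate this integral it will be necessary to resort to contour integration. Using a property of the Gamma function given by Whittaker and Watson⁸

(2.22)    $\Gamma(Z)\,\Gamma(1 - Z) = \dfrac{\pi}{\sin \pi Z}$

and letting $Z = \frac{n}{2} - v$, we obtain

(2.23)    $\Gamma\!\left(\dfrac{n}{2} - v\right) = \dfrac{\pi}{\Gamma\!\left(v - \frac{n}{2} + 1\right) \sin \pi\!\left(\frac{n}{2} - v\right)} .$

Substituting this value of $\Gamma(\frac{n}{2} - v)$ in (2.21), we obtain

(2.24)    $P(w) = \dfrac{1}{\pi\,\Gamma(\frac{n}{2})} \cdot \dfrac{1}{2\pi i} \int_{-i\infty}^{i\infty} \dfrac{\pi\, e^{-vw}\, n^{v} \left[ \Gamma\!\left(\frac{v+1}{2}\right) \right]^{2}}{\sin \pi\!\left(\frac{n}{2} - v\right) \Gamma\!\left(v - \frac{n}{2} + 1\right)}\, dv .$

Dividing numerator and denominator by $\pi$, we find

(2.25)    $P(w) = \dfrac{1}{\Gamma(\frac{n}{2})} \cdot \dfrac{1}{2\pi i} \int_{-i\infty}^{i\infty} \dfrac{e^{-vw}\, n^{v} \left[ \Gamma\!\left(\frac{v+1}{2}\right) \right]^{2}}{\sin \pi\!\left(\frac{n}{2} - v\right) \Gamma\!\left(v - \frac{n}{2} + 1\right)}\, dv .$

8 Whittaker and Watson, Modern Analysis, Fourth Edition, Cambridge University Press, London, 1940, p. 239.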

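We shall now perform a contour integration, using as the contour the imaginary axis plus the semicircle in the right half-plane with center at the origin and infinite radius. First we shall show that for a certain range of values of w, the integral along the semicircular portion of the contour is zero. Using another relation given by Whittaker and Watson⁹

(2.26)    $\Gamma(2x) = \dfrac{2^{2x-1}}{\sqrt{\pi}}\, \Gamma(x)\, \Gamma\!\left(x + \tfrac{1}{2}\right)$

and setting $2x = \frac{n}{2} - v$, we obtain

(2.27)    $\Gamma\!\left(\dfrac{n}{2} - v\right) = \dfrac{2^{\frac{n}{2} - v - 1}}{\sqrt{\pi}}\, \Gamma\!\left(\dfrac{n}{4} - \dfrac{v}{2}\right) \Gamma\!\left(\dfrac{n}{4} - \dfrac{v}{2} + \dfrac{1}{2}\right) .$

By application of equation (2.22), we obtain

(2.28)    $\Gamma\!\left(\dfrac{n}{4} - \dfrac{v}{2}\right) = \dfrac{\pi}{\Gamma\!\left(\frac{v}{2} - \frac{n}{4} + 1\right) \sin \pi\!\left(\frac{n}{4} - \frac{v}{2}\right)}$

and

(2.29)    $\Gamma\!\left(\dfrac{n}{4} - \dfrac{v}{2} + \dfrac{1}{2}\right) = \dfrac{\pi}{\Gamma\!\left(\frac{v}{2} - \frac{n}{4} + \frac{1}{2}\right) \sin \pi\!\left(\frac{n}{4} - \frac{v}{2} + \frac{1}{2}\right)} .$

Substituting these results in equation (2.27), we obtain

(2.30)    $\Gamma\!\left(\dfrac{n}{2} - v\right) = \dfrac{2^{\frac{n}{2} - v - 1}\, \pi^{3/2}}{\Gamma\!\left(\frac{v}{2} - \frac{n}{4} + 1\right) \Gamma\!\left(\frac{v}{2} - \frac{n}{4} + \frac{1}{2}\right) \sin \pi\!\left(\frac{n}{4} - \frac{v}{2}\right) \sin \pi\!\left(\frac{n}{4} - \frac{v}{2} + \frac{1}{2}\right)} .$

Simplifying, we have

(2.31)    $\Gamma\!\left(\dfrac{n}{2} - v\right) = \dfrac{2^{\frac{n}{2} - v}\, \pi^{3/2}}{\Gamma\!\left(\frac{v}{2} - \frac{n}{4} + 1\right) \Gamma\!\left(\frac{v}{2} - \frac{n}{4} + \frac{1}{2}\right) \cdot 2 \sin \pi\!\left(\frac{n}{4} - \frac{v}{2}\right) \cos \pi\!\left(\frac{n}{4} - \frac{v}{2}\right)} .$

Using the relation $\sin 2\theta = 2 \sin\theta \cos\theta$, we find

(2.32)    $\Gamma\!\left(\dfrac{n}{2} - v\right) = \dfrac{2^{\frac{n}{2} - v}\, \pi^{3/2}}{\Gamma\!\left(\frac{v}{2} - \frac{n}{4} + 1\right) \Gamma\!\left(\frac{v}{2} - \frac{n}{4} + \frac{1}{2}\right) \sin \pi\!\left(\frac{n}{2} - v\right)} .$

Substituting this result in turn in equation (2.21), we have

(2.33)    $P(w) = \dfrac{2^{n/2} \sqrt{\pi}}{\Gamma(\frac{n}{2})} \cdot \dfrac{1}{2\pi i} \int_{-i\infty}^{i\infty} \dfrac{\left(\frac{n e^{-w}}{2}\right)^{v} \left[ \Gamma\!\left(\frac{v+1}{2}\right) \right]^{2}}{\Gamma\!\left(\frac{v}{2} - \frac{n}{4} + 1\right) \Gamma\!\left(\frac{v}{2} - \frac{n}{4} + \frac{1}{2}\right) \sin \pi\!\left(\frac{n}{2} - v\right)}\, dv .$

Now let C be the semicircle of radius $N + \frac{1}{2}$ on the right of the imaginary axis with center at the origin (N an integer). Along C the integrand in equation (2.33) is

(2.34)    $O\!\left(N^{\frac{n-1}{2}}\right) \dfrac{\left(\frac{n e^{-w}}{2}\right)^{v}}{\sin \pi\!\left(\frac{n}{2} - v\right)}\, dv$

as can be shown by using Stirling's approximation to the Gamma function, $\Gamma(x+1) \approx e^{-x} x^{x} \sqrt{2\pi x}$. Therefore, as $N \to \infty$, the integral along C tends to zero, according to Whittaker and Watson¹⁰, provided

(2.35)    $\left| \dfrac{n e^{-w}}{2} \right| < 1 .$

But $e^{w} = |V|$, so (2.35) yields

(2.36)    $\left| \dfrac{n}{2V} \right| < 1$, so that $|2V| > n$, or $|V| > n/2$.

Thus we have shown that for $|V| > n/2$, the integral around the semicircular portion of the contour is zero.

9 Op. cit., p. 240.

10 Op. cit., p. 287.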

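Hence, under these conditions, the integral on the right hand side of (2.21) is equal to $(-2\pi i)$ times the sum of the residues at all the singular points in the right half-plane. The integrand has simple poles at v = n/2, n/2 + 1, n/2 + 2, ..., and no other singularities in the right half-plane. Hence

(2.37)    $P(w) = \dfrac{1}{\Gamma(\frac{n}{2})} \cdot \dfrac{1}{2\pi i}\, (-2\pi i) \sum_{v=n/2}^{\infty} \text{Residues of } \dfrac{e^{-vw}\, n^{v} \left[ \Gamma\!\left(\frac{v+1}{2}\right) \right]^{2}}{\Gamma\!\left(v - \frac{n}{2} + 1\right) \sin \pi\!\left(\frac{n}{2} - v\right)} .$

Inserting the actual values of the residues, (2.37) becomes

(2.38)    $P(w) = -\dfrac{1}{\Gamma(\frac{n}{2})} \sum_{v=n/2}^{\infty} \dfrac{e^{-vw}\, n^{v} \left[ \Gamma\!\left(\frac{v+1}{2}\right) \right]^{2}}{\Gamma\!\left(v - \frac{n}{2} + 1\right) \cdot \left\{ -\pi \cos \pi\!\left(\frac{n}{2} - v\right) \right\}} .$

Using the fact that $\cos k\pi = (-1)^{k}$, for k an integer, we find

(2.39)    $P(w) = \dfrac{1}{\pi\,\Gamma(\frac{n}{2})} \sum_{v=n/2}^{\infty} (-1)^{v - n/2}\, \dfrac{e^{-vw}\, n^{v} \left[ \Gamma\!\left(\frac{v+1}{2}\right) \right]^{2}}{\Gamma\!\left(v - \frac{n}{2} + 1\right)} .$

Since $w = \ln|V|$, so that $e^{-vw} = |V|^{-v}$ and $dw = d|V|/|V|$, this gives

(2.40)    $P(|V|) = \dfrac{1}{\pi\,\Gamma(\frac{n}{2})} \sum_{v=n/2}^{\infty} (-1)^{v - n/2}\, \dfrac{n^{v}\, |V|^{-v-1} \left[ \Gamma\!\left(\frac{v+1}{2}\right) \right]^{2}}{\Gamma\!\left(v - \frac{n}{2} + 1\right)}$,    $|V| > n/2$.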
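The series for the degenerate case lends itself to direct numerical evaluation. The sketch below is an illustration under stated assumptions rather than the author's computing procedure: it takes the series in the form $P(|V|) = \frac{1}{\pi\Gamma(n/2)} \sum_{v} (-1)^{v-n/2}\, n^{v} |V|^{-v-1} [\Gamma(\frac{v+1}{2})]^{2} / \Gamma(v - \frac{n}{2} + 1)$, with v running over n/2, n/2 + 1, ..., and integrates it term by term from a threshold c to infinity, which replaces $|V|^{-v-1}$ by $c^{-v}/v$. The function name, the stopping tolerance, and the term limit are hypothetical choices. Logarithms of the Gamma function are used so that large terms do not overflow.

```python
import math


def tail_probability(threshold, n, tol=1e-15, max_terms=500):
    """Series estimate of Prob(|V| > threshold) for p = 1 with zero means.

    Assumes the alternating series converges, that is, threshold > n/2.
    """
    if threshold <= n / 2.0:
        raise ValueError("the series converges only for threshold > n/2")
    log_ratio = math.log(n / threshold)
    total = 0.0
    for k in range(max_terms):
        v = n / 2.0 + k  # integral for even n, half-integral for odd n
        log_term = (v * log_ratio
                    + 2.0 * math.lgamma((v + 1.0) / 2.0)
                    - math.lgamma(k + 1.0)   # Gamma(v - n/2 + 1) = Gamma(k + 1)
                    - math.log(v))           # from integrating |V|**(-v - 1)
        term = math.exp(log_term)
        total += term if k % 2 == 0 else -term
        if term < tol:
            break
    return total / (math.pi * math.gamma(n / 2.0))


print(tail_probability(20.0, 10))
```

Because the terms alternate in sign and the early terms can exceed the final sum, the series is best used well inside its region of convergence; close to the boundary $|V| = n/2$ many terms are needed and cancellation becomes severe.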

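Since the values of x and y are symmetric about zero and uncorrelated, the values of V are also symmetric about zero, and hence $P(V) = \frac{1}{2} P(|V|)$. To obtain a series for $P(|V|)$ which converges when $|V|$ [...]

(2.43)    $\phi(t) = \int_0^\infty\!\int_0^\infty\!\int_0^\infty e^{itw}\, P(|x|)\, P(|y|)\, P(Z)\; dx\, dy\, dZ .$

Substituting the same values of $P(|x|)$ and $P(Z)$ as before, and the value of $P(|y|)$ obtained in (2.42), we find

(2.44)    $\phi(t) = \int_0^\infty\!\int_0^\infty\!\int_0^\infty x^{it} y^{it} Z^{-it}\; \dfrac{2}{\sqrt{2\pi}}\, e^{-x^2/2}\; \dfrac{e^{-(y - \zeta_1)^2/2} + e^{-(y + \zeta_1)^2/2}}{\sqrt{2\pi}}\; \dfrac{(n/2)^{n/2}}{\Gamma(\frac{n}{2})}\, Z^{\frac{n-2}{2}} e^{-nZ/2}\; dx\, dy\, dZ .$

The distributions of x, y, and Z are independent of each other, so

(2.45)    $\phi(t) = \dfrac{(n/2)^{n/2}}{\pi\,\Gamma(\frac{n}{2})} \left[ \int_0^\infty x^{it} e^{-x^2/2}\,dx \right] \left[ \int_0^\infty y^{it} \left\{ e^{-(y - \zeta_1)^2/2} + e^{-(y + \zeta_1)^2/2} \right\} dy \right] \left[ \int_0^\infty Z^{\frac{n-2}{2} - it} e^{-nZ/2}\,dZ \right] .$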

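The second integral on the right-hand side of equation (2.45) can be expressed in terms of Gamma functions and Incomplete Gamma functions. Let us assume that $\zeta_1 > 0$. Then

(2.46)    $\int_0^\infty y^{it} \left[ e^{-(y - \zeta_1)^2/2} + e^{-(y + \zeta_1)^2/2} \right] dy = \int_0^\infty y^{it} e^{-(y - \zeta_1)^2/2}\,dy + \int_0^\infty y^{it} e^{-(y + \zeta_1)^2/2}\,dy .$

This can be written in the form

(2.47)    $\int_0^\infty y^{it} \left[ e^{-(y - \zeta_1)^2/2} + e^{-(y + \zeta_1)^2/2} \right] dy = Y_1 + Y_2$

where

(2.48)    $Y_1 = \int_0^\infty y^{it} e^{-(y - \zeta_1)^2/2}\,dy$;    $Y_2 = \int_0^\infty y^{it} e^{-(y + \zeta_1)^2/2}\,dy .$

Let us evaluate $Y_1$ and $Y_2$ in turn.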

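To evaluate $Y_1$, let $y' = y - \zeta_1$; then $y = y' + \zeta_1$, $dy = dy'$, and $y = 0 \to y' = -\zeta_1$, $y = \infty \to y' = \infty$, so that

(2.49)    $Y_1 = \int_{-\zeta_1}^{\infty} (y' + \zeta_1)^{it}\, e^{-y'^2/2}\,dy' .$

Expanding by the binomial theorem, we find

(2.50)    $Y_1 = \int_{-\zeta_1}^{\infty} \sum_{r=0}^{\infty} \binom{it}{r}\, y'^{\,it-r}\, \zeta_1^{\,r}\, e^{-y'^2/2}\,dy' .$

Reversing the order of integration and summation and splitting the integral into two parts, we have

(2.51)    $Y_1 = \sum_{r=0}^{\infty} \binom{it}{r}\, \zeta_1^{\,r} \left[ \int_{-\zeta_1}^{0} y'^{\,it-r} e^{-y'^2/2}\,dy' + \int_0^{\infty} y'^{\,it-r} e^{-y'^2/2}\,dy' \right] .$

In the first of these two integrals, let $y' = -y''$; then $dy' = -dy''$, and $y' = -\zeta_1 \to y'' = \zeta_1$, $y' = 0 \to y'' = 0$, so that we have

(2.52)    $Y_1 = \sum_{r=0}^{\infty} \binom{it}{r}\, \zeta_1^{\,r} \left[ (-1)^{it-r} \int_0^{\zeta_1} y''^{\,it-r} e^{-y''^2/2}\,dy'' + \int_0^{\infty} y'^{\,it-r} e^{-y'^2/2}\,dy' \right] .$

The first integral in (2.52) can be expressed in terms of an Incomplete Gamma function and the second in terms of a Gamma function, giving us

(2.53)    $Y_1 = \sum_{r=0}^{\infty} \binom{it}{r}\, \zeta_1^{\,r}\, 2^{\frac{it-r-1}{2}} \left[ (-1)^{it-r}\, \Gamma_{\zeta_1^2/2}\!\left(\dfrac{it-r+1}{2}\right) + \Gamma\!\left(\dfrac{it-r+1}{2}\right) \right]$

where $\Gamma_{a}(z) = \int_0^{a} u^{z-1} e^{-u}\,du$ denotes the Incomplete Gamma function.

To evaluate $Y_2$, let $y' = y + \zeta_1$; then $y = y' - \zeta_1$, $dy = dy'$, and $y = 0 \to y' = \zeta_1$, $y = \infty \to y' = \infty$, so that we have

(2.54)    $Y_2 = \int_{\zeta_1}^{\infty} (y' - \zeta_1)^{it}\, e^{-y'^2/2}\,dy' .$

Expanding by the binomial theorem, we find

(2.55)    $Y_2 = \int_{\zeta_1}^{\infty} \sum_{r=0}^{\infty} (-1)^{r} \binom{it}{r}\, y'^{\,it-r}\, \zeta_1^{\,r}\, e^{-y'^2/2}\,dy' .$

Reversing the order of integration and summation, we have

(2.56)    $Y_2 = \sum_{r=0}^{\infty} (-1)^{r} \binom{it}{r}\, \zeta_1^{\,r} \int_{\zeta_1}^{\infty} y'^{\,it-r} e^{-y'^2/2}\,dy' .$

Writing the integrand as a function of $y'^2/2$, we obtain

(2.57)    $Y_2 = \sum_{r=0}^{\infty} (-1)^{r} \binom{it}{r}\, \zeta_1^{\,r}\, 2^{\frac{it-r-1}{2}} \int_{\zeta_1^2/2}^{\infty} \left(\dfrac{y'^2}{2}\right)^{\frac{it-r-1}{2}} e^{-y'^2/2}\; d\!\left(\dfrac{y'^2}{2}\right) .$

Writing $\int_{\zeta_1^2/2}^{\infty}$ as $\int_0^{\infty} - \int_0^{\zeta_1^2/2}$, and using Gamma and Incomplete Gamma functions, we find

(2.58)    $Y_2 = \sum_{r=0}^{\infty} (-1)^{r} \binom{it}{r}\, \zeta_1^{\,r}\, 2^{\frac{it-r-1}{2}} \left[ \Gamma\!\left(\dfrac{it-r+1}{2}\right) - \Gamma_{\zeta_1^2/2}\!\left(\dfrac{it-r+1}{2}\right) \right] .$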

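Substituting the results obtained in (2.53) and (2.58) in (2.47), and that result, together with those previously obtained in (2.13) and (2.16), in turn in (2.45), we obtain

(2.59)    $\phi(t) = \dfrac{(n/2)^{n/2}}{\pi\,\Gamma(\frac{n}{2})} \cdot 2^{\frac{it-1}{2}}\, \Gamma\!\left(\dfrac{it+1}{2}\right) \cdot \left(\dfrac{2}{n}\right)^{\frac{n}{2} - it} \Gamma\!\left(\dfrac{n}{2} - it\right) \sum_{r=0}^{\infty} \binom{it}{r}\, \zeta_1^{\,r}\, 2^{\frac{it-r-1}{2}} \left[ \{1 + (-1)^{r}\}\, \Gamma\!\left(\dfrac{it-r+1}{2}\right) + \{(-1)^{it-r} - (-1)^{r}\}\, \Gamma_{\zeta_1^2/2}\!\left(\dfrac{it-r+1}{2}\right) \right] .$

Simplifying, we find

(2.60)    $\phi(t) = \dfrac{n^{it}}{2\pi\,\Gamma(\frac{n}{2})}\, \Gamma\!\left(\dfrac{n}{2} - it\right) \Gamma\!\left(\dfrac{it+1}{2}\right) \sum_{r=0}^{\infty} \binom{it}{r} \left(\dfrac{\zeta_1}{\sqrt{2}}\right)^{r} \left[ \{1 + (-1)^{r}\}\, \Gamma\!\left(\dfrac{it-r+1}{2}\right) + \{(-1)^{it-r} - (-1)^{r}\}\, \Gamma_{\zeta_1^2/2}\!\left(\dfrac{it-r+1}{2}\right) \right] .$

Again using the Levy inversion formula, (2.19), we find

(2.61)    $P(w) = \dfrac{1}{4\pi^{2}\,\Gamma(\frac{n}{2})} \int_{-\infty}^{\infty} e^{-iwt}\, n^{it}\, \Gamma\!\left(\dfrac{n}{2} - it\right) \Gamma\!\left(\dfrac{it+1}{2}\right) \sum_{r=0}^{\infty} \binom{it}{r} \left(\dfrac{\zeta_1}{\sqrt{2}}\right)^{r} \left[ \{1 + (-1)^{r}\}\, \Gamma\!\left(\dfrac{it-r+1}{2}\right) + \{(-1)^{it-r} - (-1)^{r}\}\, \Gamma_{\zeta_1^2/2}\!\left(\dfrac{it-r+1}{2}\right) \right] dt .$

Letting v = it, dt = -i dv, and we have

(2.62)    $P(w) = \dfrac{1}{4\pi^{2} i\,\Gamma(\frac{n}{2})} \int_{-i\infty}^{i\infty} e^{-vw}\, n^{v}\, \Gamma\!\left(\dfrac{n}{2} - v\right) \Gamma\!\left(\dfrac{v+1}{2}\right) \sum_{r=0}^{\infty} \binom{v}{r} \left(\dfrac{\zeta_1}{\sqrt{2}}\right)^{r} \left[ \{1 + (-1)^{r}\}\, \Gamma\!\left(\dfrac{v-r+1}{2}\right) + \{(-1)^{v-r} - (-1)^{r}\}\, \Gamma_{\zeta_1^2/2}\!\left(\dfrac{v-r+1}{2}\right) \right] dv .$

This integral may be evaluated by integrating around the same contour as in the degenerate case. Two cases arise: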

Case 1: n even. Since the singularities in the right half-plane all occur for positive integral values of v $(v = \frac{n}{2}, \frac{n}{2} + 1, \frac{n}{2} + 2, \ldots)$, and since $\binom{v}{r}$ vanishes for v a positive integer and r > v, the upper limit of the summation on r in (2.62) may be changed, for even values of n, from $\infty$ to v. We have then

(2.63)

P(w) = — - i ( ™ e-vwr(2Sl)rf2-v)V' Ë M z ' T f w f i f (2-) Uja) ' 2 ' '2 ' r=0' '

[ji+(-Dr} r(5=ni)+ j(-Dv-r

dv

22

Integrating around the same contour as in the degenerate case, we find

(2.64)

P(w) = --- — - . —i— (-21Ii) Z 2H p (^) 2IH

(Residues in right half-plane)

Inserting the actual values of the residues, we obtain

L.

(2.65) p(w)

Z e~w - h - f r ?

2RP(~)

v=n/2

f

P(v - ^ + l)

r=0

Making use of the facts

(2.66)

1 + (“l)r =

2 for r even for r odd.

j0

and

r0 for v even, r even

(2 .6?)

(-DV~r -(-Dr

0 for v even, r odd -2 for v odd, r even ^2 for v odd, r odd

we can write our result (2 .6 5 ) in the form

1

(2.68) P(w) =

” e-w 2 Ü - R 2 R — (-l)v-f [I] (,")2*s 72^fcd£!l) P(v - ~ + l) F-0 1 ' 2 /

nr(2 ) v=n/22 +

.♦D«

? e-20 j^ = %^ = 0 ] = .0003S, while p||v| >2ô|f-^ = 0, ^

= l| =.00053..

It can be readily seen from (2.69) that as $|\zeta_1|$ increases, the probability that $|V|$ exceeds a fixed value also increases, while for $\zeta_1 = 0$, (2.69) reduces to (2.40).

Case 2: n odd. Since the singularities in the right half-plane occur for half-integral values when n is odd, and since $\binom{v}{r}$ does not vanish for half-integral values of v, even when r > v, the upper limit of the summation on r in (2.62) is actually infinite. Another difficulty in this case lies in the fact that, since v is half-integral and r is an integer, v - r is half-integral, and hence $(-1)^{v-r}$ is imaginary. The result in this case has not been worked out.

Even though it is not necessary for classification, as will be shown in the next section, it would be desirable for the sake of completeness to have also the distribution of V for the case $\rho_1 \neq 0$, $\zeta_1 \neq 0$. However, the method used above is not applicable, since it depends upon the symmetry of V about zero, and V in this case has a mean different from zero and is not even symmetric about that mean. Hence the case $\rho_1 \neq 0$, $\zeta_1 \neq 0$ will not be considered here.

C. The Use of the Statistic V in Classification

Let us recall that we have two sets of p variates $(x_1, \ldots, x_p)$ and $(y_1, \ldots, y_p)$ independent of each other and each having a p-variate normal distribution. The mean value of $x_i$ is denoted by $\mu_i$ and that of $y_i$ by $\nu_i$, while the common covariance matrix is denoted by $\|\sigma_{ij}\|$. The normal population with mean values $\mu_1, \ldots, \mu_p$ and covariance matrix $\|\sigma_{ij}\|$ is denoted by $\Pi_1$, and the normal population with mean values $\nu_1, \ldots, \nu_p$ and covariance matrix $\|\sigma_{ij}\|$ by $\Pi_2$. We draw a sample of size $N_1$ from the population $\Pi_1$ and a sample of size $N_2$ from the population $\Pi_2$. The $\alpha$-th observation on $x_i$ $(i = 1, \ldots, p;\ \alpha = 1, \ldots, N_1)$ is denoted by $x_{i\alpha}$, and the $\beta$-th observation on $y_i$ $(i = 1, \ldots, p;\ \beta = 1, \ldots, N_2)$ by $y_{i\beta}$. We make a single observation $Z_i$ $(i = 1, \ldots, p)$ on the i-th variate from a p-variate population $\Pi$, where it is known a priori that $\Pi$ is identical with either $\Pi_1$ or $\Pi_2$. Assuming that the $Z_i$ are distributed independently of the $x_{i\alpha}$ and the $y_{i\beta}$, and that the parameters $\mu_i$, $\nu_i$ and $\|\sigma_{ij}\|$ are unknown, we wish to use the information at hand to test the hypothesis $H_1: \Pi = \Pi_1$ against the alternative hypothesis $H_2: \Pi = \Pi_2$.

11 Wald considered the statistic

(2.70)

U

/

=

P P ii / 2 Z s1 -1Z. Z, 1=1 j=l J

where

(y1 - x j

(2.71)

= z, .

(j=l,

p)

V N1+N?

He remarked that the distributions of U under the hypotheses $H_1$ and $H_2$ are contained as special cases in the distribution of the statistic

(2.72)    $V = \sum_{i=1}^{p}\sum_{j=1}^{p} s^{ij}\, t_{i,n+1}\, t_{j,n+2}$    $(n = N_1 + N_2 - 2)$

Here $s^{ij}$, which occurs in both (2.70) and (2.72), is defined by the relation $\|s^{ij}\| = \|s_{ij}\|^{-1}$, where

(2.73)    $s_{ij} = \dfrac{\sum_{\alpha=1}^{N_1} (x_{i\alpha} - \bar{x}_i)(x_{j\alpha} - \bar{x}_j) + \sum_{\beta=1}^{N_2} (y_{i\beta} - \bar{y}_i)(y_{j\beta} - \bar{y}_j)}{N_1 + N_2 - 2} .$

We have also, regardless of the hypothesis being considered,

(2.74)    $t_{j,n+2} = (\bar{y}_j - \bar{x}_j) \Big/ \sqrt{\dfrac{N_1 + N_2}{N_1 N_2}} .$

Hence in the statistic V only $t_{i,n+1}$ depends upon the hypothesis under consideration. If we are to make use of the distribution of V for $\rho_i = 0$, $\zeta_i \neq 0$, which we have worked out for the univariate case in section B of this chapter, we must have $E(t_{i,n+1}) = 0$. Hence, under the hypothesis $H_1$, we must have

(2.75)    $t_{i,n+1} = Z_i - \mu_i \doteq Z_i - \bar{x}_i$

while under the hypothesis $H_2$ we must have

(2.76)    $t_{i,n+1} = Z_i - \nu_i \doteq Z_i - \bar{y}_i .$

(Since $\mu_i$ and $\nu_i$ are in practice unknown, they are replaced by their optimum estimates $\bar{x}_i$ and $\bar{y}_i$, respectively.) Thus, for testing the hypothesis $H_1$ against the alternative $H_2$, we have two values of the statistic V, as follows:

(2.77)    $V_1 = (V \mid H_1) = \sum_{i}\sum_{j} s^{ij}\,(Z_i - \bar{x}_i)(\bar{y}_j - \bar{x}_j) \Big/ \sqrt{\dfrac{N_1 + N_2}{N_1 N_2}}$

(2.78)    $V_2 = (V \mid H_2) = \sum_{i}\sum_{j} s^{ij}\,(Z_i - \bar{y}_i)(\bar{y}_j - \bar{x}_j) \Big/ \sqrt{\dfrac{N_1 + N_2}{N_1 N_2}}$

11 Op. cit., p. 150.

There are two types of error associated with our choice between the hypotheses $H_1$ and $H_2$. The error of rejecting $H_1$ (accepting $H_2$) when $H_1$ is true is commonly known as an error of the first kind, and the error of accepting $H_1$ (rejecting $H_2$) when $H_2$ is true is commonly known as an error of the second kind. Suppose in our classification problem that we consider an error of the second kind to be k times as costly as an error of the first kind (where k may be greater than, equal to, or less than one). In order to balance the cost of the two types of error, we shall then accept $H_1$ (reject $H_2$) whenever

(2.79)    $\text{Prob}(|V| \ge |V_1|) \ge k\, \text{Prob}(|V| \ge |V_2|)$

but accept $H_2$ (reject $H_1$) whenever

(2.80)    $\text{Prob}(|V| \ge |V_1|) < k\, \text{Prob}(|V| \ge |V_2|) .$
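The cost-weighted rule can be written as a small decision function. This is a sketch only: it reads the rule as "accept $H_1$ when Prob(|V| ≥ |V₁|) is at least k times Prob(|V| ≥ |V₂|), otherwise accept $H_2$", and it takes the two tail probabilities as inputs rather than computing them. The function name and the numerical probabilities in the usage lines are hypothetical and are not results from the thesis.

```python
def classify(prob_v1, prob_v2, k=1.0):
    """Return 1 to accept H1 or 2 to accept H2.

    prob_v1 : Prob(|V| >= |V1|), the tail probability at the observed V1
    prob_v2 : Prob(|V| >= |V2|), the tail probability at the observed V2
    k       : cost of an error of the second kind relative to the first kind
    """
    for prob in (prob_v1, prob_v2):
        if not 0.0 <= prob <= 1.0:
            raise ValueError("tail probabilities must lie between 0 and 1")
    if k <= 0.0:
        raise ValueError("the cost ratio k must be positive")
    return 1 if prob_v1 >= k * prob_v2 else 2


# Hypothetical tail probabilities, chosen only to show the effect of k.
print(classify(0.05, 0.01))          # equal costs
print(classify(0.05, 0.01, k=10.0))  # second-kind errors ten times as costly
```

Raising k makes it harder to accept $H_1$, which is the intended effect when wrongly accepting $H_1$ is the more expensive mistake.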

The probabilities for the two values of V in the univariate case can be obtained from the distribution of V as given in Section B, using the value of n involved and $\rho_1 = 0$ and

$\zeta_1 = \dfrac{\nu_1 - \mu_1}{\sigma} \sqrt{\dfrac{N_1 N_2}{N_1 + N_2}}$

($\zeta_1$ is by definition the expected value of $t_{1,n+2}$, assuming that the latter is expressed in standard units).

In the special case of equal weights (k = 1), the inequalities (2.79) and (2.80) take a somewhat simpler form. In this case, we shall accept $H_1$ (reject $H_2$) when $|V_1| < |V_2|$, but accept $H_2$ (reject $H_1$) when $|V_1| > |V_2|$, since V is distributed symmetrically about zero and the probability that $|V|$ exceeds a given value is a monotonically decreasing function of that value.

Let us consider a numerical example for n = 10, p = 1. Suppose we have sample values for $N_1 = 6$ and $N_2 = 6$ individuals from the populations $\Pi_1$ and $\Pi_2$ respectively and for one individual from the population $\Pi$, known to be identical with either $\Pi_1$ or $\Pi_2$, as follows: $x_1 = 56$, $x_2 = 42$, $x_3 = 59$, $x_4 = 41$, $x_5 = 40$, $x_6 = 62$; $y_1 = 69$, $y_2 = 73$, $y_3 = 60$, $y_4 = 73$, $y_5 = 80$, $y_6 = 65$; and finally Z = 58. These data give $\bar{x} = 50$, $\bar{y} = 70$, $s_{11} = 75$, $\sqrt{N_1 N_2/(N_1 + N_2)} = \sqrt{3}$, and hence $\zeta_1 = 20\sqrt{3}/\sqrt{75} = 4$. Thus we have

(2.82)    $V_1 = \dfrac{(58 - 50)(20\sqrt{3})}{75} = \dfrac{160\sqrt{3}}{75} = 3.695$

(2.83)    $V_2 = \dfrac{(58 - 70)(20\sqrt{3})}{75} = \dfrac{-240\sqrt{3}}{75} = -5.543$
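The arithmetic of the numerical example can be reproduced in a few lines. The sketch below uses the twelve sample values and the single observation Z = 58 quoted in the example; the function and variable names are illustrative and do not come from the thesis. It computes the two sample means, the pooled variance on $N_1 + N_2 - 2$ degrees of freedom, and the two values of V for the univariate case, where $s^{11} = 1/s_{11}$.

```python
import math

x_sample = [56, 42, 59, 41, 40, 62]   # sample from population 1
y_sample = [69, 73, 60, 73, 80, 65]   # sample from population 2
z_obs = 58                            # individual to be classified


def wald_v_univariate(x, y, z):
    """Return (x_bar, y_bar, s11, V1, V2) for the case p = 1."""
    n1, n2 = len(x), len(y)
    x_bar = sum(x) / n1
    y_bar = sum(y) / n2
    # Pooled sample variance on n1 + n2 - 2 degrees of freedom.
    s11 = (sum((a - x_bar) ** 2 for a in x)
           + sum((b - y_bar) ** 2 for b in y)) / (n1 + n2 - 2)
    # Difference of means, scaled so that it has variance sigma squared.
    scaled_diff = (y_bar - x_bar) / math.sqrt((n1 + n2) / (n1 * n2))
    v1 = (z - x_bar) * scaled_diff / s11
    v2 = (z - y_bar) * scaled_diff / s11
    return x_bar, y_bar, s11, v1, v2


x_bar, y_bar, s11, v1, v2 = wald_v_univariate(x_sample, y_sample, z_obs)
print(x_bar, y_bar, s11, round(v1, 3), round(v2, 3))
```

With equal weights the comparison of |V₁| with |V₂| settles the classification directly, so no probabilities are needed for this data set.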

If we assign equal weights to errors of the first and second kinds, we should accept hypothesis $H_1$ without the necessity of calculating probabilities, since $|V_1| = 3.695$ is less than $|V_2| = 5.543$. This agrees with the obvious classification, since Z is closer to $\bar{x}$ than to $\bar{y}$. But if an error of the second kind is k times as costly as an error of the first kind, we should accept $H_1$ if (2.79) holds, but accept $H_2$ if (2.80) holds. The probabilities involved in (2.79) and (2.80) can be determined from (2.69), using n = 10 and $\zeta_1 = 4$. In actual practice, however, the amount of labor involved in computing probabilities from (2.69) is prohibitive. In order to make (2.69) useful, a simple approximation should be found and/or the probabilities tabulated.

In case the populations $\Pi_1$ and $\Pi_2$ are identical, there is no problem of classification. The statistic V for the degenerate case ($\rho_i = 0$, $\zeta_i = 0$) can then be used to determine whether it is reasonable to assume that $Z_i$ $(i = 1, \ldots, p)$ is a random sample from the p-variate normal population $\Pi_1 = \Pi_2$, though other methods are more feasible.

III. APPROXIMATE DISTRIBUTIONS OF V IN VARIOUS CASES

A. Wald's Approximation $n m_3$

Wald¹² has shown that the distribution of the statistic V is the same as that of the statistic

(3.1)    $V = \dfrac{n\, m_3}{(1 - m_1)(1 - m_2) - m_3^2}$

where the joint distribution of $m_1$, $m_2$ and $m_3$ is equal to a constant multiple of the product of the exponential expression

(3 . 2 )

2

P

f i + 2*3

w

P V2 s

^

.j

)

the expected value of

(n+2-p)/2 11

lp

Pi

PP

(3.3)

(rij ’ a=l ^

and the expression

12 Op. cit., pp. 161, 162.

31

(3.4)

Fp(mi) Fp(mg) U

Fn+2-p(1_m2^ W p ( 7 —

yi

— 5z-) Fn+2_p (1-^)

=^===f=)

(l-Dig)

^2 ^

J

where B denotes a constant, and where

(3.5)

F (t) = -rrjï(t)(k"2)/2 k 2k/2r(t)

e"t/2; t (t) - -- !liiL_(i-t2)(k-3)/2. k n r(4i-)

The expected value of (3*3) is calculated under the assumption that the variâtes t. are normally and independently distributed with unit vari­ ances and E(t. ) = o . u ICC

^1



+5. v (i=l, •••, p; a=l, ••

P 2 P Z v = m0 and Z u v a=l a*l a

j.

CC

= 3

p), where

Zu = mn, -i CC -L a=l

. The domain of the variables m, , nu and 1 ^

m,[j is given by the inequalities 0 0 .

The first of these holds throughout the domain defined above, but the second does not, and hence introduces further restrictions, in addition to those given by (3.11). The domain of integration, as shown in Fig. 1, is, therefore, bounded below by the surface $m_1 m_2 - m_3^2 = 0$ and above by the surface $(1 - m_1)(1 - m_2) - m_3^2 = 0$. Hence, in integrating out $m_1$, the range of values for $m_1$ is given by the inequalities

$\dfrac{m_3^2}{m_2} \le m_1 \le 1 - \dfrac{m_3^2}{1 - m_2} .$

The bounding surfaces $m_1 m_2 - m_3^2 = 0$ and $(1 - m_1)(1 - m_2) - m_3^2 = 0$ intersect in the plane $m_1 + m_2 = 1$ in an ellipse whose projection on the $m_2 m_3$ plane is the circle $m_2^2 - m_2 + m_3^2 = 0$, which has its center at the point $(m_2 = \frac{1}{2},\ m_3 = 0)$ and radius $\frac{1}{2}$. Hence, for any value of $m_3$, $m_2$ is restricted to values for which the point $(m_2, m_3)$ lies inside this circle, that is, the range of values for $m_2$ is given by the inequalities

$\dfrac{1 - \sqrt{1 - 4m_3^2}}{2} \le m_2 \le \dfrac{1 + \sqrt{1 - 4m_3^2}}{2} .$

One can readily see that real values of $m_2$ occur only for $|m_3| \le \frac{1}{2}$.

When p = 1, one can show, by applying certain geometric considerations to definitions given by Wald, that the expression $[m_1 m_2 - m_3^2]$ takes the value zero. Thus the factor $[m_1 m_2 - m_3^2]^{-1}$, which occurs in the joint distribution (3.7) when p = 1, becomes infinite, and hence our approximation fails for p = 1. This is not, however, a serious restriction, since the exact distribution of V for the case p = 1 has already been determined in Chapter II.

Fig. 1. Domain of the Variables $m_1$, $m_2$, and $m_3$

We shall therefore restrict ourselves to values of p greater than one, and divide the problem into the following cases (we shall consider only cases for which n is greater than or equal to p):

Case 1a. n even, p odd: that is, both exponents in (3.7) non-negative integers.

Case 1b. n even, p even: that is, both exponents half-integral.

Case 2a. n odd, p even: that is, the exponent $\frac{n-1-p}{2}$ a non-negative integer, but the exponent $\frac{p-3}{2}$ half-integral.

Case 2b. n odd, p odd: that is, the exponent $\frac{p-3}{2}$ a non-negative integer, but the exponent $\frac{n-1-p}{2}$ half-integral.
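The limits of integration described for the domain of $m_1$, $m_2$, and $m_3$ can be made concrete with a short sketch. It assumes the reading used here, namely that $m_2$ ranges over the interval cut out by the circle $m_2^2 - m_2 + m_3^2 = 0$ and that, for given $m_2$ and $m_3$, $m_1$ runs between the two bounding surfaces. The function names are hypothetical. The last function evaluates the statistic of (3.1) at a point of the domain so that it can be compared with the approximation $n m_3$.

```python
import math


def m2_range(m3):
    """Interval of m2 allowed by the circle m2**2 - m2 + m3**2 = 0."""
    if abs(m3) > 0.5:
        raise ValueError("real limits exist only for |m3| <= 1/2")
    root = math.sqrt(1.0 - 4.0 * m3 * m3)
    return (1.0 - root) / 2.0, (1.0 + root) / 2.0


def m1_range(m2, m3):
    """Interval of m1 between the surfaces m1*m2 - m3**2 = 0 and
    (1 - m1)*(1 - m2) - m3**2 = 0, for m2 strictly inside its range."""
    low, high = m2_range(m3)
    if not low < m2 < high:
        raise ValueError("m2 lies outside the projected circle")
    return m3 * m3 / m2, 1.0 - m3 * m3 / (1.0 - m2)


def wald_v(n, m1, m2, m3):
    """Statistic of (3.1): n*m3 / ((1 - m1)*(1 - m2) - m3**2)."""
    return n * m3 / ((1.0 - m1) * (1.0 - m2) - m3 * m3)


print(m2_range(0.3))
print(m1_range(0.5, 0.3))
print(wald_v(10, 0.2, 0.2, 0.1), 10 * 0.1)
```

Since the denominator in (3.1) is less than one inside the domain, the exact statistic is larger in absolute value than $n m_3$, and the two agree closely only when $m_1$, $m_2$, and $m_3$ are all small.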

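Before considering these cases in order, let us pause to work out one interesting and useful property of $G(m_3)$. We have from (3.7) the relation

(3.13)    $G_{n,n+2-p}(m_3) = C \int_{\frac{1 - \sqrt{1 - 4m_3^2}}{2}}^{\frac{1 + \sqrt{1 - 4m_3^2}}{2}} \int_{\frac{m_3^2}{m_2}}^{1 - \frac{m_3^2}{1 - m_2}} \left[ m_1 m_2 - m_3^2 \right]^{\frac{n-1-p}{2}} \left[ (1 - m_1)(1 - m_2) - m_3^2 \right]^{\frac{p-3}{2}} dm_1\, dm_2 .$

Let $m_1' = 1 - m_1$, $m_2' = 1 - m_2$. The Jacobian is

$\dfrac{\partial(m_1, m_2)}{\partial(m_1', m_2')} = \begin{vmatrix} \dfrac{\partial m_1}{\partial m_1'} & \dfrac{\partial m_1}{\partial m_2'} \\ \dfrac{\partial m_2}{\partial m_1'} & \dfrac{\partial m_2}{\partial m_2'} \end{vmatrix} = \begin{vmatrix} -1 & 0 \\ 0 & -1 \end{vmatrix} = 1$

so that $dm_1\, dm_2 = dm_1'\, dm_2'$, and after making the required change of limits, we obtain

(3.14)    $G_{n,n+2-p}(m_3) = C \int_{\frac{1 + \sqrt{1 - 4m_3^2}}{2}}^{\frac{1 - \sqrt{1 - 4m_3^2}}{2}} \int_{1 - \frac{m_3^2}{1 - m_2'}}^{\frac{m_3^2}{m_2'}} \left[ (1 - m_1')(1 - m_2') - m_3^2 \right]^{\frac{n-1-p}{2}} \left[ m_1' m_2' - m_3^2 \right]^{\frac{p-3}{2}} dm_1'\, dm_2' .$

Interchanging upper and lower limits for both $m_1'$ and $m_2'$ gives

(3.15)    $G_{n,n+2-p}(m_3) = C \int_{\frac{1 - \sqrt{1 - 4m_3^2}}{2}}^{\frac{1 + \sqrt{1 - 4m_3^2}}{2}} \int_{\frac{m_3^2}{m_2'}}^{1 - \frac{m_3^2}{1 - m_2'}} \left[ m_1' m_2' - m_3^2 \right]^{\frac{p-3}{2}} \left[ (1 - m_1')(1 - m_2') - m_3^2 \right]^{\frac{n-1-p}{2}} dm_1'\, dm_2' .$

But the right side of (3.15) is simply $G_{n,p}(m_3)$, so we have

(3.16)    $G_{n,n+2-p}(m_3) = G_{n,p}(m_3) .$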

This relation, which we shall later have occasion to use, establishes equality between any Case 2a value and some Case 2b value [for example. G

(m ) = G (m )], also between any Case la value and some other 2;4 J 5,3 3

39

Case la value [for example,

= G^^(ny)] and between any Case lb

value and another Case lb value [for example, Gg

) = Gg

)].
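The symmetry relation $G_{n,n+2-p}(m_3) = G_{n,p}(m_3)$ can be checked numerically on a small case. The sketch below rests on an assumption, since (3.7) itself is not legible in this part of the text: it takes the integrand to be $[m_1 m_2 - m_3^2]^{(p-3)/2}\,[(1 - m_1)(1 - m_2) - m_3^2]^{(n-1-p)/2}$, which is the form implied by (3.15), and it omits the constant C. The function name and grid size are illustrative. A midpoint grid on the unit square is symmetric under the reflection of both variables about one half, so the two sums should agree to rounding error.

```python
def g_unnormalized(m3, n, p, grid=200):
    """Midpoint-grid estimate of the double integral behind G_{n,p}(m3),
    without the normalizing constant (assumed integrand, see lead-in)."""
    a = (p - 3) / 2.0        # exponent of [m1*m2 - m3**2]
    b = (n - 1 - p) / 2.0    # exponent of [(1 - m1)*(1 - m2) - m3**2]
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        m1 = (i + 0.5) * h
        for j in range(grid):
            m2 = (j + 0.5) * h
            lower = m1 * m2 - m3 * m3
            upper = (1.0 - m1) * (1.0 - m2) - m3 * m3
            # Only points between the two bounding surfaces contribute.
            if lower > 0.0 and upper > 0.0:
                total += lower ** a * upper ** b
    return total * h * h


g_53 = g_unnormalized(0.2, n=5, p=3)
g_54 = g_unnormalized(0.2, n=5, p=4)
print(g_53, g_54)
```

The pair n = 5, p = 3 and n = 5, p = 4 corresponds to the Case 2a and Case 2b example quoted in the text; other pairs (n, p) and (n, n + 2 - p) with p at least 3 can be tried in the same way.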

Let us now return to considering the four cases listed above.

Case 1a. n even, p odd. Let p = 3 + 2c, where c is an integer $\ge 0$. Substituting this value in (3.7), the distribution of $m_3$ can then be expressed as a double integral, as follows:

(3.17)    $G_{n,3+2c}(m_3) = C \int_{\frac{1 - \sqrt{1 - 4m_3^2}}{2}}^{\frac{1 + \sqrt{1 - 4m_3^2}}{2}} \int_{\frac{m_3^2}{m_2}}^{1 - \frac{m_3^2}{1 - m_2}} \left[ (1 - m_1)(1 - m_2) - m_3^2 \right]^{\frac{n-4}{2} - c} \left[ m_1 m_2 - m_3^2 \right]^{c} dm_1\, dm_2$

or, writing the expressions in brackets in slightly different form.

r 2 1+ < Il-Lm^

2 0n,3*2c j. ^™2



3

*

tVr+c+k-

dm ' + f

2

"-q

.3

( ^



^r+c+k- '~ï/ ) ^2

/ Performing the indicated integrations with respect to m , we find



’>

" ( ? )

- r “ * ¥

tVr+c+k- — I

+

{rC2^)

^

46

Substituting this result, with

*«-3> •

replaced by (l-m^), in (3 .28), we have

2(k+q+r) =-q S 1"’ P \ ~r ' ‘i / 3 r=-~-c-k .t=0 t^r+c+k- -T-

^

(?)

°'q \r+c+k-

)

f

In(l-m^) 1

Evaluating, we obtain

1

(3.41)

■j-q Ivÿ. -j-q) -j-q \ 2(k+q+rj B(m ) = *2 I / C"q -M-a.\Jn l-'Jl-Wat I-- -y Ir+c+k 3 r4 -c-k' r' / " l+/l-4m: — —k-q -r—t c-q

+

£

t=0 tVr+c+k-

(-D

t-r-c-k+

(c-q\ 2 ( i ^ 3 ) ' tz/ 2t-2r-2c-2k+n-2 I n 2 / / ./1

,~^£— —k—q—r—t

y.

We have also

47

and iWf-V'Vl’

!±£55i dm2 (3 .4 3 )

nu tIEtzt

= In ra.

= - In

■E

Substituting these results in (3.29) and (3.30) respectively, we find

2\j-k-s

-c-k (3-44) C(m3 )

Z s=0 s#j-k

“c“k J

(-1)

1 n-2+2k-2j j-k-s m3

1+ 4 :

2

/

j-k-s

and

(3.45) D(m ) - (-l)j-k+1 (^- -°"k) m"-2+2k-2j ^ \ j-k / J

in i l S l+Vl-W^

Thus the distribution function G (m ) is given by n,3 +2 c 3

(3-46) Gn,3+2ct™3)

c% j=0

'C

■°' \ ^

=0 ^ ) q=o t-1) /k=0

D

[A(m3) + B(m3) - C(m ) - DCm^)]

where

n-2-2j-.2q


m in (^ i -j-q , - V -c-k) A -, _._q\ 2(ktq+r) = ? _ -c -W

\ r JT

Z

(3.47) A(m3 )

r=0

z t=0

^ (-i)

-k-q-r-t -c-k-r ’)

2 n-2-2k-2q-2r-2t

J

2

/

2(k+q+r^) ( / , c-q \ In 1-Æ%Anu m. \r+c+k ^ 1 + ^ Am?

(3.48) B(m ) r=

1

\

-c-k

-k -

c-q t-r-c-k+ 2 (-1) t=0 t^r+c+k-

1V t / 2t-2r-2c-2k+n-2 Pt-Pr-Pc-

+

--— —k—q—r—t |l+Vl-Ami

-|X /. / p ——— —k—q—r—t (1-^4?' * f

‘) j-k-s n-2+2k-2j (3.49) C(m3)

M

5=0 s^d-k

i

i

:

s m3

2\ j-lc*s

(3 .50) D(m3 ) = (-l)j-ktl M -

J

V

-C"k) m3n-2+2k-2j In k

j-k

I 3

S

iWl-Arn

the terms involving natural logarithms having the value zero when m_3 = 0. In the special case p = 3 (c = 0), the summation over q reduces to the single term corresponding to q = 0, and (3.46) reduces to

where

0. » , ' V lX 0 3

r= 0

-j

2 (k+r^

(3.53) r=

-k

r * r’ t_0 r 1 /- j

\

2 s=0 s^j-k

X j-k-s (- ^ s n v k)

m"-2+2k-23[ | ^ f S )

4

r

^Vi-k-sl

l-Yl-4m; o /

(3.55) Do(m3) =

/

In ■r l+yl-Am^

a i- -k

(3.54) C (m ) = 0 3

t

j-k+1 /"Sli -k) n-2+2k-2j l-Vl-W^ m3' ' In l+/i%4m.

J

.

This result can be verified by substituting c = 0 in (3.18) and integrating. Some numerical examples for specific values of n will be given later.

Case 1b. n even, p even. Let p = 2 + 2c, where c is an integer ≥ 0. Substituting this value in (3.7), the distribution of m_3 can then be expressed as a double integral, as follows:

// J L i XaE K - A I (3-56) Gn,2.2=(m3}

i-

zî:2

=c

_c

c- 1/2 dmi dm2

Rewriting the expressions in brackets, this becomes

Z

I4-VM/ÏmÎ

Z

,

[(m2-l )m1+(l-m2-m^ )j

(3'57) Gn,2.2c(m3 ) = C

-

2 I c-1/2

Lm2 mi -m3 J

This can also be written in the form

(3-58) Gn,2.2=('ll3)

where

i--»* 1

= C l-Vï-i/ieî / W,

m e-1/2 ^ , v u dx dm.

dm^ dm2

(3.60) v = a' + b'x = (m_2 - 1) m_1 + (1 - m_2 - m_3^2); a' = 1 - m_2 - m_3^2, b' = m_2 - 1.

We shall also need to use the expression

(3.61) k = ab' - a'b = m_2^2 - m_2 + m_3^2.

In order to perform the integration with respect to x indicated in (3.58), we make use of two reduction formulas given by Peirce^{15}:

,3.6=,

-, -

[a.'*1

(=«-1)^.- . - 5/2 « ]

and

« • « >

Replacing n by c, and applying (3.62) repeatedly, we find

(3 .6 4 ) ( / u°-l/2 dx - 2vm+1 Ci1 Igcrl),(2c-3),:V.( ^ l - 2 j ) ^ u c- 5 ^ J j=0 (2m+2c+l)(2m+2c-l)**’(2m+2c+l-2j)b ^ !

+ (2c-l)(2c-3)'"l kC b7 ° fvm dx . (2m+2c+l)(2m+2c-l )**’(2m+3 ) J

Repeated application of (3.63) gives

15 Peirce, B. O., A Short Table of Integrals, Ginn and Co., Boston, 1929, pp. 18, 19, Formulas 120 and 118.


(3-65) S w 2 ■ a

f

T

^

q

xm+l/2 ^ ;

2m(2m-2)""*l (2in+l) (2m-l) ••*2

k111^1^2 ;fdx

hm+l/2 ): fxv

Another formula given by Peirce^ is

(3.66)

= V uv

2 V-bb

V

bv

Substitution of this result in (3.65), that result in turn in (3.64), and finally that result in (3.58) gives

j 2vm+1 C"s (2c-l) (2c-3 )-- (2C+1-2.1 )kjuc-1/2~j I j=0 (2m+2c+l) (2m+2c-l) *•*(2m+2c+l~2 j)b^+1

0- 67> Gn,2.2C(m3} ■ c

m-l/2 2/û Z (-l)q (2m+2c+l)(2m+2c~l)•••(2m+3) L q=0

+ (2c-1)(2c-3)->1 k cb' -q-r-s-t (1_cos2e) ----------------------- W

-

-

j

-----------------------------:-----------------------------

dO

(1+V l-4ny cos ©)u+ i

0

(

+\

where © = n - ©.

§ (l+C i Z cos e)2’1 -q-r-s-t ^2^ ^--- - a-------— -------- d© 0 cos 6^1 *■

Thus G_{n,2+2c}(m_3) is given by a constant times a function of m_3 times an expression of the form

(3.110) \int_0^{\pi/2} \frac{P(\cos\theta)\,d\theta}{\sqrt{q + p\cos\theta}} + \int_0^{\pi/2} \frac{P(\cos\theta_1)\,d\theta_1}{\sqrt{q - p\cos\theta_1}},

where P(cos θ) and P(cos θ_1) are infinite series of terms in powers of cos θ and cos θ_1, including a term of zero power (constant); q = 1, p = \sqrt{1 - 4m_3^2}. But even when the numerators of the integrands of (3.110) are constant or of the first degree in cos θ, the integrals are elliptic, as one may observe from tables given by Bierens de Haan^{20}. Hence it appears that G_{n,2+2c}(m_3) cannot be found in closed form.

For specified numerical values of n (odd) and p (even), approximations to G_{n,2+2c}(m_3) can be found, using Simpson's Rule or some other

20 Bierens de Haan, D., Nouvelles Tables d'Intégrales Définies, Leyden, 1867, Table $6, No. 5-8.


approximation method. One example will be given later.

Case 2b. n odd, p odd. Since equation (3.16) establishes equality between each Case 2b value of G_{n,p}(m_3) and some Case 2a value, as we have already observed, we have in Case 2b, as in Case 2a, an infinite series of elliptic integrals, and hence it appears that integration in closed form is impossible. Just as in Case 2a, we can integrate approximately for specified numerical values of n (odd) and p (odd), and one example will be given later.

Numerical Illustrations of the Distribution of m_3. By substituting values of n (even integers) in (3.46) - (3.50), or by substituting such values of n and c = 0 in (3.18) and integrating, we obtain the following illustrations of the distribution function G(m_3) in Case 1a:

(3.1H)

(3.112)

G6^3(m5) = C

(3.113) G8j3(m3

& In

In normalizing, that is, finding C so that \int_{-1/2}^{1/2} G(m_3) dm_3 = 1, and in determining the moments, the integration can be simplified by using the relation

(3.115) \ln \frac{1 - \sqrt{1 - 4m_3^2}}{1 + \sqrt{1 - 4m_3^2}} = -2 \operatorname{sech}^{-1} 2m_3.

Upon normalizing, we find that C takes the values 6/π, 30/π, 84/π and 180/π in equations (3.111) - (3.114), respectively. Since only even powers of m_3 occur in the right members of (3.111) - (3.114), the distributions are symmetric about m_3 = 0, and all moments of odd order are zero. The moments of even order, together with the parameters σ_{m_3} and α_{4:m_3}, can easily be computed, using the relations

(3.116) \mu_{r:m_3} = \int_{-1/2}^{1/2} m_3^r G(m_3) dm_3, \quad \sigma_{m_3} = \sqrt{\mu_{2:m_3}}, \quad \alpha_{4:m_3} = \mu_{4:m_3} / \mu_{2:m_3}^2.

Values of the function G_{n,p}(m_3) for p = 3 and n = 4, 6, 8, 10 are given in Table 1a for values of m_3 at intervals of .05 throughout the range -1/2 ≤ m_3 ≤ 1/2. The parameters σ_{m_3} and α_{4:m_3} for G_{n,3}(m_3) are given in Table 1b. The data of Table 1a are shown graphically in Figure 2.
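As a rough check on relations of the form (3.116), the parameters can be recomputed directly from tabulated ordinates. The sketch below applies Simpson's Rule to the G_{4,3}(m_3) column of Table 1a; the helper names `simpson` and `even_moment` are ours, and using only eleven ordinates is an assumption made for brevity, so agreement with Table 1b is expected only to about three decimal places.

```python
import math

# Ordinates of G_{4,3}(m_3) copied from Table 1a for m_3 = 0.00, 0.05, ..., 0.50.
G43 = [1.9098593, 1.8431196, 1.6961434, 1.4998037, 1.2716365, 1.0251855,
       0.7725373, 0.5257932, 0.2986749, 0.1098211, 0.0000000]
STEP = 0.05


def simpson(values, h):
    """Composite Simpson's Rule for equally spaced ordinates."""
    n = len(values) - 1  # number of intervals
    if n < 2 or n % 2:
        raise ValueError("Simpson's Rule needs an even number of intervals")
    odd = sum(values[1:n:2])
    even = sum(values[2:n:2])
    return h / 3.0 * (values[0] + 4.0 * odd + 2.0 * even + values[n])


def even_moment(density, h, r):
    """Moment of even order r about zero; by symmetry, twice the integral over [0, 1/2]."""
    weighted = [(i * h) ** r * g for i, g in enumerate(density)]
    return 2.0 * simpson(weighted, h)


total = even_moment(G43, STEP, 0)   # should be close to 1 (normalization)
mu2 = even_moment(G43, STEP, 2)
mu4 = even_moment(G43, STEP, 4)
sigma = math.sqrt(mu2)              # compare with 0.19365 in Table 1b
alpha4 = mu4 / mu2 ** 2             # compare with 2.3810 in Table 1b

print(f"total = {total:.5f}, sigma = {sigma:.5f}, alpha4 = {alpha4:.4f}")
```

The small discrepancies from Table 1b reflect the coarse spacing of the ordinates, not the exact integration used for the tabulated parameters.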

By substituting values of n (even integers) in (3.93) - (3.95), or by substituting such values of n and c = 0 in (3.57) and integrating, we obtain the following illustrations of the distribution function G(m_3) in Case 1b:

(3.117) G_{4,2}(m_3) = C \cdot 2\pi \left[ \tfrac{1}{4} \sin^{-1}\sqrt{1 - 4m_3^2} - \tfrac{1}{2} |m_3| \sqrt{1 - 4m_3^2} \right]

Table 1. Values and Parameters of G_{n,3}(m_3) for n = 4, 6, 8, 10

a. Values of G_{n,3}(m_3) at Intervals of .05

  m_3      G_{4,3}(m_3)   G_{6,3}(m_3)   G_{8,3}(m_3)   G_{10,3}(m_3)
  0.00     1.9098593      2.3873242      2.9708919      3.5809862
 ±0.05     1.8431196      2.2446758      2.6703244      3.0526962
 ±0.10     1.6961434      1.9524282      2.1351948      2.2226064
 ±0.15     1.4998037      1.5927453      1.5627603      1.4462200
 ±0.20     1.2716365      1.2139916      1.0429516      0.8377445
 ±0.25     1.0251855      0.8521659      0.6221137      0.4210495
 ±0.30     0.7725373      0.5352133      0.3184713      0.1742594
 ±0.35     0.5257932      0.2839646      0.1290671      0.0535200
 ±0.40     0.2986749      0.1115549      0.0344225      0.0096200
 ±0.45     0.1098211      0.0212405      0.0033369      0.0004756
 ±0.50     0.0000000      0.0000000      0.0000000      0.0000000

b. Parameters σ_{m_3} and α_{4:m_3} for G_{n,3}(m_3)

               G_{4,3}(m_3)   G_{6,3}(m_3)   G_{8,3}(m_3)   G_{10,3}(m_3)
  σ_{m_3}      0.19365        0.16366        0.13944        0.12087
  α_{4:m_3}    2.3810         2.5926         2.8177         3.0164

[Figure 2. Graph of the data of Table 1a.]

(3.118)

^(m^) = 0 2 n | ^ + | m^) sin 1 \!1-1#^ - |m^|

^ ny)it-Aay

(3.119) Gg^(ny) = C-anJj^ + || try) sin 1\/ 1-W^ - Im^

^ m| + I m^)

Vl-4ay]

( 3 . 120)

(try ) = C - 2n

f

VTtTsT

, 525" _ 2

IDS' _ 4 ’

%?: 3

(4W +

IT

3

) sin'Vl-!#^ -lm I

m2 _ .IL >

3072. 3

192. 3

_ i m4VlZ

9 3/

In normalizing and in determining the moments, the integration can be simplified by using the relation

(3.121) \sin^{-1}\sqrt{1 - 4m_3^2} = \cos^{-1} |2m_3|.

Upon normalizing, we find that C takes the values 3/π, 15/2π, 14/π and 45/2π in equations (3.117) - (3.120), respectively. As for G_{n,3}(m_3), the distributions are symmetric about m_3 = 0, so that the moments of odd order are zero, and the moments of even order and the parameters σ_{m_3} and α_{4:m_3} are easily found by using (3.116). Values of the function G_{n,p}(m_3) for p = 2 and n = 4, 6, 8, 10 are given in Table 2a for values of m_3 at intervals of .05 throughout the range -1/2 ≤ m_3 ≤ 1/2. The parameters σ_{m_3} and α_{4:m_3} for G_{n,2}(m_3) are given in Table 2b. The data of Table 2a are shown graphically in Figure 3.
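The two simplifying relations (3.115) and (3.121) are easy to confirm numerically, and the same few lines can compare a closed form for G_{4,2}(m_3) with Table 2a. The sketch below is only illustrative: the function names are ours, sech^{-1} is evaluated as cosh^{-1}(1/x), and the closed form `g42_assumed` (with C = 3/π) is our reading of (3.117), treated here as an assumption and not as a quotation of the original equation.

```python
import math


def log_ratio(m3):
    """Left member of (3.115); defined for 0 < |m3| <= 1/2."""
    root = math.sqrt(1.0 - 4.0 * m3 * m3)
    return math.log((1.0 - root) / (1.0 + root))


def minus_two_arcsech(m3):
    """Right member of (3.115), using sech^{-1}(x) = cosh^{-1}(1/x) with x = |2 m3|."""
    return -2.0 * math.acosh(1.0 / abs(2.0 * m3))


def arcsin_form(m3):
    """Left member of (3.121)."""
    return math.asin(math.sqrt(1.0 - 4.0 * m3 * m3))


def arccos_form(m3):
    """Right member of (3.121)."""
    return math.acos(abs(2.0 * m3))


def g42_assumed(m3):
    """Assumed closed form for G_{4,2}(m_3), with C = 3/pi (our reading of (3.117))."""
    root = math.sqrt(1.0 - 4.0 * m3 * m3)
    c = 3.0 / math.pi
    return c * 2.0 * math.pi * (0.25 * arccos_form(m3) - 0.5 * abs(m3) * root)


# G_{4,2}(m_3) from Table 2a for m_3 = 0.00, 0.05, ..., 0.50.
TABLE_2A_G42 = [2.3561944, 2.0566946, 1.7602188, 1.4698829, 1.1890100, 0.9212772,
                0.6709428, 0.4432493, 0.2452516, 0.0880888, 0.0000000]

for i, tabulated in enumerate(TABLE_2A_G42):
    m3 = i / 20.0  # exact grid point, avoids rounding past 1/2
    print(f"{m3:.2f}  tabulated {tabulated:.7f}  closed form {g42_assumed(m3):.7f}")
```

Under these assumptions the closed form reproduces the tabulated column to within a few units in the sixth decimal place.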

Table 2. Values and Parameters of G_{n,2}(m_3) for n = 4, 6, 8, 10

a. Values of G_{n,2}(m_3) at Intervals of .05

  m_3      G_{4,2}(m_3)   G_{6,2}(m_3)   G_{8,2}(m_3)   G_{10,2}(m_3)
  0.00     2.3561944      3.3133984      4.2951461      5.2850431
 ±0.05     2.0566946      2.6075744      3.0439370      3.3719196
 ±0.10     1.7602188      1.9900527      2.0669814      2.0362519
 ±0.15     1.4698829      1.4605447      1.3293352      1.1466289
 ±0.20     1.1890100      1.0182905      0.7959217      0.5890354
 ±0.25     0.9212772      0.6617776      0.4321175      0.2668309
 ±0.30     0.6709428      0.3883997      0.2035032      0.1006905
 ±0.35     0.4432493      0.1940009      0.0765096      0.0284483
 ±0.40     0.2452516      0.0721871      0.0190583      0.0047353
 ±0.45     0.0880888      0.0130847      0.0017353      0.0002158
 ±0.50     0.0000000      0.0000000      0.0000000      0.0000000

b. Parameters σ_{m_3} and α_{4:m_3} for G_{n,2}(m_3)

               G_{4,2}(m_3)   G_{6,2}(m_3)   G_{8,2}(m_3)   G_{10,2}(m_3)
  σ_{m_3}      0.18257        0.14638        0.12172        0.10403
  α_{4:m_3}    2.5714         2.9630         3.2873         3.5538

The probability that m_3 lies in the interval (a,b) for given values of n and p is given by

(3.122) Prob(a < m_3 < b | n = n_0, p = p_0) = \int_a^b G_{n_0,p_0}(m_3) dm_3.

These probabilities for n = 10, p = 3 and 2, and b - a = .05 are given in Table 3. By reference to Table 3, or by comparison of Tables 1b and 2b, we notice that the values of m_3 cluster more closely about the mean (zero) for the smaller value of p (p = 2) than for the larger (p = 3). For p constant and for the range of values of n considered, σ_{m_3} decreases as n increases, but α_{4:m_3} increases; for each n, α_{4:m_3} is larger for p = 2 than for p = 3. The functions G_{n,3}(m_3) have derivatives at every point, but the derivative of G_{n,2}(m_3) at m_3 = 0 does not exist.

As an example of the use of approximate integration in Case 2a, we shall find G_{5,2}(m_3) by Simpson's Rule. Substituting n = 5 and p = 2 in (3.7), the function G_{5,2}(m_3) can be expressed as a double integral, as follows:

(3.123) G5)2(ni3 ) = C

I , ___ \

1

,

n [(l-m1)(l-‘m2)-m3]

1-wix

Rewriting the expression in brackets, we have

dm-,

■^

dm2 .

Table 3. Probabilities that m_3 lies in Certain Intervals for n = 10, p = 3 and 2

  m_3                  n m_3              Probability (p = 3)   Probability (p = 2)
  Less than -0.50      Less than -5.0     0.0000000             0.0000000
  (-0.50) - (-0.45)    (-5.0) - (-4.5)    0.0000028             0.0000021
  (-0.45) - (-0.40)    (-4.5) - (-4.0)    0.0001833             0.0000849
  (-0.40) - (-0.35)    (-4.0) - (-3.5)    0.0013493             0.0007002
  (-0.35) - (-0.30)    (-3.5) - (-3.0)    0.0052741             0.0029445
  (-0.30) - (-0.25)    (-3.0) - (-2.5)    0.0142609             0.0086803
  (-0.25) - (-0.20)    (-2.5) - (-2.0)    0.0306819             0.0205943
  (-0.20) - (-0.15)    (-2.0) - (-1.5)    0.0563102             0.0422227
  (-0.15) - (-0.10)    (-1.5) - (-1.0)    0.0911662             0.0779644
  (-0.10) - (-0.05)    (-1.0) - (-0.5)    0.1320897             0.1330853
  (-0.05) - 0.00       (-0.5) - 0.0       0.1686816             0.2137213
  0.00 - 0.05          0.0 - 0.5          0.1686816             0.2137213
  0.05 - 0.10          0.5 - 1.0          0.1320897             0.1330853
  0.10 - 0.15          1.0 - 1.5          0.0911662             0.0779644
  0.15 - 0.20          1.5 - 2.0          0.0563102             0.0422227
  0.20 - 0.25          2.0 - 2.5          0.0306819             0.0205943
  0.25 - 0.30          2.5 - 3.0          0.0142609             0.0086803
  0.30 - 0.35          3.0 - 3.5          0.0052741             0.0029445
  0.35 - 0.40          3.5 - 4.0          0.0013493             0.0007002
  0.40 - 0.45          4.0 - 4.5          0.0001833             0.0000849
  0.45 - 0.50          4.5 - 5.0          0.0000028             0.0000021
  More than 0.50       More than 5.0      0.0000000             0.0000000
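The statement that m_3 clusters more closely about zero for p = 2 than for p = 3 can be read off Table 3 by accumulating the interval probabilities. The following sketch does this for the central band |m_3| < 0.10; the list and function names are ours, and only the negative half of each column is stored, the table being symmetric.

```python
# Interval probabilities from Table 3 (n = 10), listed from (-0.50, -0.45) up to (-0.05, 0.00).
P3_HALF = [0.0000028, 0.0001833, 0.0013493, 0.0052741, 0.0142609,
           0.0306819, 0.0563102, 0.0911662, 0.1320897, 0.1686816]
P2_HALF = [0.0000021, 0.0000849, 0.0007002, 0.0029445, 0.0086803,
           0.0205943, 0.0422227, 0.0779644, 0.1330853, 0.2137213]


def central_probability(half_column, intervals):
    """Probability that |m_3| is below intervals * 0.05, using the symmetry about zero."""
    return 2.0 * sum(half_column[-intervals:])


band_p3 = central_probability(P3_HALF, 2)   # Prob(|m_3| < 0.10) for p = 3
band_p2 = central_probability(P2_HALF, 2)   # Prob(|m_3| < 0.10) for p = 2

print(f"p = 3: {band_p3:.7f}")
print(f"p = 2: {band_p2:.7f}")
```

Each half column sums to one half, as it must, and the central band carries roughly 0.60 of the probability for p = 3 against roughly 0.69 for p = 2.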

/i+jESzi-2 * % I I-ll "”12 (3.124)

= C 2.

/ jnL

Splitting the integrand into two parts, we obtain

/l- J21.

rl-

(3.125) G5)2(m3) = C

cta2.

^

^

VW

Î

*

Performing the indicated integration with respect to m^, we find

0.126) G

(ny) = G

2 (-;2m3~m2 i )(m2~l) ^ 3m;

m 4 t 2(1^1) 3

™2

11_ JBbl-'Ma

/m*. Evaluating and simplifying, we have

(3.127)

G5,2(m3) = C

G_{5,2}(0)/C = 16/9 and

4_

i-Vi-4^,' 3m^ V

2

2\S

l-m0

G_{5,2}(0.5) = 0 can be found exactly from (3.127). By assigning values to m_3 at intervals of 0.05 from 0.05 to 0.45 and then integrating each result approximately by Simpson's Rule, using a suitable number of intervals, say ten, we can find G_{5,2}(m_3)/C. For m_3 = .05, .10, .15, .20, .25 division into ten intervals does not give a sufficiently accurate approximation, so the first two of the ten intervals are replaced by a number of smaller intervals for greater accuracy. After G_{5,2}(m_3)/C has been determined for values of m_3 at intervals of .05 from 0 to 0.5, we can then normalize, that is, find C so that \int_{-1/2}^{1/2} G_{5,2}(m_3) dm_3 = 1 (or \int_0^{1/2} G_{5,2}(m_3) dm_3 = 1/2), integrating again by Simpson's Rule, and thus we can find G_{5,2}(m_3) at intervals of 0.05 from 0 to 0.5. The results are recorded in Table 4 and are shown graphically in Figure 4. The results for n = 4, p = 2 and n = 6, p = 2, which have already been given in Table 2a and Figure 3, are repeated for the sake of comparison.

As an example of approximate integration in Case 2b, we shall find G_{5,3}(m_3) by Simpson's Rule. Substituting n = 5 and p = 3 in (3.7), the function G_{5,3}(m_3) can be expressed as a double integral, as follows:

(3.128) G_{5,3}(m_3) = C \int_{(1 - \sqrt{1 - 4m_3^2})/2}^{(1 + \sqrt{1 - 4m_3^2})/2} \int_{m_3^2/m_2}^{1 - m_3^2/(1 - m_2)} [(1 - m_1)(1 - m_2) - m_3^2]^{1/2} dm_1 dm_2.

Rewriting the integrand, we have

1 t~\}i

| \ J

I—

2.

|(m2-l )m1+ (l~m2-ny

)J ^

dm^ dm,

'«iau Performing the indicated integration with respect to ny, we obtain 21 j(m2 -l )m-|_+(l-nywny )j

(3.130) G ^ ( m J = G 3(^"l)

i-»fa ^2


Evaluating, we find

0 .«o

w

. s

G_{5,3}(0)/C = 4/9 and G_{5,3}(0.5) = 0 can be found exactly from (3.131). By assigning values to m_3 at intervals of 0.05 from 0.05 to 0.45 and then integrating each result approximately by Simpson's Rule, using a suitable number of intervals, say ten, we can find G_{5,3}(m_3)/C. The accuracy of approximation for m_3 = .05, .10 and .15 is increased in the same way as in finding G_{5,2}(m_3). We can then normalize, that is, find C so that \int_{-1/2}^{1/2} G_{5,3}(m_3) dm_3 = 1 (or \int_0^{1/2} G_{5,3}(m_3) dm_3 = 1/2), by integrating again by Simpson's Rule, and thus we can find G_{5,3}(m_3) at intervals of 0.05 from 0 to 0.5. The results are recorded in Table 4 and are shown graphically in Figure 5. The results for n = 4, p = 3 and n = 6, p = 3, which have already been given in Table 1a and Figure 2, are repeated for the sake of comparison.
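The normalizing step described above can be sketched in a few lines. The code below is a minimal illustration, not the computation actually carried out for Table 4: it uses ten equal intervals only (without the finer subdivision near the origin), the names `simpson` and `normalizing_constant` are ours, and the unnormalized ordinates are manufactured by dividing the Table 4 column for G_{5,3}(m_3) by an arbitrary factor of 7.

```python
# G_{5,2}(m_3) and G_{5,3}(m_3) from Table 4 for m_3 = 0.00, 0.05, ..., 0.50.
G52 = [2.8259, 2.3449, 1.8953, 1.4856, 1.1165, 0.7941,
       0.5206, 0.2995, 0.1373, 0.0347, 0.0000]
G53 = [2.1313, 2.0052, 1.8268, 1.5692, 1.2690, 0.9626,
       0.6657, 0.4014, 0.1899, 0.0518, 0.0000]
STEP = 0.05


def simpson(values, h):
    """Composite Simpson's Rule for equally spaced ordinates (even number of intervals)."""
    n = len(values) - 1
    if n < 2 or n % 2:
        raise ValueError("Simpson's Rule needs an even number of intervals")
    return h / 3.0 * (values[0] + 4.0 * sum(values[1:n:2])
                      + 2.0 * sum(values[2:n:2]) + values[n])


def normalizing_constant(unnormalized, h):
    """Find C such that the integral of C * g over [0, 1/2] equals 1/2."""
    return 0.5 / simpson(unnormalized, h)


half_area_52 = simpson(G52, STEP)   # should be close to 1/2
half_area_53 = simpson(G53, STEP)   # should be close to 1/2

# Hypothetical unnormalized ordinates G/C, built with an arbitrary C = 7 for illustration.
unnormalized = [g / 7.0 for g in G53]
c_recovered = normalizing_constant(unnormalized, STEP)

print(f"half areas: {half_area_52:.6f}, {half_area_53:.6f}; recovered C = {c_recovered:.4f}")
```

Both tabulated columns integrate to one half within the accuracy of the four-place entries, which is the condition used to fix C.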

A glance at Table 4 and Figures 4 and 5 is sufficient to convince one that a fair approximation to G_{2k+1,p}(m_3), k an integer, can be obtained by linear interpolation between G_{2k,p}(m_3) and G_{2k+2,p}(m_3). It should be noted that the values of n involved in these numerical illustrations of the distribution function G_{n,p}(m_3) are much too small for n m_3 to be a good approximation to V. They may, however, prove to be useful in determining the exact distribution of V for p > 1, if the distribution of m_3^2 - (1 - m_1)(1 - m_2), the denominator of (3.1), can be found.
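The quality of the linear interpolation mentioned above can be gauged from Table 4 itself. The sketch below averages the n = 4 and n = 6 columns and compares the result with the n = 5 column obtained by Simpson's Rule; the variable and function names are ours, and "fair" is quantified here simply by the largest absolute difference.

```python
# Columns of Table 4 for m_3 = 0.00, 0.05, ..., 0.50.
G42 = [2.3562, 2.0567, 1.7602, 1.4699, 1.1890, 0.9213, 0.6709, 0.4432, 0.2453, 0.0881, 0.0000]
G52 = [2.8259, 2.3449, 1.8953, 1.4856, 1.1165, 0.7941, 0.5206, 0.2995, 0.1373, 0.0347, 0.0000]
G62 = [3.3134, 2.6076, 1.9901, 1.4605, 1.0183, 0.6618, 0.3884, 0.1940, 0.0722, 0.0131, 0.0000]
G43 = [1.9099, 1.8431, 1.6961, 1.4998, 1.2716, 1.0252, 0.7725, 0.5258, 0.2987, 0.1098, 0.0000]
G53 = [2.1313, 2.0052, 1.8268, 1.5692, 1.2690, 0.9626, 0.6657, 0.4014, 0.1899, 0.0518, 0.0000]
G63 = [2.3873, 2.2477, 1.9524, 1.5927, 1.2140, 0.8522, 0.5352, 0.2840, 0.1116, 0.0212, 0.0000]


def interpolate_midway(lower, upper):
    """Linear interpolation halfway between the even-n columns 2k and 2k + 2."""
    return [(a + b) / 2.0 for a, b in zip(lower, upper)]


def max_abs_error(estimate, reference):
    """Largest absolute difference between the interpolated and tabulated columns."""
    return max(abs(e - r) for e, r in zip(estimate, reference))


error_p2 = max_abs_error(interpolate_midway(G42, G62), G52)
error_p3 = max_abs_error(interpolate_midway(G43, G63), G53)

print(f"largest difference, p = 2: {error_p2:.4f}")
print(f"largest difference, p = 3: {error_p3:.4f}")
```

For these columns the interpolated ordinates differ from the tabulated ones by at most a few hundredths, against ordinates of order 1 to 3 near the center of the range.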


Table 4. Values of G_{n,p}(m_3) for p = 2 and 3; n = 4, 5 and 6

  m_3     G_{4,2}(m_3)  G_{5,2}(m_3)  G_{6,2}(m_3)  G_{4,3}(m_3)  G_{5,3}(m_3)  G_{6,3}(m_3)
  0.00    2.3562        2.8259        3.3134        1.9099        2.1313        2.3873
 ±0.05    2.0567        2.3449        2.6076        1.8431        2.0052        2.2477
 ±0.10    1.7602        1.8953        1.9901        1.6961        1.8268        1.9524
 ±0.15    1.4699        1.4856        1.4605        1.4998        1.5692        1.5927
 ±0.20    1.1890        1.1165        1.0183        1.2716        1.2690        1.2140
 ±0.25    0.9213        0.7941        0.6618        1.0252        0.9626        0.8522
 ±0.30    0.6709        0.5206        0.3884        0.7725        0.6657        0.5352
 ±0.35    0.4432        0.2995        0.1940        0.5258        0.4014        0.2840
 ±0.40    0.2453        0.1373        0.0722        0.2987        0.1899        0.1116
 ±0.45    0.0881        0.0347        0.0131        0.1098        0.0518        0.0212
 ±0.50    0.0000        0.0000        0.0000        0.0000        0.0000        0.0000

[Figure 4. Graph of the Table 4 values for p = 2.]

[Figure 5. Graph of the Table 4 values for p = 3.]

82 C. The Non-degenerate Case : ^

If we remove the restriction

= 0,

/ 0

= 0, while keeping

= 0, the joint

distribution of m^,

and ny contains two factors which did not appear « v y2in the degenerate case. One is the exponential e3-m2 (the other terms of the exponent drop out when

= 0).

The other is the expected value

of (3.3), calculated under the assumption that the variâtes t^

are nor­

mally and independently distributed with unit variances and E(t^) = m

fi ^

and

P

+ \ ra.(±=1>'"> P» %=!, '", P), ^here

2 a=l

va = m . #ien

2

2

2 ua = m1, 2 va = m a=l a=l

= 0, we have, of course, E(t^) =

va •

Since r ^ = 2 t . t . the expected value of (3 .3 ) is 1J a-1 ^ Ja

/ (3.132)

(n+2-p)/2' i'll ;

rlp \ ;

\ rPl

rpp 1

E

j

2 t a=l la

P 2 t, t a=l la pa

= E P 2 t t, \ a-l

ï” la

ï. t pa a=l

By the usual rules of matrix multiplication we can replace the determinant on the right by the product of two determinants, giving

V(n+2-p)/2'

/

/ ru (3.133) E

t

1 rpl

V (n+2-p)/2

**• rlp

) •••

tll ,,e tpl * Î ! tlp '** tpp "* h p

tll "" tlp =E

3 p1

)

But transposition of a determinant does not affect its value, so we have


\(n+2-p)/2 ril

rlp

'lp|

pl

PP

2\(n+2~p)/2

=E

:

(3.134) E

11

rpl *" rPP Simplifying, we obtain

v(n+2-p)/2~

-/

(3-135) E

rll “ ■ rlp : :

\rpl

' rPP

\ n+2-p“ / tll *** tlp » : = E

1

.1 tpi

tpp

I

/

.

The assumptions under which the expected value is to be calculated give va as the mean or expected value of the distribution of the 'in, tjjQ, , and the following central moments characteristic of the normal distribution:

= 1,

- 0,

(2k-l), M(2k+l)*t. = 0,

'20

= 3, •••, ^ k : t ^ = 1 '3 20

In calculating the expected value of (3.3), using (3.135), we shall need to know the moments about the origin (ν's) in terms of the central moments (μ's) and ν_1. This can be found as follows: By definition we have

(3.136) \nu_{n:x} = \int_{-\infty}^{\infty} x^n f(x) dx.

Addition and subtraction of \nu_{1:x} gives

(3.137) \nu_{n:x} = \int_{-\infty}^{\infty} [(x - \nu_{1:x}) + \nu_{1:x}]^n f(x) dx.

Expanding by the binomial theorem, we obtain


l), we have

(3.153) E [^tllt22*t,12t21^3] = ^3

^^2

^11^^11^ ^

^ 2 2 ^ :^22+'I>E:^22^""^ ^ 2 :t1^+'îi :t^i^^2 ^ 22^ ^

:t22

^22^- ^l2^^^l

"M:t^l:t22^2:t^t^)^2:t^:t^)

"(M3 :t12+3Al2:t12Vl:t12+^ :t12)U3 :t21+3M2 :t2 ^ 1:t21+V2 ^2l ^


Substituting 0 for ^ , , 1 for ^ ia ^

and ia

for a

.+ , we find ia

(3.154) ^ [ ( t n V V a d 3] '

(yiv2)(ï2vi)

+3 ( \ \ ) (^2v2)(E+1i^)(l+Y^)-(3 T ^ + ^ v ^ ) (3 ?2Vi+^2V1')=0

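For n = 4, (3.141) becomes

(3.155) E[(t_{11}t_{22} - t_{12}t_{21})^4] = E(t_{11}^4 t_{22}^4 - 4 t_{11}^3 t_{22}^3 t_{12} t_{21} + 6 t_{11}^2 t_{22}^2 t_{12}^2 t_{21}^2 - 4 t_{11} t_{22} t_{12}^3 t_{21}^3 + t_{12}^4 t_{21}^4).

Since the t_{ia}'s are independent of each other, this becomes

(3.156) E[(t_{11}t_{22} - t_{12}t_{21})^4] = E(t_{11}^4)E(t_{22}^4) - 4E(t_{11}^3)E(t_{22}^3)E(t_{12})E(t_{21}) + 6E(t_{11}^2)E(t_{22}^2)E(t_{12}^2)E(t_{21}^2) - 4E(t_{11})E(t_{22})E(t_{12}^3)E(t_{21}^3) + E(t_{12}^4)E(t_{21}^4).

But E(t_{ia}^r) = \nu_{r:t_{ia}}, by definition, so (3.156) becomes

(3.157) E[(t_{11}t_{22} - t_{12}t_{21})^4] = \nu_{4:t_{11}}\nu_{4:t_{22}} - 4\nu_{3:t_{11}}\nu_{3:t_{22}}\nu_{1:t_{12}}\nu_{1:t_{21}} + 6\nu_{2:t_{11}}\nu_{2:t_{22}}\nu_{2:t_{12}}\nu_{2:t_{21}} - 4\nu_{1:t_{11}}\nu_{1:t_{22}}\nu_{3:t_{12}}\nu_{3:t_{21}} + \nu_{4:t_{12}}\nu_{4:t_{21}}.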

Substituting 3 for

. , 0 for ^0 .+ , 1 for /z9.+ and ?.v for 7/ , ^ ia ia ^ia 1a ■L'Lia

we find

(3.158) e [(tU t22-t12t21)4]=(3*6^v2tï4v4)(3+6 ^ +^^)-4(3t1v1+ Jv3)

(3V 2 +^ v2)(V 2 )(V l )+6(1+^

)(1* ^ T2)

( 1 + ^ ) (1^2 t2)_4^ Vi )(^ v 2) ( 3 ^ 2 ^ ) (3 V x h 3’ 3)

X3+6^v24 4v4)(3+612^ Y 4v4)

.

Simplifying, we obtain

(3.159) E [ (t i1t 22- t 12t 21)ij =24+24(^2^2) ( v2+v^)+3T4(v2+v2)2+6^X2( t2+v2)2

2 2^V >3(v 1+v 2)Ï>.

P

2

Simplifying further, using m = I v , we find ^ a=l a

(3 .160)

In general, E ^11^22""“12^21^

.

1=

and, though it has not been proven

rigorously, it appears that the following result holds in general

(3 .1 6 1 ) e [(t11t22-t12t21)2kl=(2k)! s L J 0=0

2J(j!)

mJ ( I f y 1=1

.

At any rate, (3.161) gives the correct results for k = 1 and 2 (n = 2 and 4), for which the values have already been given in equations (3.149) and (3.160), and also for k = 3 (n = 6), for which the result can be shown to be

(3.162) E ^ u h a - h a h i ) 6] ■ 720+1080*2
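The leading constants in these expected values can be checked exactly by carrying out the procedure used above for p = 2: expand the power of the determinant by the binomial theorem, use the independence of the variates, and express each moment about the origin through the central moments of a unit-variance normal variate. The sketch below does this in integer arithmetic; the function names are ours, and the means in the last lines are hypothetical numbers of the product form r_i v_a, chosen only to illustrate the case of non-zero means.

```python
from math import comb


def central_moment(k):
    """k-th central moment of a unit-variance normal variate: 0 if k is odd, 1*3*...*(k-1) if even."""
    if k % 2:
        return 0
    result = 1
    for odd in range(1, k, 2):
        result *= odd
    return result


def raw_moment(n, mean):
    """n-th moment about the origin from the binomial expansion of [(x - mean) + mean]^n."""
    return sum(comb(n, k) * central_moment(k) * mean ** (n - k) for k in range(n + 1))


def det_moment(n, m11, m12, m21, m22):
    """E[(t11*t22 - t12*t21)^n] for independent unit-variance normal variates with the given means."""
    return sum((-1) ** j * comb(n, j)
               * raw_moment(n - j, m11) * raw_moment(n - j, m22)
               * raw_moment(j, m12) * raw_moment(j, m21)
               for j in range(n + 1))


# Zero means: the constant terms for n = 2, 4, 6 and the vanishing odd powers.
for n in range(1, 7):
    print(n, det_moment(n, 0, 0, 0, 0))

# Hypothetical means of product form r_i * v_a (assumed values, for illustration only).
r1, r2, v1, v2 = 0.5, 1.5, 0.3, 0.7
second = det_moment(2, r1 * v1, r1 * v2, r2 * v1, r2 * v2)
third = det_moment(3, r1 * v1, r1 * v2, r2 * v1, r2 * v2)
print(second, third)
```

With zero means the even powers give 2, 24 and 720, the constant terms appearing in the results for n = 2, 4 and 6. With product-form means the second moment evaluates to 2 + (r_1^2 + r_2^2)(v_1^2 + v_2^2), and the third moment vanishes to rounding error.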

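For p = 3, the expected value of the expression (3.3) reduces to

(3.163) E\left[ \begin{vmatrix} t_{11} & t_{12} & t_{13} \\ t_{21} & t_{22} & t_{23} \\ t_{31} & t_{32} & t_{33} \end{vmatrix}^{n+2-3} \right] = E[(t_{11}t_{22}t_{33} + t_{12}t_{23}t_{31} + t_{13}t_{21}t_{32} - t_{11}t_{23}t_{32} - t_{12}t_{21}t_{33} - t_{13}t_{22}t_{31})^{n-1}].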

By the same procedure as that used for p = 2, we can show that

(3 .164) E {\t11t22t33+t12t23t31+t13t21t32“tllt23t32’t12t2lt33"t13t22t'3l)] =0

(3.165) E [ ( t u t22t33H 12t23t3l.t13t21t32-tu t23t32-t]_2t21t33-t13t22t31)2]

=6+2m0

2 Y?

2 i=l

(3.166) E [(t11 ' t22t33+t'12t23t31+t13t2lt32"tllt23t'32"t12t21t33”t13t22t31^ ] =°

(3.167) E [(tiit22t33+tl2t'23t31+t13t'21t32"tllt23t32“t12t21t33”t13t22t31^ ]

= 360+24Qmo ^ i=l

1

+ 24m2 ( 2 ^ ) 2. ^ xi=l y


expected value of (3 .3 ) is zero,

In general, for p = 3and n even, the while for p - 3 and n

odd, it appears that the expected value of (3.3 )is

a polynomial of degree

in m

2 *?? ) .

( ^

i»l

For general values of p, we see from (3*135) and the preceding in­ vestigation that the expected value of (3*3) is zero whenever (n - p) is odd (Cases la and 2a), but it is a function of

apparently a polyno-

p mial of degree s&^JL^in m

Z'i. i=l

2b).

f whenever (n-p) is even (Cases lb and

1

At first thought, it may seem quite disturbing that the expected value of (3.3) should ever be zero, for then there is a zero factor in the joint distribution of m_1, m_2 and m_3. But m_1, m_2 and m_3 must have a joint distribution, and hence when the expected value of (3.3) is zero, the normalizing constant C must tend to infinity in such a manner that the indeterminate form 0·∞ takes a finite, non-zero value.

That being

true, the value zero for the expected value of (3 *3 ) plays exactly the same role as any other constant value, that of showing that the expected value does not depend on the variable n n , and that therefore no additional factor, other than e

*«1 r* , need be introduced into the joint

distribution of m-^, m^ and ny when we remove the restriction 7^=0.

Hence

removal of the restriction necessitates only insertion of the factor

'I e

*

in the joint distribution in Cases la and 2a, but the expected

2 Z % ., must also be ^ i=l 1 p

value of (3 .3 ), a polynomial of degree inserted as a factor in Cases lb and

2b.

in nu


As an example of the procedure to be used when ^ 4 0, but

= 0,

we shall perform the integration in Case la and give numerical, results for n = 10, p = 3•

The only change from the degenerate case that is nec-

essary is the insertion of the factor e in the joint distribution.

A Xv *l'*nx I ^ %*2 ^ = e , where a= 2%T, a i=l 1

Making this change in the joint distribution

(3.7), letting p = 3+2c, where c = an integer Z> 0, and indicating inte­ gration with respect to m^ and m^, we find

(3.168) Gn)3+2c(m3) - G|

e

*)

K v 1^ ] d”i d! ™2'

Since the factor inserted is not dependent on

the integration with

respect to m^ and the subsequent evaluation proceed exactly as before, yielding a result like (3 .2 6 ) except for the presence of the factor e*^

‘M6M W

V

- 0 ; | h ‘) J

U )J

^ (ry )+B (ny )-C(m^) -D(m^ )j

where

(3.170) A(m3) =

L~x . \ 2(k+q+r)[ a I'J 7 P L g g

a-w, c-q ' m2 (m2_1)

Ct-vn^ C— q

^


-C-A n-2+2k-2j/' j±jEH5i

(3.172) C(m^) - V ( - l ) sp

g-°-k)^'

j-k-s-1, dm2

e

8=0

s^j-k

(3.173) D(m3) = (-1)

j-k /-■g” -c-k) -c-k\

( j- k

n-2+2k-2 j f'±EI±£ n J m3

ea^,dmm2

Applying the binomial theorem to the integrand of (3.170), we find

a (3.174) I-\/

(m2- l ) "

-c-k-r i % >1 *

$-5

\ -y V -1 1-v

5 .5 *

5jx_ a~a f a ~2 •

J~3 ’

"

6-6 !

/

The infinite series contained in (3 *19 5 ) and (3 .196 ) converge for all values of a, quite rapidly for small values of a. example, a =

(3.197)

G10>

Let us consider, for

. Substituting this value in (3 .192) - (3.196), we obtain

(iHj) = Cf e 3/8|QZ (m3) -R(m3)j -5 ^ ) +T(m3)J

where (3.198) Q L 3) = e‘11

t

Whenever the results at any stage had more than seven places to the right of the decimal point, they were rounded to seven decimal places. The operator was given the following complete outline of the steps required in the various calculational stages, the numbers in parentheses representing the results of previous steps.

10 Cl)

^

10 P

n ny

G"n-2,p (m-g)

Our revised approximation should prove satisfactory for somewhat smaller values of n than Wald's n m_3, but we have seen that it also badly underestimates the frequency of occurrence of extremely large values of |V| for n as small as 10. Hence there is need for knowing the exact distribution of V in the multivariate case, or if it cannot be found, a better approximation is needed. Any better approximation will probably involve m_1 and m_2 as well as m_3. The distribution of m_1 and of m_2 can be found, in some cases at least, by the same method used in Chapter III for determining the distribution of m_3, that is, by integrating the other two variables out of the joint distribution. There is still the question of how the separate distributions of m_1, m_2 and m_3 can be put together to determine the distribution of V, in view of the fact that m_3 is not independent of m_1 and m_2.

Another possible method of finding the exact distribution is to solve the equation V = -n m_3 / [m_3^2 - (1 - m_1)(1 - m_2)] for one of the variables m_1, m_2 and m_3 in terms of the other two and V, determining the Jacobian of the transformation, and then transforming the joint distribution of m_1, m_2 and m_3 to find the joint distribution of V and the other two remaining variables. This has been done, but so far all attempts to integrate out the other two variables to find the exact distribution of V have resulted in failure, due to the fact that V is infinite at one bounding surface (1 - m_1)(1 - m_2) - m_3^2 = 0 of the domain of integration.

The normal population of 10,000 numbers stamped on beads for the sampling experiment described in Chapter IV should prove useful, not only for future investigations of the empirical distribution of V, but also for a wide variety of other experiments with normal populations.
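Returning to the transformation V = -n m_3 / [m_3^2 - (1 - m_1)(1 - m_2)], the behavior at the bounding surface can be seen from a few evaluations. The sketch below is illustrative only: the function name `wald_v` and the sample points (m_1 = m_2 = 0.5, with m_3 approaching 0.5) are our own choices.

```python
def wald_v(n, m1, m2, m3):
    """Evaluate V = -n*m3 / [m3^2 - (1 - m1)*(1 - m2)] away from the bounding surface."""
    denominator = m3 ** 2 - (1.0 - m1) * (1.0 - m2)
    if denominator == 0.0:
        raise ZeroDivisionError("point lies on the surface (1 - m1)(1 - m2) - m3^2 = 0")
    return -n * m3 / denominator


# With m1 = m2 = 0.5 the surface is reached at m3 = 0.5; V grows without bound as m3 approaches it.
samples = [wald_v(10, 0.5, 0.5, m3) for m3 in (0.1, 0.4, 0.49, 0.499)]
for m3, v in zip((0.1, 0.4, 0.49, 0.499), samples):
    print(f"m3 = {m3:<5}  V = {v:.3f}")
```

The unbounded growth of V near the surface is the source of the difficulty in integrating out the remaining two variables.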


VI. BIBLIOGRAPHY

1. Bierens de Haan, D., Nouvelles Tables d'Intégrales Définies, P. Engels, Leyden, 1867.

2. Dwight, H. B., Tables of Integrals and Other Mathematical Data, Macmillan Co., New York, 1947.

3. Fisher, R. A., "The Use of Multiple Measurements in Taxonomic Problems," Annals of Eugenics, Vol. 7 (1936), pp. 179-188.

4. Fisher, R. A., "The Statistical Utilization of Multiple Measurements," Annals of Eugenics, Vol. 8 (1938), pp. 376-386.

5. Glover, J. W., Tables of Applied Mathematics in Finance, Insurance, Statistics, George Wahr, Ann Arbor, 1930.

6. Kenney, J. F., Mathematics of Statistics, Parts I and II, D. Van Nostrand Co., New York, 1939.

7. Kossack, C. F., "Some Techniques for Simple Classification," Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley and Los Angeles, 1949, pp. 345-352.

8. Mahalanobis, P. C., "On Tests and Measures of Group Divergence, Part I: Theoretical Formulae," Journal and Proceedings, Royal Asiatic Society of Bengal, New Series, Vol. 26 (1930), pp. 541-588.

9. Mathematical Tables Project, Table of Arc sin x, Columbia University Press, New York, 1945.

10. Neyman, J. and Pearson, E. S., "Contributions to the Theory of Testing Statistical Hypotheses," Statistical Research Memoirs, Vol. 1 (1936), pp. 1-37.

11. Pearson, Karl, "On the Coefficient of Racial Likeness," Biometrika, Vol. 18 (1926), pp. 105-117.

12. Peirce, B. O., A Short Table of Integrals, Third Revised Edition, Ginn and Co., Boston, 1929.

13. Shewhart, W. A., Economic Control of Quality of Manufactured Product, D. Van Nostrand Co., New York, 1931.

14. Shrivastava, M. P., "On the D²-Statistic," Bulletin of the Calcutta Mathematical Society, Vol. 33 (1941), pp. 71-86.

15. Wald, Abraham, "On a Statistical Problem Arising in the Classification of an Individual into One of Two Groups," Annals of Mathematical Statistics, Vol. XV (1944), pp. 145-162.

16. Whittaker, E. T. and Watson, G. N., Modern Analysis, Fourth Edition, Cambridge University Press, London, 1940.

VITA

Harman Leon Harter was born in Keokuk, Iowa, August 15, 1919. He spent his boyhood on a farm in Hancock County, Illinois. He was graduated from Bowen Community High School in 1936, and received the degree of Bachelor of Arts from Carthage College in 1940. He won the Carthage College Scholarship in the University of Illinois, and received the degree of Master of Arts from the latter institution in 1941. He served as a graduate assistant in mathematics at the University of Illinois from 1941 to 1943, during which time he continued graduate study. In 1943 he was appointed professor of physics at Missouri Valley College, Marshall, Missouri, and served in that capacity for one year. From July, 1944, to March, 1946, he served as a radio technician in the United States Navy. From March, 1946, to August, 1948, he served as an instructor in mathematics at Purdue University, also doing part-time graduate study. His time has been spent entirely on graduate study and research since September 1, 1948, during which time he has held an appointment as a Purdue research fellow. He has accepted an appointment as assistant professor at Michigan State College, effective September 1, 1949.