Northwestern Tri-Dimensional Pursuit Test: An instrument for pilot selection


NORTHWESTERN UNIVERSITY LIBRARY

Manuscript Theses

Unpublished theses submitted for the Master's and Doctor's degrees and deposited in the Northwestern University Library are open for inspection, but are to be used only with due regard to the rights of the authors. Bibliographical references may be noted, but passages may be copied only with the permission of the authors, and proper credit must be given in subsequent written or published work. Extensive copying or publication of the thesis in whole or in part also requires the consent of the Dean of the Graduate School of Northwestern University. This thesis by ________ has been used by the following persons, whose signatures attest their acceptance of the above restrictions. A library which borrows this thesis for use by its patrons is expected to secure the signature of each user.


NORTHWESTERN UNIVERSITY

NORTHWESTERN TRI-DIMENSIONAL PURSUIT TEST: AN INSTRUMENT FOR PILOT SELECTION

A DISSERTATION SUBMITTED TO THE GRADUATE SCHOOL IN PARTIAL FULFILLMENT OF THE REQUIREMENTS for the degree DOCTOR OF PHILOSOPHY

DEPARTMENT OF PSYCHOLOGY

BY ALBERT CLARENCE VAN DUSEN

EVANSTON, ILLINOIS August, 1942

ProQuest Number: 10102076


Published by ProQuest LLC (2016). Copyright of the Dissertation is held by the Author. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code. Microform Edition © ProQuest LLC. ProQuest LLC, 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106-1346.

ACKNOWLEDGMENTS

The writer wishes to acknowledge with sincere gratitude the valuable assistance of Northwestern University staff members: especially that of Dr. R. H. Seashore, who directed the study; Dr. C. E. Buxton, for statistical and editing counsel; Dr. E. L. Edmondson, for assisting in obtaining one of the groups studied; and Mr. H. A. Coopmans, for technical assistance. Sincere appreciation is also expressed to Lieut. Col. John C. Flanagan, Major Arthur W. Melton, and Major Robert T. Rock, whose helpful offices made possible the preliminary evaluation of the instrument in terms of the needs of the Army Air Forces.

TABLE OF CONTENTS

CHAPTER
I. Introduction
II. Apparatus and Procedure
    Description of Apparatus
    Instructions to Subject
    Preliminary Tryout and Final Adjustment
III. Setting of the Experiment
IV. Critical Aspects of Aptitude Test Evaluation
    Qualitative Analysis of Individual Differences
    Quantitative Analysis of Individual Differences
    Securing a Reliable Test
    Securing a Valid Test
V. Results and Discussion
    Reliability
    Validity
VI. Suggestions for Further Research
VII. Summary
VIII. Appendixes
    A. Details of Apparatus Design
        Housing Unit
        Control Panels
        Independent Controls
        Airplane Controls
        Problems in Developing Apparatus
    B. Raw Score Summary
        1. New Aviation Cadets (NACs)
        2. Washed Out Pilots (WOPs)
        3. Civilian Pilot Training Students (CPTs)
        4. Anti-aircraft Soldiers (AASs)
Bibliography

LIST OF TABLES

TABLE
I. Average Errors Per Trial of the Criterion Groups
II. Reliability Coefficients
III. Significance of Differences between Group Means Based on Final Status
IV. Percentage of Each Group in Selected Ranges of Combined Distribution of Total Error Scores
V. Chi-square Tests of Score Independence of Group Classification When Groups Are Combined Two at a Time

LIST OF FIGURES

FIGURE
1. Instrument in Use
2. Tri-dimensional Pursuit Test Learning Curves
3. Percentage of Each Group in Selected Range of Combined Total Error Score Distribution
4. Side View of Instrument (Panels Removed)
5. Side View of Housing Unit (Panel Removed)
6. Main Shaft Assembly
7. Top View of Housing Unit
8. Power Supply
9. Airplane Controls I
10. Airplane Controls II

CHAPTER I INTRODUCTION

In a modern war it is necessary to develop the largest number of skilled pilots in the shortest possible time. If the supply of airplane pilots is limited, the air forces, in developing an adequate fighting unit, would at the beginning of an emergency train as many applicants as possible with the facilities available. The most effective air force would be made up of the most efficient pilots. The major problem in developing the most efficient flying personnel lies either in the initial selection of the potentially best qualified pilots or in the development of the most effective training program. Thus far, there is little published scientific information concerning the relative importance of these two related problems. If their significance could be established, it would greatly facilitate the concentration of effort in the most promising areas.

Psychological research units within the air forces have been established to investigate the factors involved in successful flying, which may lead either to effective methods of selection or to the development of more effective training programs. There has been some opportunity for professional civilian consultation to aid in these efforts, the psychologists as a group having been requested to contribute chiefly to the problem of selection.

Even though the relative importance of initial aptitude and of methods of training in establishing differences in flying ability is unknown, there is no question that great individual differences in flying performance do exist.

In order to test the hypothesis that these differences are due to variation in a certain type of aptitude, one test or a battery of aptitude tests of that type must be developed and evaluated in a controlled experiment as a crucial test. Whatever measure is chosen, it should preferably be both qualitatively and logically related to the actual piloting situation.

The validity of such an aptitude test will depend upon its being positively related to success in actual flying skills.

Any task as complex as piloting will involve all aspects of behavior, which may be conveniently classified into intellectual, affective, sensory, and motor psychological processes (22)*. Specialists in each of these behavioral aspects have been asked to study its relationship to the piloting task. The specific effort in the present study was to note the predominantly psycho-motor elements related to pilot success and to design and construct an aptitude test which might aid in the selection of potentially superior pilot candidates.

* The number in parentheses indicates the entry in the alphabetical listing of the bibliography.

The vast majority of motor skills tests have been found to be rather specific (21) and usually correlate only slightly with any criteria of complex behavior. It is possible that psycho-motor tests will also prove insignificant in predicting pilot success. However, since other studies have dealt mainly with the simpler tests and abilities, a complex test might prove more significant. Not many complex motor tests have been investigated. Thus, since piloting appears to consist of a complex integration of psycho-motor adjustments, an instrument designed specifically to simulate these conditions may successfully discriminate between potentially good and potentially poor pilots.

It should be emphasized that even if negative results were obtained in such a study, they would still be of considerable practical significance, since the investigation would have reduced the number of possible determinants of flying skill.

If preliminary results should support the psycho-motor aptitude hypothesis, further development of such a test should be undertaken immediately on a large scale. Any reduction in the number of potentially poor pilots now being trained would mean a tremendous saving in manpower, training time, and expense for the Army Air Forces. When clear-cut evidence is obtained with respect to the probable significance of aptitude tests, the air forces may then emphasize either selection techniques or training improvements, so as to get the greatest number of the most highly skilled pilots into the air in the shortest period of time.

In order to identify which new aviation cadets are potentially successful pilots and which could more profitably be classified in some non-piloting area of the Air Forces, it is necessary to estimate either the candidate's ability to perform the duties of the piloting task or the probability of his developing such abilities, i.e., his potentialities for acquiring such skills. There are two methods for determining whether or not a recruit has the necessary qualifications for any task. The most conclusive method is a direct try-out. This is most practicable when the training period for the task is short and inexpensive, and when the try-out of the actual task is neither hazardous to life nor expensive in equipment. Obviously the direct method is an impractical one for selecting airplane pilots. The second technique for predicting probable success in a task is the use of one or more diagnostic instruments for measuring the aptitudes of individuals or their probable success after training. Such a method is more feasible here. An estimate of an individual's aptitude for piloting may be made by noting his degree of skill on a test, success upon which may be shown to be related to the individual's ultimate capacity for piloting.

The concepts involved in aptitude testing, some of which are useful here, have been loosely used in the literature. Operational definitions of the three terms capacity, skill, and aptitude are included to eliminate any misleading concepts of their use here.*

Capacity: A person's functional capacity for a given performance or skill is his maximal potential effectiveness in terms of end results (i.e., speed, precision, strength, qualitative characteristic, or a combination thereof) which may be achieved by using a given work method with maximal overlapping of component actions after optimal training. It further connotes that: (1) the functional limit is based in turn on anatomical and physiological constants of the separate organs involved in the given work method and on their integration through neural and humoral systems; (2) conversely, a change to a different set of organs would result in a different capacity; (3) a change in work methods alone, utilizing the same organs, could still change the capacity; (4) the capacity is relatively stable for each work method, this being usually attributed in large part to inheritance of the anatomical and physiological characteristics which are involved; (5) while minor variations arise due to differences in age, health, and motivation, the individual tends to retain his relative ranking among others subjected to the same working conditions; (6) optimal training includes both direct and transferred training under expert supervision; (7) adequate tools, materials, and working conditions are assumed; (8) since capacity refers to a potential limit, it can only be inferred by extrapolation from an individual's performance at any stage of the particular learning curve which is characteristic for a given person, exhibiting his initial rate of progress, and which is probably asymptotic to a theoretical ultimate level; and (9) initial rate of progress is thereby assumed to be significantly and positively correlated with ultimate capacity in any performance in which the units of measurement are sufficiently fine to discriminate actual differences in effectiveness at the more difficult stages of performance.

Skill: A person's skill in a given performance is his present effectiveness in terms of end results, e.g., speed, precision, strength, qualitative characteristic, or a combination thereof. It is further connoted that this degree of skill is dependent upon the particular work method employed, including the extent of overlapping of component actions. It may also be connoted that of two persons attaining the same end results, the one who does so at a lower energy cost is said to be more skillful.

Aptitude: An individual's aptitude for a given performance is his probable rate of learning a skill, or ease in a skill, or both, as estimated from sample related factors, e.g., (1) favorable structures or physical constants of organs; (2) transfer of training as shown in the adoption of favorable work methods: (a) general methods of approach to a problem for developing new methods appropriate to a specific activity, and (b) carry-over of appropriate specific methods from previous similar activities. It is further connoted that: (3) such aptitudes are ordinarily quite stable, but may change as a result of intensive training; (4) rapidity of learning is positively correlated with high ultimate capacity; (5) high aptitude leads to ease in terms of low energy cost; (6) interest and satisfaction in the exercise of potential ability are easily developed; (7) aptitudes are relatively specific, or at most are related only within small groups.

* The operational definitions of these concepts were derived by a class in Special Abilities at Northwestern University under the direction of Dr. R. H. Seashore.
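Clause (8) holds that capacity can only be inferred by extrapolating an individual's learning curve toward the level it approaches asymptotically. The definition names no particular curve form. Purely as an illustration, the sketch below assumes an exponential approach, y_n = A + B·r^n with 0 < r < 1, under which the asymptote A follows exactly from three equally spaced trial scores. The function name and the example scores are hypothetical; this is not presented as the author's procedure.

```python
def extrapolate_asymptote(y1: float, y2: float, y3: float) -> float:
    """Estimate the limiting level of a learning curve from three
    equally spaced observations.

    Assumes (hypothetically) that scores follow y_n = A + B * r**n with
    0 < r < 1, i.e. an exponential approach to the asymptote A. Under that
    assumption A = (y1*y3 - y2**2) / (y1 + y3 - 2*y2) exactly.
    """
    denominator = y1 + y3 - 2 * y2
    if denominator == 0:
        # Equal successive changes (a straight line) have no finite asymptote.
        raise ValueError("points show no curvature; asymptote is undefined")
    return (y1 * y3 - y2 ** 2) / denominator


# Hypothetical error scores on trials 1, 2, 3: each gain is half the last,
# so the curve levels off toward 100 errors.
print(extrapolate_asymptote(300, 200, 150))  # 100.0
```

With real scores, which scatter around any smooth curve, a three-point estimate is fragile; fitting the assumed curve to all available trials would be the sturdier choice.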

The determination of the nature of an aptitude test for piloting requires an adequate description of the operational duties involved in flying and piloting an airplane. These may be obtained from direct observation of the piloting task and from consultation with experienced pilots and flight instructors. The qualifications necessary for successful piloting may be estimated from these job descriptions. Upon examination, the piloting function involves, among other things, the continuously coordinated motor adjustment of hands and feet to sensory cues, largely visual, in order to control a moving airplane in three-dimensional space. Gross activity, minute operations, and the particular relationships or patterns existing among these minute operations were all considered in the description of the piloting processes.

A detailed discussion of the history of aptitude tests which are significantly related to pilot selection, and of the relationship of this test to them, will have to be deferred until after the current war. Military restrictions on the reporting of research at present make it impossible to compare the aptitude tests which have proved most successful. We shall be limited here to an examination of the principal kinds of motor tests known to be available, emphasizing the ones which have been standardized and reported as useful pilot selection instruments, and the principal practical or theoretical advantages of those which seem to be applicable to the development of a new aptitude test.

There has been little psychological research in pilot selection between the end of the last war and shortly before our entry into the present war.

Test construction for flying aptitude in general has been hampered by crude criteria. Only a few studies have been performed upon appreciable numbers of pilots in experimental groups whose degree of success in piloting was known accurately. The chief criterion has been a comparison of successful pilots, i.e., those graduating from a pilot training school, with unsuccessful pilots, i.e., those rejected by the training unit (9, 10). A few studies have used flight ratings, usually by the flight instructors, as criteria of flight success (7, 8, 10, 19, 26). Validations of psychological tests in aviation have in general been unsatisfactory (17, 26), since the studies have been based on too few cases and inadequate controls. It has been recognized that few motor tests have shown a positive relationship to flight ability.

The motor tests available may be conveniently classified according to the type of motor response emphasized in the task.

Speed tests may involve either simple or choice reactions. Serial responses may be either primarily discriminative or continuous in nature. Precision responses may be either static or dynamic, and if dynamic, at high or low speed. Where gross motor skills are considered, manual or general strength tests may be utilized.

Parsons (19) tested aviators selected on the basis of instructors' ratings, including 25 aviators of marked ability, 40 aviators of average ability, and 11 unsuccessful pilots, and found that simple visual, auditory, and tactual reaction times were of no use in pilot selection. The study was an effort to discover non-physical standards for selecting naval aviators. Positive results were found in tests of "emotional composure" (responses to startle, etc.) and elimination of fear. The only psychomotor factors investigated were the reaction times mentioned above, and all proved of little value.

The O'Rourke Complex Coordinator was an early instrument used to test the relation of complex reaction time to flight ability. The apparatus consisted of an adjustable seat and a set of airplane controls mounted on a frame in the same relationship as those found in an airplane. On an upright panel in front of the controls, a buzzer and a series of red, white, and green lights were mounted. The lights furnished visual stimuli to which the operator responded by manipulating the airplane rudder and stick; the S responded to every stimulus by moving one or both controls. The test consisted of sixty-two discrete reactions, and a run required about fifteen minutes to administer. The time to complete the task and the errors made were graphically recorded. Thirteen hundred and ninety-four new aviation cadets were tested on the complex coordinator prior to entering the primary training course. Mashburn (16) has pointed out that critical scores based on complex reaction time obtained on this apparatus could appreciably differentiate between successful and unsuccessful pilots.

Reid's Reaction Time Apparatus, which is somewhat similar to the above, has also been used. Colored lights arranged centrally around the speed indicator of an airplane cockpit control panel furnish the subject with cues to the position of the airplane in space, i.e., whether it is flying straight, to port, or to starboard, and whether it is banked or on a level keel. The subject's score is the time required to return the signal lights to the point indicating level flight. Fifty-six cases were used, divided into three groups: those currently engaged in flying; those who had received training but were not currently engaged in flying; and a group of untrained subjects. On the basis of these few cases, a slight positive relationship to pilot success was found. Since the criterion groups were not equivalent in training, however, the advantage of the first group may have been a result of transfer of training.

Henmon (10) tested 300 flying cadets and flight instructors who had been rated as very good flyers, very poor flyers, or flyers of unknown ability. His battery included 10 tests. Simple visual and auditory reaction time and choice reaction time were unrelated to flying ability. Swaying with eyes closed correlated [illegible] with the criterion, using Shephard's R. These and other studies seem to indicate that simple reaction time has little if any predictive value for flight ability. As yet a conclusive estimate cannot be made concerning discriminative reaction times.

The Mashburn Automatic Serial Action Apparatus (14), an outgrowth of the complex coordinators mentioned above, has been standardized and found to be somewhat related to flight success.

It controls the standard of accuracy mechanically and measures time. This test was designed to present a continuous series of stimuli to the subject. It consists of regular airplane controls and an upright panel with three series of stimulus lights. Each correct series of responses on the controls sets up the succeeding signal until that series is completed. The instrument was used in testing 1466 new aviation cadets upon their entrance into the primary course of the Army Air Forces. The subjects were followed through to the completion of the primary course, and critical scores were then set at various levels to determine whether appreciable separations between successful pilots and the cadets who had washed out for "failing to meet standards of flying required" could be based on test performance. Glenn (9) has shown in this validation that it is possible to raise the expected proportion of primary course graduates by more than 20 per cent.
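A critical-score validation of this kind rests on a simple comparison: keep only the candidates on the acceptable side of the cutoff, and compare the proportion of graduates among them with the proportion in the whole group. The sketch below shows that comparison on made-up data. The function name, the assumption that lower scores are better (as with error scores), and all numbers are hypothetical; they do not reproduce the Mashburn or Glenn data.

```python
def graduation_rates(records, critical_score):
    """Compare graduation expectancy before and after applying a cutoff.

    records: list of (test_score, graduated) pairs; lower scores are
    assumed (hypothetically) to be better, as with error scores.
    Returns (rate_in_whole_group, rate_among_those_passing_the_cutoff).
    """
    overall = sum(grad for _, grad in records) / len(records)
    passing = [grad for score, grad in records if score <= critical_score]
    if not passing:
        raise ValueError("no candidate meets the critical score")
    selected = sum(passing) / len(passing)
    return overall, selected


# Hypothetical cadets: (error score, completed primary course?)
cadets = [(10, True), (20, True), (30, True), (40, False),
          (50, True), (60, False), (70, False), (80, False)]
before, after = graduation_rates(cadets, critical_score=40)
print(f"{before:.0%} -> {after:.0%}")  # 50% -> 75%
```

Moving the cutoff trades selectivity for yield: a stricter critical score raises the graduation rate among those accepted but rejects more candidates, some of whom would have succeeded.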

It is interesting to note that the percentage of cadets who washed out while still in dual training is related to test performance to an even greater degree than the percentage of those who washed out during solo flight. It is pointed out that on the re-testing of several classes of students, learning was shown on the apparatus, but its significance has not been published, although it is probably known by this time.

The relationship of serial discriminative ability to pilot success is at present being further tested with the Seashore Serial Discrimeter.* This instrument measures only finger responses on four keys to four different number signals.

* Personal communication from the experimenter.

Flack and Bowdler (8) investigated 1000 successfully accepted candidates for flying, 93 successful pilots and students, 200 accepted unsuccessful candidates, and 40 students who had failed in flight training. The tests included balancing on one foot with eyes closed, holding the hands outstretched, balancing a rod on a board, and the knee jerk. Twenty per cent of the washouts had a marked tremor, as compared to only two per cent of the successfully accepted candidates.

Postural sway has been used in selective batteries, but no reliable results have been published. Seashore has included new models of the Arm-Hand Sway Meter and the photo-electric hand steadiness test in his psychomotor battery, now under observation by the air forces.*

Pursuit tests require motor adjustments to a continuously changing stimulus, as contrasted with individual adjustments to discrete stimuli in serial reaction. Eye movements were photographed while the S watched a swinging pendulum, the Miles Pursuit Pendulum, and were found to correlate .40 with the progress of 26 men learning to fly (18). The Koerth Pursuit Rotor (12) has been well standardized, and unpublished evidence indicates that it, too, may be positively related to pilot success. A late revision of Hull's (11) Engine Lathe Aptitude Test, which involves the manipulation of two hand cranks, has substituted a moving target pattern for the originally stationary stimulus. In its present form it is called the two-hand coordinator. There are unpublished indications that this test also is related to flying ability.

It has been suggested that superior strength and general athletic ability tend to be correlated with piloting success; however, Stratton and coworkers (24), in studies involving grip strength as a test of muscular exertion and sustained grip as a test of endurance, have indicated that strength or endurance may not be related to flying ability. They examined a total of 122 Army Air Corps cadets. Ratings of these men were based upon instructors' estimates for each flyer in dual and solo flight, each day and each week.

* Personal communication from the experimenter.

From the above review, the tests which seem to be most closely related to pilot success involve continuous coordinated movements. The pursuit tests based upon continuous movements still have certain defects which could be eliminated. The stimulus aspect of the test might more nearly simulate actual flying conditions, and the scoring devices might lend themselves to the recording of degrees of error. Any improved aptitude test for pilot selection would probably combine the most promising principles of the tests mentioned above and eliminate their obvious shortcomings. Since there is some unpublished evidence that the pursuit tests and the complex tasks are more predictive than others, a complex pursuit test simulating the actual job description of piloting as nearly as possible may provide a more diagnostic instrument.

In the light of the requirements for successful piloting and a review of the objectionable features of available tests, some specific improvements were attempted in the design and construction of the new test.

One desirable feature was the simulation of the components of flying activity in so far as it was practicable. The test might include a miniature airplane as a stimulus which would move continuously in the three planes of space in conventional flight maneuvers: a pursuit coordination test, in contrast to a serial discrimination or complex reaction test.

Visual perceptual cues should serve as the chief stimulus for adaptive responses.* An objectionable characteristic of some of the currently used coordination tests is their demand for response to multifocal cues, since the flight condition necessitates simultaneous perception of directional movement in three planes. Unifocal visual cues should therefore be approximated in so far as possible. If the control panels simulate airplane cockpit controls, the corrective adjustments should avoid "cross controls," that is, a banking movement in a direction opposed to a turning movement. The controls should be of such a nature as to require motor adjustments of the same body movements used in piloting a plane, i.e., eye, hand, and foot coordination.

* To include kinesthetic cues would necessitate movement of the whole body of the observer, which would have involved instrumental complexity. A chair designed to operate on the exact principles of the newly designed tri-dimensional pursuit test is now nearing completion. In this test the predominant cues will be tactual, kinesthetic, and equilibratory, either with or without the addition of vision.

The speed at which adjustive movements are required should be comparable with that of the average flight adjustments under normal flight conditions. In order to investigate the effect of transfer from pilot training to such a task, alternate sets of control panels should be constructed: one designed specifically to avoid excess positive transfer from flight training, the other to simulate the airplane cockpit controls.

Many motor skills tests have been scored on an all-or-none principle. The scoring of an improved test should be in terms of degree of proficiency instead of the too frequently found succeed-or-fail type of scoring. The size of the units of measurement should be chosen after the probable range of errors and the degree of accuracy maintained by well-trained Ss are known. In the light of these considerations, the present instrument was designed to constitute a multimember controlled pursuit task which moves in three planes.

CHAPTER II APPARATUS AND PROCEDURE

Description of the Apparatus

The S watches a miniature airplane which moves continuously in three planes of motion (see Fig. 1). His task is to keep the airplane in forward level flight by compensating for its deviations from level flight through the manipulation of three controls. The planes of motion, together with their respective controls, are: (1) horizontal turning, left-hand lever; (2) elevation, right-foot treadle; and (3) banking, right-hand lever.*

The score is in terms of three degrees of error on either side of the position for level flight, for each plane of motion. A trial is 114 sec. in length, the time required for two repetitions of a fixed pattern of complicated maneuvers, which happened to approximate the desired trial length. Five trials make up the testing period, with 30 sec. rest between trials.
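The length of a testing period follows directly from these figures. The sketch below lays out the schedule, assuming that a rest falls only between trials (not after the fifth) and that the two cam revolutions in a trial are of equal length; the function and parameter names are hypothetical.

```python
def session_schedule(n_trials=5, trial_s=114, rest_s=30, revs_per_trial=2):
    """Return (start, end) times in seconds for each trial, the total
    session length, and the duration of one cam revolution.

    Assumes a rest only *between* trials and equal-length revolutions.
    """
    schedule = []
    t = 0
    for i in range(n_trials):
        schedule.append((t, t + trial_s))
        t += trial_s
        if i < n_trials - 1:  # no rest after the final trial
            t += rest_s
    revolution_s = trial_s / revs_per_trial
    return schedule, t, revolution_s


trials, total, rev = session_schedule()
print(trials)      # [(0, 114), (144, 258), (288, 402), (432, 546), (576, 690)]
print(total / 60)  # 11.5 (minutes)
print(rev)         # 57.0 (seconds per cam revolution)
```

Under those assumptions a full testing period, excluding instructions and the demonstration, runs 11.5 minutes, and each cam revolution lasts 57 sec.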

* Another control panel employing the standard type of airplane stick and rudder has been constructed but was not employed in the present work because of the expected high transfer from flying experience. A detailed description of this second control panel is found in Appendix A.


Fig. 1. Instrument in Use

Instructions to S

Please be seated and adjust the chair until your right foot rests comfortably upon the foot treadle. Place your right and left hands upon the knobs of the right and left levers before you. You will note that you can set the plane upon a course flying directly straight away from you by rotating the left lever. The wing banks may be corrected by lowering or raising the right lever. You can level the nose of the airplane by either depressing or relaxing your foot on the foot treadle. At each ready signal, set the plane in straight level flight. Throughout each trial make adjustive movements of the levers and the foot treadle so as to maintain the starting position. Your score is determined by how well you retain straight level flight. Your score is in terms of errors, and when you are off the course, you can hear electric counters clicking off points against you. When the plane is level, you will hear no clicks. Relax completely after each trial.

Ready!

Preliminary Tryout and Final Adjustment

College subjects with and without pilot training were used in the preliminary investigation to determine the final adjustments needed in the apparatus and the details of administration. Since a pretesting trial of 30 sec. duration seemed not to affect appreciably the total score on five or more trials, no pre-practice was included in the final testing. Since the practice effect had begun to level off by the fifth trial, the testing period was limited to that number of trials. Two revolutions of the cam mechanism required 114 sec., or approximately 2 minutes. Since this period appeared long enough to minimize the effect of chance errors and fatigue, and short enough to maintain maximum motivation, this unit was selected as the standard time for a single trial.

A demonstration of how the control levers operated the airplane, with the apparatus at rest, was found to facilitate the understanding of instructions and was followed by the subject's own demonstration as a check.

Adjustment of the

chair for a comfortable reach of the controls was found to be a pre-testing necessity.

Besting of the arms on the con­

trol bench of the independent controls was prohibited. Recording sheets were designed to include spaces for each of the three counter readings for each cam revolution.

A single trial was two cam revolutions, but the error scores for each separate cam revolution were easy enough to obtain and were recorded for purposes of odd-even reliability estimates. Space for a summative score for the three planes of motion was also provided.

In the preliminary administration, students with flight experience found the test both interesting and markedly similar to flight experience, with the exception that the maneuvers changed direction more rapidly than is usual under ordinary flight conditions. The movements of the Airplane Controls in correcting the miniature airplane's maneuvers were conventional. The relationship between the Independent Controls and the corrective movements of the airplane appeared to produce the desired simplicity.

CHAPTER III SETTING OF THE EXPERIMENT

The evaluation of the instrument was performed on four different groups of subjects. Data were gathered on two classifications of Army Air Forces personnel,* on Civilian Pilot Training Secondary Students, and on a group of non-flying soldiers.** The characteristics of each group are given below:

1. New Aviation Cadets (NACs). This group included sixty-two new aviation cadets who had met the minimum mental, physical, and educational requirements for the Army Air Forces. High school graduation was the basic educational requirement at that time.

2. Washed Out Pilots (WOPs). This classification is made up of forty-three Army Air Forces Aviation Cadets who had been excluded from piloting. These individuals had met the minimum mental, physical, and educational requirements of the Army Air Forces upon their entrance. No information was available as to whether the mental tests employed were the same as those now used for NACs, nor was information available concerning possible differences between these two groups in physical condition. The basic educational requirement was probably higher than that required of the present NACs when most of this group entered the Air Forces. Each member of this group had had some flying experience, but the amount was not the same for each, since disqualification as a pilot may occur at any time during pilot training. Of thirty-five men reporting their number of flying hours, the range was from eight to one hundred and ten, with a mean of twenty-nine hours. Numerous reasons were given for disqualification as a pilot. Some of the most frequent reasons given were poor motor coordination, dangerous flyer, and slow learning, together with "crack-up" and illness.

3. Civilian Pilot Training Secondary Students (CPTs). The class of Northwestern University's Civilian Pilot Training Secondary Course (second semester, 1942) made up this classification. It included fourteen male students, all of whom had met the basic mental and physical requirements necessary to qualify for a CPT program. Each individual had completed the primary CPT course. Of the twelve subjects reporting their approximate number of flying hours, all had had approximately as many dual instructional hours as solo hours. The mean solo time was forty-six hours. All members of this group are considered successful pilots, since they have successfully completed the primary course and because a high percentage of previous students from this training unit have later succeeded as pilots in the air forces. Irregularities occurring in this group are given in the footnote.***

4. Antiaircraft Soldiers (AASs). This group consisted of seventeen members of an antiaircraft battery, two second lieutenants and fifteen enlisted men (selectees), who constitute a fair cross section of regular army personnel. Each had been subjected to army physical conditioning for at least three months but not more than two years prior to testing. None had had flying experience.

* These groups were tested in the Psycho-motor Research Unit program of the Air Pilot Replacement Center, AAF, Kelly Field, Texas, April 25 to May 1, 1942, under the supervision of Major Arthur W. Melton and Major Robert T. Rock, during the writer's appointment as Civilian Consultant to the Army Air Corps, Research Division, School of Aviation Medicine, Randolph Field, Texas, for the demonstration and preliminary validation of the newly constructed tri-dimensional pursuit test.

** A group of two officers and fourteen enlisted men from an antiaircraft battery of the Coast Artillery stationed at Ft. Sheridan, Illinois, who had volunteered to participate in a research project, was tested on May 12, 1942.

*** Two students who were taking only the ground school part of the secondary course were included, but each had proved his flying ability by successfully completing the primary course. One student had only limited experience in piloting but was recommended by the instructor as a potentially superior pilot and was permitted to take the secondary CPT course, and hence was used as a subject. In two of these irregular cases the standard test procedure was modified to the extent of including a 30 sec. pre-practice period. Since additional cases showed this small pre-practice effect to be relatively ineffective, these two cases were not disqualified.

CHAPTER IV CRITICAL ASPECTS OF APTITUDE TEST EVALUATION

For a test to be useful in selection, the individual differences exhibited in it must be such that the instrument reliably measures the ability of the members of the groups tested. These differences must be related to pilot success for the instrument to be valid. Once the significance of individual differences is established, the test can then be evaluated as an aid in selection.

Qualitative Analysis of Individual Differences

Seashore (21) has pointed out that individual differences in performance of any motor skill are influenced by the nature and interrelations of such underlying factors as biological capacity, specific training, transfer, and work methods.

It is important to note, therefore, in the evaluation of a test, whether any one or all of these factors might systematically influence test performance and, consequently, test reliability or validity. The importance of anatomical or physiological limits in influencing a pursuit test like the one studied here is as yet undetermined.

Improvement with practice, or specific training on a task, has been noted in many motor skills. Yet the stability of individual ranks on different motor skills tests which show varying degrees of practice effect is relatively great. Fairly high test reliability, estimated by correlating first and last trial performance, has been reported in the case of simple and serial reaction by Farnsworth, Seashore, and Tinker (5); in steadiness, by Belton, Blair, and Humphreys (1); in a learning study in motor rhythm by Seashore (23); and in tapping, spool packing, and pursuit tests by Buxton and Humphreys (3).

In considering the reliability of motor test scores, it is advisable to consider the extent to which various tests show a practice effect. If there are large practice effects, one result would be an increase in the total range of scores, which, other factors being equal, would produce a higher reliability coefficient. Another result might be, however, the introduction of various other factors such as motivation, insight, transfer, etc., the importance of which might tend to be concentrated at different stages of the learning curve and the occurrence of which might have quite unrelated expectancies in the same individual. Wherever reliability measures compare one stage of learning with another, particularly very early and very late stages, these uncontrolled variables would thus be expected to reduce the measures of reliability. If, however, each of the extraneous factors is reasonably consistent for one or a few stages of learning, as is usually expected, these effects would be evenly distributed between the odd and even scores for those particular stages, and the reliability thus estimated would be expected to be considerably higher.

Another specific factor which might logically be expected to influence individual differences on an aptitude test is the transfer of training.*

Cox (4) has observed that a given amount of time devoted to the acquisition of skill in a factory assembly line produced little or no transfer to other industrial skills, and Seashore (21) reported that transfer of training is usually found to be ineffective in motor skills, especially in relatively short experiments with adult Ss. It should be pointed out here that most laboratory experiments on transfer involve relatively slight amounts of training. For much longer training periods, total transfer may be greater.

* Measured amounts of transfer usually represent the algebraic sum of the positive and negative components of transfer of training (25), and it should be recognized that reference to positive or negative transfer in this study merely signifies a balance, or net effect, of the combined transfer in either a favorable or an unfavorable direction from qualitatively similar activities.

Bryan and Harter (2), and more recently the writer (27), have pointed out the relation of work methods to learning. Seashore (24, 21) has summarized the importance of work methods as an underlying factor of individual differences in all aspects of behavior. Superior performance in a task may be due to the initial application or the later adoption of superior work methods. Should an aptitude test be of such a nature that the application of new work methods would cause striking irregularities in an individual's learning curve, the reliability of the test would be lowered.

Quantitative Analysis of Individual Differences

Securing a reliable test. To be reliable, a test must consistently measure the relative abilities of the members in a group. Reliability is expressed in terms of the coefficient of correlation between any two performances and may be estimated by several different methods. All methods, however, depend directly upon the size and stability of the individual differences from sample to sample of behavior.

The evidence that specific training influences individual differences in complex tasks suggests that in a reliable predictive device, individual stability must appear throughout the various stages of learning. Since the tri-dimensional pursuit test is complex and probably subject to practice effects, a thorough estimate of reliability requires the study of individual stability from trial to trial as well as from early to late trials. The particular estimates of "cross-sectional" reliability (stability over a limited portion of the testing period) reported here were made by correlating performance on adjacent trials, while an estimate of "longitudinal" stability was secured by correlating the first and last trials.
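The two correlational estimates just described can be sketched as follows. This is a minimal illustration, assuming a table of per-subject error scores on five trials; the helper names (`pearson_r`, `reliability_estimates`) and the scores are hypothetical, not data or procedures taken from this study. The half-trial estimate described next would apply the same `pearson_r` to each subject's averaged first-half and second-half scores.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def reliability_estimates(scores):
    """scores[s][t] = error score of subject s on trial t.

    Returns the "cross-sectional" estimates (each pair of adjacent trials)
    and the "longitudinal" estimate (first trial vs. last trial).
    """
    trials = list(zip(*scores))  # one tuple of subject scores per trial
    adjacent = [pearson_r(trials[t], trials[t + 1]) for t in range(len(trials) - 1)]
    first_last = pearson_r(trials[0], trials[-1])
    return adjacent, first_last


# Hypothetical error scores: five subjects by five trials (not data from this study).
scores = [
    [342, 240, 190, 170, 150],
    [250, 180, 150, 125, 110],
    [300, 210, 160, 150, 120],
    [170, 130, 100, 95, 75],
    [210, 150, 140, 110, 100],
]
adjacent, first_last = reliability_estimates(scores)
print([round(r, 2) for r in adjacent], round(first_last, 2))
```

With five trials the sketch yields four adjacent-trial coefficients and one first-versus-last coefficient.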

To these estimates a third was added, an estimate of the stability of the entire test, made by correlating the average score made on the first half of each of the five trials with the average score made on the second half of each of the trials.*

* This was done by recording the error score at the completion of each half of each trial. A trial consisted of two cam revolutions, or two repetitions of the same pattern, as explained in the details of apparatus design, Appendix A. Although the instrument was stopped only after each second cam revolution, it is relatively easy to record the errors for each plane of motion at any given cam point.

Securing a valid test. If success on a job is to be predicted from a test score, it is essential in the test's development to determine that it measures something important in job success. This involves some comparison between test score and proficiency on the job as represented in the criteria (28). A test is a valid selective instrument when this comparison indicates a satisfactory relationship between test performance and proficiency on the job. Various procedures for obtaining this relationship may be used. The method selected for validating any particular test is determined by the nature of the test score and the measures of job proficiency available. The comparison of the mean test scores of groups of equal experience at various levels of job proficiency, i.e., poor, average, and superior, is one common method of validation.

Some relationship between the measuring instrument and success is assumed if the difference between group means is significant. Frequently, this method is applied only to the groups representing the extremes of proficiency, as a quick method of determining a possible relationship between test performance and the criterion. This method is particularly useful when testing time and the number of cases are limited. It is assumed that if the extremes of proficiency cannot be discriminated by means of the test score, then the intermediate levels could not be separated either.

It should be pointed out that a representative sample of the entire range of proficiency should be included in any final standardization. This is necessary to determine the regression equations for predictive purposes; otherwise, a curvilinear relationship might exist without being noticed in a simple analysis of significant differences between the means of extreme groups.

When criterion groups of equal training are not available, test validity may be estimated by examining the differences in performance of unequally trained groups. This comparison may even include differentiating one group prior to training from another group after training on the job. This method has the inherent limitation that the differences obtained may be attributed to transfer of training from the job to the test unless transfer has been proved ineffective. If transfer appears to be an important factor underlying individual differences in the task, the difficulty of estimating how much of the difference should be attributed to transfer and how much to some other factor must be recognized. Thus, this second method of validation is less critical than the first. Even if such criterion groups can be separated, the results merely indicate that there is probably some relationship, but its amount can only be inferred from other evidence. Final validation must await either a comparison of groups representing equality of training as well as the extremes of proficiency, or, preferably, a direct correlational analysis of test score and proficiency rating after optimal training of the entire usual distribution of talent.

Another method useful in validation is the comparison of the percentages of individuals in criterion groups who fall above or below some critical test score. This method is useful in showing the degree of effectiveness of predictions from some restricted range of total test scores.

When conditions permit, one of the best methods of validation is the correlation of each trained individual's standing on the test with his proficiency rating on the job, where all have had equal training. If a representative sample of individuals is drawn from the entire range of proficiency, this method provides a good estimate of the nature and extent of the relationship which exists between the variables, and of the usefulness of the test as a predictive instrument. If such a criterion is not available, another modification of this method is to correlate the test performance of untrained subjects with the proficiency rating of the same individuals after optimal training. This method most nearly resembles the way in which the scores of a valid test would actually be used to predict success after training. When the direct correlational technique is impracticable, it is sometimes possible to infer the relationship between test performance and the criterion by correlating test performance with some other test of known validity.

This method is not always good, since a high correlation with a perfectly valid test would indicate that both are measuring the same thing, and if the new test has no advantage such as ease of administration, simplicity, or economy, no progress would have been made. If the criterion test is not perfectly valid but is correlated with job success, a positive relationship between it and the new test may occur for several reasons, e.g., both may be measuring the same qualities involved in the aptitude. Which one of the tests is more nearly related to job success must be determined by a direct correlational analysis with job success. A negative relationship is not diagnostic at all. At best, this indirect method is an index that there is some relationship between the new test and some other measure which is positively related to the criterion. The limitations of this method have been exposed frequently in the poor validation of many personality tests. It should be used only when the relationship between the measuring instrument and the criterion cannot be estimated more directly, or when it is desired to discover the grouping of abilities among the test variables themselves, as in factorial analysis of human abilities.

In the absence of criterion groups showing a continuous distribution of proficiency, an estimate of the probability that test performance is related to job success may be made by using the chi-square test of independence. This technique has the limitation of showing neither the degree nor the direction of the relationship; it reveals only the probability that a relationship does exist.

All the methods of validation mentioned above use some static measure of test performance: a given trial average, the average for a group of trials, or, in the case of the chi-square test of independence, a total score. It is possible that, in the case of an aptitude test which shows a consistent and marked degree of learning, some improvement score may be more significantly related to the criterion than the initial, final, or average score, or some other static measure of test performance.

The methods of estimating the validity of the tri-dimensional pursuit test are based upon the criterion groups available.

At the time the apparatus was loaned to the Army Air Forces, the exigencies of the testing program in their research unit permitted tryouts of only two groups of Air Force personnel. One group consisted of WOPs after training; the other was made up of NACs prior to training. In addition to these, two other criterion groups later became available. The first was made up of students in the secondary course of the CPT program at Northwestern University, and the other was a group of soldiers who were members of an antiaircraft battery in training at Ft. Sheridan, Illinois. The particular characteristics of these groups have been given above under subjects.

Before attempting specific group separations on the basis of the test, an over-all comparison of the total test performance of the various groups was made. The average performance on each trial was calculated for each group studied and will be graphically presented in Figure 2. This provides a comparison of the performance of the groups at different stages of practice. In addition to the group means, the average score on each trial for the upper and lower quarters of each group (quarters determined on the basis of initial status) was calculated and will be included in Fig. 2.

Successful and unsuccessful criterion groups of pilots are available in the case of CPTs and WOPs.

The first attempted estimate of validity involved testing the reliability of the difference between the mean performances of these two groups, Fisher's t test of significance being used.* These positive and negative criterion groups are the closest available approximation to the best method of validation by group separation. It should be noted that the number of cases and the amounts of training are not equivalent for the two groups and that the degree of proficiency probably varies considerably within each. If the test is fairly reliable and small groups agree with the general tendencies of the larger ones, the objection to small and unequal groups is less serious.

* The t test is essentially a critical ratio for estimating the significance of the difference between means. Since the S.D. for small samples tends to be smaller than that for the population, a modified and more conservative formula is needed. When the two samples are related, this formula is:

$$t = \frac{M_1 - M_2}{\sqrt{\dfrac{\sum d^2}{n(n-1)}}}$$

in which d is the deviation from the mean of the differences. The obtained t value is interpreted in terms of n - 1 degrees of freedom (13; Chap. 3). In the probability table for t, the smaller the number of degrees of freedom, the larger the value of t required for any given level of confidence. If t occurs at the 5% level of probability, it is regarded by Fisher (6) as "significant," and if it falls at the 1% level, it is regarded as "highly significant."
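The related-samples formula given in the footnote can be computed as in this minimal sketch. The function name `related_t` and the paired scores are hypothetical. The footnote gives only the related-samples form; two independent groups of unequal size, such as the WOPs and CPTs, would call for the independent-samples version of the t test, which is not shown here. The sketch also assumes the differences are not all identical, since then the sum of squared deviations is zero and the denominator vanishes.

```python
import math


def related_t(x, y):
    """t for two related samples, after the footnote's formula:
    t = (M1 - M2) / sqrt(sum(d^2) / (n(n - 1))),
    where d is each pair's deviation from the mean of the differences.
    Returns (t, degrees of freedom).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length, n >= 2")
    n = len(x)
    diffs = [a - b for a, b in zip(x, y)]
    mean_diff = sum(diffs) / n  # equals M1 - M2 for paired data
    sum_d2 = sum((d - mean_diff) ** 2 for d in diffs)
    t = mean_diff / math.sqrt(sum_d2 / (n * (n - 1)))
    return t, n - 1


# Hypothetical paired error scores (e.g., the same subjects on two occasions).
t, df = related_t([5, 7, 9, 6], [3, 4, 8, 5])
print(f"t = {t:.2f} with {df} degrees of freedom")
```

For these invented pairs the differences are 2, 3, 1, 1, giving t of about 3.66 on 3 degrees of freedom; the value would then be read against a t table, as the footnote describes.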

If significant differences are found in spite of such limitations, the test certainly has enough validity to warrant further investigation. The same test of the probability that true differences exist between the group means was applied in comparing each criterion group with every other.

The comparison of a negative (WOP) or positive (CPT) group after training with an unselected group (NACs or Soldiers) before training has the inherent difficulty of indeterminate transfer influence, even if all other conditions were ideal. Thus, where trained groups are compared with untrained ones, the indirect method of examining the evidence for the influence of the underlying factor of transfer is the only feasible analytical method.

The per cent of each group falling below the median of the combined distributions of total error scores was calculated and will be presented in Table II and Fig. 3. The per cent of each group falling in the upper and lower quarters of this combined distribution will be included also. No test of significance has been applied, but the method indicates the effectiveness of prediction from a restricted range of scores. Total score was selected as the basis for percentages as probably the most stable index of test performance.
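The percentage comparison just described can be sketched for one pair of groups as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, scores are total error scores (so falling below the median means better performance), cases tied exactly at the median are counted as not below it, and the sample scores are invented.

```python
from statistics import median


def percent_below_combined_median(group_a, group_b):
    """Per cent of each group whose total error score lies below the median
    of the two groups' combined distribution (fewer errors = better)."""
    cut = median(list(group_a) + list(group_b))

    def percent_below(group):
        # Scores exactly at the median are counted as not below it.
        return 100.0 * sum(1 for s in group if s < cut) / len(group)

    return percent_below(group_a), percent_below(group_b)


# Invented total error scores for two hypothetical criterion groups.
group_x = [100, 120, 150, 200]
group_y = [80, 90, 110, 300]
print(percent_below_combined_median(group_x, group_y))
```

Here the combined median is 115, so one of four scores in the first group and three of four in the second fall below it, and the call prints (25.0, 75.0). The extreme-quarter percentages would use the combined distribution's quartile points instead of its median; how those points were located is not stated, so that variant is left out.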

It has been pointed out that there is no method of determining precisely the degree to which test performance is related to pilot success (with the data here available). However, the chi-square test of independence was used to determine whether performance on the tri-dimensional pursuit test is unrelated to membership in a successful or unsuccessful pilot group.* The chi-square test involved combining the test scores of two criterion groups and separating this combined distribution at its median. If test performance is independent of classification, then equal proportions of each group (except for chance discrepancies) would fall in the upper and lower halves of the combined distribution. In the case of an extreme quartile analysis of the same distribution, the hypothesis tested is that equal proportions of subjects in the two success categories represented will fall in the first and fourth quarters of performance. An illustration of how the frequencies in the contingency tables were obtained may help clarify the procedure:

SAMPLE CONTINGENCY TABLE USED IN CHI-SQUARE ANALYSIS

                      WOPs       CPTs      Total
I  (poorer half)     (21) 25    (7)  3       28
II (better half)     (21) 17    (7) 11       28
Total                   42         14        56

$$\chi^2 = \frac{(4)^2}{21} + \frac{(4)^2}{7} + \frac{(4)^2}{21} + \frac{(4)^2}{7} = 6.09, \qquad P = 1.5\%$$

The combined distribution of WOPs (42) and CPTs (14) has a total N of 56. If the group is separated at the median, 28 cases should fall in the poorer half (I) and an equal number in the better half (II). If the hypothesis that test score is independent of classification holds, one half of each group should fall above and one half below the median. These theoretically expected frequencies for each cell of the contingency table are given in parentheses, and the actual frequencies are entered beside them. Chi-square is the sum, over the four cells, of the squared difference between the theoretical and the obtained frequency, each divided by the theoretical frequency. The P values in Fisher's table (6) of chi-square are based on the sampling distribution chi-square would have if the hypothesis were true, and indicate in what percentage of random samples of this same size the observed value of chi-square would be exceeded if the hypothesis were true. For one degree of freedom (in all of the tables involved here, only one theoretical value is not determined by the requirements placed on row and column sums), a chi-square of 6.09 has a P value of 1.5%; that is, this chi-square would be exceeded in 1.5% of random samples. Since the P value is very low, it can be said with a reciprocal degree of confidence that the hypothesis of independence is false (13; Chap. 2), or, in positive terms, that there probably is a significant relationship between the two variables of test scores and flying success.

The chi-square test was used on all possible pairs of the groups available, with separation at the median for one test of independence and at the upper and lower quartiles for another. Both were calculated, since extreme-quartile differences may be expected to occur when the actual differences are not great enough to separate the groups in halves. Cumulative scores rather than cross-sectional scores (scores received on any given trial) were used to avoid penalizing an individual for the chance operation of extraneous or external factors, e.g., a sneeze, during a single trial. Besides showing whether there is a relation between test performance and success as a pilot, this method readily gives the point in the testing period at which this relationship becomes significant, which is a clue to the desirable length of the test.

* The language here is the kind required for a test of the null hypothesis, viz., that the two variables are independent. One accepts or rejects this hypothesis in accordance with the size of the computed probability that the distribution found could have occurred on the basis of chance.
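The arithmetic of the sample contingency table can be checked with a short sketch. The function name is hypothetical; it computes expected frequencies from the row and column totals, applies no Yates continuity correction (matching the hand computation, which uses the raw differences of 4), and obtains P for one degree of freedom from the fact that a one-degree chi-square is the square of a standard normal deviate, so P = erfc(sqrt(chi2 / 2)). The author read P from Fisher's printed table instead.

```python
import math


def chi_square_2x2(table):
    """Chi-square test of independence for a 2x2 table of observed counts.

    table[row][col]: rows are halves of the combined distribution
    (I = poorer, II = better); columns are the two criterion groups.
    No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # One degree of freedom: chi-square is a squared standard normal deviate,
    # so the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Observed frequencies from the sample contingency table (columns: WOPs, CPTs).
observed = [[25, 3],   # I: poorer half
            [17, 11]]  # II: better half
chi2, p = chi_square_2x2(observed)
print(f"chi-square = {chi2:.3f}, P = {p:.3f}")
```

This reproduces the expected frequencies of 21 and 7 and a chi-square of about 6.095 (given as 6.09 in the text), with an exact P of about 0.014, close to the 1.5% obtained from the printed table. The same function applies unchanged to the extreme-quartile tables.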

It should be pointed out again that the chi-squares (χ²) are only an indication that performance is related to classification. The direction and degree of the relationship may be verified only through a correlational analysis when flight records are available. However, examination of the trial scores and learning curves of Table II and Fig. 1 shows which groups tend to be superior in performance.

in performance. In summary of the probable influence of the underlying factors of individual differences upon the validation of this particular aptitude test, it is evident that transfer probably has the greatest influence.

This was anticipated

since the tri-dimensional pursuit test was specifically con­ structed in part as an analogous miniature of the piloting situation.

The construction and use of the independent con­

trols* (Appendix A) was an attempt to reduce the transfer effect as much as possible. The test was designed to present a situation complex enough that shifts in work methods would not be accompanied by striking improvement score.

The biological capacity fac­

tor is as indeterminate here as in the estimates of relia­ bility.

The practice effect, or specific training within

the test situation,

is relatively unimportant if learning

trends are constant between groups. No attempt was made to include in this study the use of an improvement index for predicting success.

There are

a good many such indices but little has been done in the field of applied psychology except in terms of static meas­ ure#

The raw gain from initial status is often a useful

index, but penalizes the individuals who are initially

40 superior, in that improvement is correspondingly low, Therefore, the most useful technique is almost certain to take into consideration the relative initial position. Should the above methods of validation show quanti­ tatively significant differences on test performance, sev­ eral hypotheses in regard to the factors underlying such differences must then he tested.

Such factors to he con­

sidered would include those of biological capacity, specif­ ic training, transfer, work.methods, and general physical conditioning, the complexity of the instrument, even when simplified instructions and demonstration are included, may also give advantage to intellectually superior individ­ uals.

In comparing groups the influence of these six fac­

tors will be estimated by examining the evidence for and against them while holding constant as many other factors as possible.
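The problem with the raw-gain index noted above can be illustrated with the NAC quartile means for Trials 1 and 5 from Table I. The percentage-gain index used for contrast is one simple possibility that takes initial position into account; it is an illustrative assumption, not a method used or proposed in this study, and the function names are hypothetical.

```python
def raw_gain(initial, final):
    """Reduction in errors from the first to the last trial."""
    return initial - final


def percent_gain(initial, final):
    """Reduction in errors as a percentage of initial errors
    (illustrative index only; not the author's)."""
    return 100.0 * (initial - final) / initial


# NAC quartile means for Trials 1 and 5, as given in Table I.
for label, first, last in [("I (poorest)", 342, 147), ("IV (best)", 167, 73)]:
    print(label, raw_gain(first, last), round(percent_gain(first, last), 1))
```

The initially poorest quarter shows a raw gain of 195 errors against 94 for the best quarter, roughly twice as much, while the proportional reductions are nearly equal (about 57.0% and 56.3%).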

CHAPTER V RESULTS AND DISCUSSION

The over-all effect of individual differences upon the tri-dimensional pursuit test may be estimated by observing the learning curves given for each group studied in Fig. 2. This analysis of group means and quartile means had to be made upon the criterion groups which were available. In the cases of the CPTs and AASs these groups contained very small numbers, as shown in Table I. One of the chief characteristics of the learning curves, however, is their consistency in form, and the means, even though based upon small samples, agree very well with the general trend and thus are not suspected to be atypical. Examination of these learning curves reveals that each group maintains its position relative to every other group throughout all five trials, with the exception of the AASs, whose rate of learning is somewhat different from that of the other groups. The means for the first and fourth quartiles of each group have been included together with the group means on each trial.

Although the separations between the group means and between the best quartiles of the groups are not as large as the separations between the poorest quartiles, the differences between the two upper sets of means are consistent with those found in the lowest quartile. The absence of significant separations of the best quartiles of each classification does not invalidate the test.

Fig. 2. Tri-dimensional Pursuit Test Learning Curves (mean errors on Trials 1 to 5; numbers indicate groups: 1, New Aviation Cadets; 2, Washed Out Pilots; 3, Civilian Pilot Training Secondary Students; 4, Soldiers; type of line indicates subgroups: group mean and quartiles)

TABLE I
AVERAGE ERRORS PER TRIAL OF THE CRITERION GROUPS
(Quartiles Based on Initial Status)

                                                Trial Means
Group     N   Quartile   Error Range      1      2      3      4      5
1. NAC   15   I          295-461        342    237    192    166    147
         62   M           84-461        250    177    149    129    111
         15   IV          84-191        167    127    102     97     73
2. WOP   10   I          272-414        327    226    188    151    127
         43   M          105-414        222    160    130    117     96
         10   IV         105-157        135    111     87     71     62
3. CPT    4   I          216-332        272    166    123     75     49
         14   M          117-332        188    117     89     54     48
          4   IV         117-135        126    103     78     46     46
4. AAS    4   I          365-536        419    262    165    136     97
         17   M          144-536        274    191    137    114     97
          4   IV         144-185        165    117     88     64     46

M = entire group; I and IV = the quarters of each group with the most and the fewest errors on Trial 1.
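The quartile subgroups in Table I are formed on initial status. A minimal sketch of that grouping follows, assuming one list of five trial scores per subject; the function name and the eight-subject data are hypothetical. Because the quarter sizes in Table I (15, 10, 4, and 4 for N of 62, 43, 14, and 17) do not follow a single rounding rule, the number of subjects per quarter, `k`, is passed in explicitly rather than derived from N.

```python
def quartile_means_by_initial_status(scores, k):
    """scores[s][t] = errors of subject s on trial t; k = subjects per quarter.

    Subjects are ranked on Trial 1 errors: quartile I is the k subjects with
    the most initial errors (poorest), quartile IV the k with the fewest.
    Returns the per-trial means for I, for the whole group (M), and for IV.
    """
    def trial_means(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    ranked = sorted(scores, key=lambda row: row[0], reverse=True)
    return trial_means(ranked[:k]), trial_means(scores), trial_means(ranked[-k:])


# Hypothetical error scores for eight subjects on five trials.
scores = [
    [300, 200, 150, 120, 100],
    [280, 190, 140, 110, 95],
    [200, 150, 120, 100, 90],
    [180, 140, 110, 95, 85],
    [160, 130, 105, 90, 80],
    [150, 120, 100, 85, 75],
    [120, 100, 80, 70, 60],
    [100, 90, 70, 60, 50],
]
q1, m, q4 = quartile_means_by_initial_status(scores, 2)
print(q1, m, q4)
```

Subjects keep their quarter assignment on later trials, so the quartile curves show how initially poor and initially good performers progress, as in Fig. 2.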

The nature of these groups is such that, if the test is positively related to pilot success, a certain percentage of the NACs and AASs, who were unselected for flight ability, and of those WOPs who were excluded for reasons other than motor coordination may be potentially good pilots so far as motor coordination is concerned. In all groups it appears that the greatest differences occur between the means of the poorest quartiles, that the curves are generally smooth throughout, and that there is a tendency to level off by the fifth trial. The CPT group, successful pilots, is by far the most successful in test performance.

The poorest quartile of

this group has an average score superior to even the group means of the other three groups by the third trial.
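The claim about the CPTs' poorest quartile can be checked directly against Table I. The short sketch below transcribes the trial means for the CPT quartile I row and the whole-group (M) rows of the other three groups. It then reports the first trial on which the quartile has fewer errors than every group mean. The constant and function names are illustrative and do not come from the thesis.

```python
# Trial means (errors) transcribed from Table I.
POOREST_CPT = [272, 166, 123, 75, 49]  # CPT, quartile I (highest initial errors)
GROUP_MEANS = {                        # M rows (entire group) of the other groups
    "NAC": [250, 177, 149, 129, 111],
    "WOP": [222, 160, 130, 117, 96],
    "AAS": [274, 191, 137, 114, 97],
}

def first_trial_below_all(poorest, others):
    """Return the first trial (1-based) on which `poorest` has fewer errors
    than every group mean in `others`, or None if that never happens."""
    for trial, errors in enumerate(poorest, start=1):
        if all(errors < means[trial - 1] for means in others.values()):
            return trial
    return None

print(first_trial_below_all(POOREST_CPT, GROUP_MEANS))  # prints 3
```

On trial 2 the CPT quartile (166 errors) still trails the WOP mean (160). Trial 3 is therefore the first on which it leads all three groups.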

The convergence of the three sub-groups of this classification was due to a limitation of this early model of the apparatus. The contact switches for recording errors, described in "Details of the Design" (see Appendix A), proved not to have fine enough units of measurement, especially for the "rotational" plane of motion. The best half of this superior group was approaching perfect scores for this plane of motion by the fourth trial. The new switches, with finer units of measurement, would probably separate the sub-groups throughout the entire test.


The effect of biological capacity is indeterminate in this study. However, the influence of specific training on the test performance itself is clearly evidenced by the marked improvement in the scores of all groups from initial to final status. The consistent trends followed by the mean trial scores of each group show that this practice effect probably does not greatly affect the reliability of the instrument. The CPTs and WOPs have piloting experience, and their generally superior performance compared with the untrained groups is in line with the anticipated transfer effect from flying experience.

Since the CPTs had a median flying time greater than that of the WOPs, some of the CPTs' superiority to the other groups may be attributable to their extra training and a resultant transfer to the test. Transfer, however, does not adequately explain the total difference between these two groups. This becomes apparent when their performance is compared with the mean performance of the untrained NACs. Suppose the NACs represent a group unselected according to flying ability, and initial ability alone determined mean test performance. The WOPs should then have a greater average error score than the NACs, while that of the CPTs should be smaller. Since there is instead a slight superiority of the WOPs over the NACs, it is inferred that this may well be due to positive transfer from flight training.

The CPTs' superiority over the WOPs is much greater, and it seems improbable that all of this difference is due to transfer alone. Some other factors must be involved.

The two trained groups are comparable in intelligence, and both have met the physical requirements for flying. General physical conditioning is therefore probably not the explanation of the differences. The superiority of the WOPs over the NACs is probably a composite of transfer of training and of initial ability on the part of those individuals excluded from piloting on the basis of non-motor factors. It is recognized that the direction and amount of the difference between the WOPs and the NACs may be due to unrepresentative sampling of these two classifications. The only decisive answer to this hypothesis would be the accumulation of data from much larger samples.

Contributory evidence is available, however, in support of the transfer hypothesis. Other tests for which such transfer effects have not been reported have usually been less closely related, in a qualitative way, to flying performance. It is thus possible that the tri-dimensional test would show transfer when other tests did not. The better performance of the CPTs over the WOPs may partially result from the additional factor of superior initial abilities.

There is no particular evidence from the average learning curves that variations in work methods influenced test performance in any unusual manner. The learning curves of the upper and lower quartiles of each group, shown in Fig. 2, provide further evidence that such factors did not cause any striking irregularities.

Reliability.

Since the two Air Force groups contained the largest number of cases, the reliability studies were made on them. The results are summarized in Table II. Because performance was influenced by practice, the stability of individual differences was estimated by several different methods. The correlations between performances on the fourth and fifth trials and on the second and third trials give a median "cross-sectional" reliability coefficient of .84 (uncorrected). Correlating the first and fifth trial scores gave a "longitudinal" estimate of reliability, with coefficients of .62 and .79. The sum of the second and third trial scores was correlated with the sum of the fourth and fifth trial scores. This estimates the stability of individual differences over a larger portion of the test and yielded coefficients of .78 and .80. The average score made on the first half of each of the five trials was also correlated with the score made on the second half of each of the five trials (the odd-even method of estimating reliability). This gave correlation coefficients of .92 and .95.

TABLE II
RELIABILITY COEFFICIENTS

Reliability Estimate                     Group   N     r     σr    r est.*
1. Cross-sectional
   a. Trial #4 vs. Trial #5              WOP     45   .83   .05   .96
                                         NAC     62   .77   .05   .94
   b. Trial #2 vs. Trial #3              WOP     43   .86   .04   .97
                                         NAC     64   .85   .03   .97
2. Longitudinal
   a. Trial #1 vs. Trial #5              WOP     44   .79   .06   .95
                                         NAC     63   .62   .08   .89
3. Total Trials
   a. Odd vs. Even Halves                WOP     42   .93   .02   .96
                                         NAC     62   .92   .02   .96
   b. Trials #2-#3 vs. Trials #4-#5      WOP     42   .80   .06   .91
                                         NAC     62   .78   .05   .90

* Spearman-Brown prophecy formula used.
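The column of Table II immediately after the obtained r is not explained in the text. Its values nevertheless agree to two decimals with the classical large-sample standard error of a correlation coefficient, σr = (1 − r²)/√N. The sketch below assumes that formula and recomputes the column from the tabled r and N. It also recomputes the two medians quoted in the text: .84 for the four cross-sectional coefficients and .82 for all ten raw coefficients. The function name `se_r` is hypothetical.

```python
import math
import statistics

# Rows transcribed from Table II: (estimate, group, N, obtained r, tabled sigma_r)
TABLE_II = [
    ("Trial 4 vs. 5", "WOP", 45, 0.83, 0.05),
    ("Trial 4 vs. 5", "NAC", 62, 0.77, 0.05),
    ("Trial 2 vs. 3", "WOP", 43, 0.86, 0.04),
    ("Trial 2 vs. 3", "NAC", 64, 0.85, 0.03),
    ("Trial 1 vs. 5", "WOP", 44, 0.79, 0.06),
    ("Trial 1 vs. 5", "NAC", 63, 0.62, 0.08),
    ("Odd vs. even", "WOP", 42, 0.93, 0.02),
    ("Odd vs. even", "NAC", 62, 0.92, 0.02),
    ("Trials 2-3 vs. 4-5", "WOP", 42, 0.80, 0.06),
    ("Trials 2-3 vs. 4-5", "NAC", 62, 0.78, 0.05),
]

def se_r(r, n):
    """Large-sample standard error of a Pearson r (assumed formula)."""
    return (1 - r * r) / math.sqrt(n)

for label, group, n, r, tabled in TABLE_II:
    print(f"{label:20s} {group}  r={r:.2f}  sigma_r={se_r(r, n):.3f}  (Table II: {tabled:.2f})")

# Median of the four cross-sectional coefficients (first four rows): .84 in the text
cross_sectional_median = statistics.median(row[3] for row in TABLE_II[:4])
# Median of all ten raw coefficients: about .815, reported in the text as .82
overall_median = statistics.median(row[3] for row in TABLE_II)
```

The agreement supports, but does not prove, that this is the formula the author used for the standard-error column.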

This split-halves method of estimating reliability is influenced by two factors. First, differential learning among individuals makes the dispersion of the odd-half or even-half scores greater than that of either the first half or the second half of the test, and this facilitates obtaining a higher reliability coefficient. Second, successive odds and evens are used. The effects of motivation, practice efforts, etc., which may be concentrated at one or a few stages of learning, are therefore not as apparent as in the comparison of performance on extreme trials. Regardless of the method of estimating reliability, the coefficients obtained are near the level required for individual diagnosis and well above that required for group separations.
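The odd-even estimate correlates each subject's first-half score with his second-half score and then steps the half-length coefficient up to full length. Below is a minimal sketch. It assumes that each half is averaged over the trials, as the text describes. The data layout and the names `pearson_r` and `split_half_reliability` are hypothetical, and the three-subject example is invented purely to exercise the code. It is not data from the study.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_half_reliability(scores):
    """scores[s] is a list of (first_half_errors, second_half_errors) pairs,
    one pair per trial, for subject s.
    Returns (half-test r, full-length Spearman-Brown estimate)."""
    # Average each half over all trials for every subject
    first = [sum(h1 for h1, _ in trials) / len(trials) for trials in scores]
    second = [sum(h2 for _, h2 in trials) / len(trials) for trials in scores]
    r_half = pearson_r(first, second)
    # Step up from half length to full length (k = 2)
    return r_half, 2 * r_half / (1 + r_half)

# Hypothetical data: three subjects, two trials each
example = [
    [(1, 0), (1, 2)],
    [(2, 3), (2, 3)],
    [(3, 2), (3, 2)],
]
r_half, r_full = split_half_reliability(example)
```

In this toy example the half scores correlate at .50, and the stepped-up estimate is about .67.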

These findings are significant if it should become necessary to shorten the testing period. They indicate that individuals tend to retain the same relative rank from one trial to the next as well as throughout the whole test. However, training a single unsuccessful pilot costs thousands of dollars, so shortening the test merely to save time would appear to be false economy. The Spearman-Brown prophecy formula was applied to the obtained correlations to estimate the reliability of a full-length test of five trials, and these estimates are included in Table II.
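The Spearman-Brown prophecy formula predicts the reliability of a test lengthened k times: r_k = k·r / (1 + (k − 1)·r). The thesis does not state which lengthening factors were used. However, the estimates in Table II are reproduced to two decimals with the following k values: k = 5 for the single-trial correlations (one trial stepped up to five), k = 2 for the odd-even halves, and k = 2.5 for the two-trial sums (two trials stepped up to five). Those k values are our inference, not the author's statement.

```python
def spearman_brown(r, k):
    """Predicted reliability of a test lengthened by a factor k (Spearman-Brown prophecy)."""
    return k * r / (1 + (k - 1) * r)

# (estimate, group, obtained r, assumed lengthening factor k, tabled estimate) from Table II
ROWS = [
    ("Trial 4 vs. 5", "WOP", 0.83, 5, 0.96),
    ("Trial 4 vs. 5", "NAC", 0.77, 5, 0.94),
    ("Trial 2 vs. 3", "WOP", 0.86, 5, 0.97),
    ("Trial 2 vs. 3", "NAC", 0.85, 5, 0.97),
    ("Trial 1 vs. 5", "WOP", 0.79, 5, 0.95),
    ("Trial 1 vs. 5", "NAC", 0.62, 5, 0.89),
    ("Odd vs. even", "WOP", 0.93, 2, 0.96),
    ("Odd vs. even", "NAC", 0.92, 2, 0.96),
    ("Trials 2-3 vs. 4-5", "WOP", 0.80, 2.5, 0.91),
    ("Trials 2-3 vs. 4-5", "NAC", 0.78, 2.5, 0.90),
]

for label, group, r, k, tabled in ROWS:
    estimate = spearman_brown(r, k)
    print(f"{label:20s} {group}  r={r:.2f}  k={k}  estimate={estimate:.2f}  (Table II: {tabled:.2f})")
```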

Measured by any means, the reliability of the test is high for the first standardization of a psychomotor test. The median of all raw correlation coefficients is .82. Specific training or some other factor may have lowered the reliability. If so, its effect has not been great enough to prevent consistent ranking of individuals, and so it does not interfere unduly with attempts at validation.

Validity.

The learning curves begin to level off by the fifth trial, and the correlation between performances on trials four and five indicated fairly high stability. Final status was therefore chosen as the measure to which the group separation method of estimating validity was applied. Table III gives the t values used to test the reliability of the differences between the group means of each criterion classification, together with the actual differences between means. The t values are used only to test the hypothesis that there are no real differences between the obtained group means at the fifth-trial level of practice. Where this hypothesis has been rejected at the one percent level of confidence, the t values are underlined. The direction of the advantage, readily obtained from the mean values of each group, is indicated in Table III by starring the member of each pair having the smaller mean error score.
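The thesis does not state which form of the t formula was used for Table III. As an assumption, a pooled-variance (Student) t for two independent means, with n1 + n2 − 2 degrees of freedom, was computed from the tabled N, mean and S.D. values. It comes within about 0.04 of every tabled t. This suggests, but does not prove, that it is the formula behind the table. The small residual differences are plausible given the rounded means and the note that some S.D.s were computed before one case was rejected. The function name is hypothetical. Converting t to the tabled significance levels would additionally require t-distribution tables and is not attempted here.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Student t for the difference between two independent means, using a pooled variance."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return abs(mean1 - mean2) / se_diff

# (N, mean errors at final status, S.D.) transcribed from Table III
GROUPS = {
    "CPT": (14, 56, 26.4),
    "WOP": (42, 96, 37.8),
    "NAC": (62, 110, 46.9),
    "AAS": (17, 97, 44.7),
}

# t values as printed in Table III
TABLED_T = {
    ("CPT", "WOP"): 3.650,
    ("CPT", "NAC"): 4.138,
    ("CPT", "AAS"): 2.980,
    ("WOP", "NAC"): 1.594,
    ("WOP", "AAS"): 0.085,
    ("NAC", "AAS"): 1.008,
}

for (a, b), t_tabled in TABLED_T.items():
    t = pooled_t(*GROUPS[a], *GROUPS[b])
    print(f"{a} vs {b}: t = {t:.3f}  (Table III: {t_tabled:.3f})")
```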

The mean values have been rounded off to the nearest whole number.*

TABLE III
SIGNIFICANCE OF DIFFERENCES BETWEEN GROUP MEANS BASED ON FINAL STATUS

Group      N    Mean    σ        Diff. between Means     t       Level of Significance
CPT*      14     56    26.4**           40             3.650            1%
WOP       42     96    37.8

CPT*      14     56    26.4**           54             4.138            1%
NAC       62    110    46.9

CPT*      14     56    26.4**           41             2.980            1%
AAS       17     97    44.7

WOP*      42     96    37.8             14             1.594           20%
NAC       62    110    46.9

WOP*      42     96    37.8              1              .085           90%
AAS       17     97    44.7

NAC       62    110    46.9             13             1.008           40%
AAS*      17     97    44.7

* Indicates the superior group of each pair.
** These S.D.s were computed prior to rejecting the unusual case mentioned in the footnote below.

This final-status separation of groups clearly indicates that the CPTs are significantly superior to each of the other classifications.

The separation of the CPTs from the NACs is more significant than their separation from the WOPs. This is probably due to the slight transfer advantage of the WOPs over the NACs, as mentioned earlier. Suppose this group of unsuccessful pilots had been tested prior to flight experience and their performance then compared with that of the CPTs, or successful pilots. The difference would then be expected to be much more significantly in favor of the successful pilots.

The superiority of the CPTs over the Soldiers may be due to positive transfer, to higher initial ability, or to both. Since the AASs are unselected according to flying ability, it is improbable that the complete difference is determined by transfer. The NACs are not significantly different from the AASs on final status, which indicates that neither group has the advantage of transfer or of specially selected abilities.

* The discrepancy between the mean value of the CPTs on trial five here and that in Table I is due to estimating the fifth-trial score of one S. The experimenter reported that this S was talking on this trial. Since he had obtained 34 and £6 errors on trials three and four, he was assigned a score of 36 instead of 77.

The percentage of each criterion group whose total scores fall within various selected score ranges gives one index of how some restricted portion of the entire range of total test scores* might be related to flying success.

The total error score has the merit of summing the individual's performance over the entire test, and such cumulative scores may compensate for the chance differences occurring from trial to trial. Such an analysis is given in Table IV.

The distributions of total error scores of the four criterion groups were first combined. The percentage of each group falling in each of three ranges of the combined distribution was then determined and entered in Table IV: the best half (smallest number of errors), the first quarter (highest errors), and the fourth quarter (smallest errors). The inclusion of all four groups made use of the total range of scores thus far obtained on the instrument. Consider the sum of five trials as a performance index and examine the percentages of each group falling in the various ranges. The differences between groups are then in keeping with the separation on the basis of the fifth trial alone.
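The Table IV analysis pools the total error scores of all four groups and locates the combined median and quartile points. It then counts the percentage of each group falling beyond those points. The thesis does not say how the cut points or tied scores were handled. The sketch below therefore assumes Python's `statistics.quantiles` with its default "exclusive" method, and it uses strict inequalities. The function name is hypothetical and the example scores are invented for illustration only.

```python
import statistics

def range_percentages(groups):
    """Percentage of each group in the best half, the best quarter (Q IV) and the
    worst quarter (Q I) of the combined distribution of total error scores.
    Fewer errors is better. Cut points and tie handling are assumptions."""
    combined = [s for scores in groups.values() for s in scores]
    q1, median, q3 = statistics.quantiles(combined, n=4)  # cut points of the pooled scores
    result = {}
    for name, scores in groups.items():
        n = len(scores)
        result[name] = {
            "best_half": 100 * sum(s < median for s in scores) / n,
            "q4_best": 100 * sum(s < q1 for s in scores) / n,   # lowest-error quarter
            "q1_worst": 100 * sum(s > q3 for s in scores) / n,  # highest-error quarter
        }
    return result

# Hypothetical total error scores (sum of five trials), for illustration only
example = {"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]}
print(range_percentages(example))
```

With real data, a different quantile convention could shift individual cases across a cut point. The percentages for small groups such as the CPTs (N = 14) are especially sensitive to this choice.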

It should be noted that on the median separation of the combined distributions, the CPTs

* The sum of the errors for five trials.

TABLE IV
PERCENTAGE OF EACH GROUP IN SELECTED RANGES OF COMBINED DISTRIBUTION OF TOTAL ERROR SCORES

                      NAC    WOP    CPT    AAS
Total Number:          62     42     14     17
Best Half             43%    50%    86%    35%
Q IV (best)           11%    30%    71%    12%
Q I (poorest)         31%    21%     0%    35%

Fig. 3. Percentage of each group (NAC, WOP, CPT, AAS) in selected ranges of the combined total error score distribution.