Construction of Knowledge Tests in Selected Professional Courses in Physical Education

Author: Esther Louise French

THE CONSTRUCTION OF KNOWLEDGE TESTS IN SELECTED PROFESSIONAL COURSES IN PHYSICAL EDUCATION

by

Esther French

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, in the Department of Physical Education for Women in the Graduate College of the State University of Iowa

May, 1942

ProQuest Number: 10984057

All rights reserved

INFORMATION TO ALL USERS
The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.

Published by ProQuest LLC (2018). Copyright of the Dissertation is held by the Author. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code. Microform Edition © ProQuest LLC.

ProQuest LLC, 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106-1346

ACKNOWLEDGMENTS

The writer wishes to express her appreciation to Dr. M. Gladys Scott for her guidance and valuable assistance in this study. Sincere thanks are extended to those members of the staff of the Department of Physical Education for Women, State University of Iowa, who cooperated in this study, and to the students in the major department for their participation in the testing program. Of very substantial assistance was the constant encouragement of Dr. Elizabeth Halsey and the instruction received from Dr. E. F. Lindquist in his course, "Improvement of the Written Examination."

CONTENTS

Introduction
Statement of Problem
Procedure
  Clarification of Objectives
  Outlining the Courses
  Planning the Distribution of Items by Content Classification
  Preparation of Items
  Critical Evaluation of Items
  Selection of Subjects
  Administration of Tests
Data and Interpretations
  Preparation
  Selection of Items
  Reliabilities
  Mean Scores and Standard Deviations for Full Length Tests
  Mean Scores for Short Tests
  Norms
  Comparison of Tests
Suggestions for Use
Conclusions
Bibliography
Appendix

INTRODUCTION

The philosophy underlying the modern testing program is that of providing opportunity for the optimum development of the individual in relation to mutual service to a democratic society. The extensive use of tests in educational institutions has been duplicated in industry by civil service examiners, state examiners for licensing pharmacists and doctors, bar examiners for lawyers, and examinations of various types given by personnel workers. Of fairly recent vintage are the aptitude tests for defense workers and the psychological tests for the various branches of the armed services. Since examinations have a vital effect on the lives and fortunes of many people, the responsibility of those persons engaged in constructing them is great.

Knowledge tests in physical education are in a stage that might be termed "infancy." They have been used in some institutions for grading and for promotion, but the results have not been entirely satisfactory. Good objective tests require more time and effort than the average teacher can give. The results have varied from complete avoidance and condemnation of objective tests, or the use of quickly formulated and inadequate tests, to a few carefully constructed ones rebuilt on the basis of earlier results and analyzed for their efficiency.

A recent trend in education is the use of tests in guidance. If the great American ideal, to give everyone his utmost chance, is to be realized, tests which reveal fundamental factors must be developed. If "desirable changes in the learner" is accepted as our purpose, then measuring instruments must be made available to enable us to evaluate the effects of the educative process.

The first published test in physical education was a basketball test by Bliss (?) in 1929. His test included 30 questions on rules and 48 on technique. No evidence is presented of its being either reliable or valid. The second test to be published was a soccer test by Knighton (20) in 1930. No report was made on the reliability or validity. The test consisted of 85 questions, 26 of which were true-false. This was a test on rules, and appears best suited to players with limited experience.

Rodgers and Heath (56) published a baseball knowledge test in 1931 for fifth and sixth grade boys. This test consisted of 100 true-false items on rules and strategy. The reliability is high; the validity would depend on the use. It appears to have value as a teaching device. The same authors published a soccer test for fifth and sixth grade boys in 1932. The same comments as those regarding their baseball test apply here.

Two tests for use in training basketball officials (girls' rules) have been developed, one by Schleman (40) in 1932 and one by Scott (42) in 1937. They are primarily teaching devices.

Hemphill (17) recognized the need for information tests and published in 1932 a detailed study consisting of several tests for high school boys. The reliabilities ranged from .66 to .87.

A golf knowledge test was published by Murphy (30) in 1933. She selected the test items from statements made in golf text books, considering this a check on validity.

Tests for officials in field hockey for women were constructed by Grisier (14) in 1934 and French (52) in 1939. Both tests were validated by comparing the performances on the test of rated officials with non-rated officials and with players.

Four knowledge tests in tennis have been published: Wagner (55) in 1935, Snell (49) in 1936, Hewitt (18) in 1937, and Scott (44) in 1941. Wagner's report included but ten questions and no attempt was made to validate the test. Snell's test, including 45 questions, has a comparatively low reliability and is heavily weighted with items on rules. Hewitt included both men and women in his study. He had 100 questions, fifty-five of which were true-false, 15 multiple choice, 15 completion, and 15 matching forms. He compared the scores made by his male subjects with the scores made on the Snell examination and found a high correspondence between the two tests. Scott's examination was for beginners and intermediates. The items to be retained were selected on the basis of difficulty rating and the index of discrimination. The Elementary Test is composed of 25 multiple choice items and 41 true-false; the Intermediate Test has 21 multiple choice and 50 true-false items. The tests appear satisfactory for measuring the tennis knowledge of general college classes.

The only rhythm knowledge tests published to date are those by Thompson (51) in 1933 and by Shambaugh (46) in 1935. Thompson has made no statistical study to indicate the worth of the test; the questions are of value as illustrative material. Shambaugh has a reliable test in folk dancing for general college students.

In 1937, Schwartz (41) reported a study of a basketball test for high school girls. She was assisted by fifty-four women, "well versed in the theory and technique of basketball." No report was made on the reliability. The test appears too easy for an achievement test.

The two most extensive testing projects have been those reported by Snell (47, 48, 49) in 1935 and 1936 and by Scott (43, 44, 45) in 1940 and 1941. The former project was developed by the Department of Physical Education for Women, University of Minnesota, and includes tests in the following activities: archery, baseball, basketball, field hockey, fundamentals, golf, riding, soccer, tennis, and volley ball. Each test has 45 questions of the multiple choice type, with reliabilities ranging from .51 to .85. They were constructed for general college students, were validated on "expert opinion," and contain considerable non-functioning material. The tests reported by Scott are those developed by the Research Committee of the Central Association of Physical Education for College Women, and include tests for the following activities: badminton, swimming, and tennis. The individual items were based on materials commonly taught in college classes in the Central District, as revealed by questionnaires. The items to be retained were based on an index of discrimination and a difficulty rating. The tests are intended for general college students.

Tests for officials in basketball, softball, tennis, and volley ball have been prepared by various persons for the Women's National Officials' Rating Committee. They are revised frequently, usually without validation, and are composed of questions that concern primarily the official: rules and officiating techniques.

None of the specially organized test building agencies, such as the Cooperative Test Service and the various state and regional testing services, has prepared tests in physical education.

This is a complete survey of published tests in physical education to April 1, 1942. An analysis of these indicates that there is nothing that meets our purpose.

STATEMENT OF THE PROBLEM

It is the purpose of this study to construct knowledge tests to be used as a measure of achievement in sixteen of the technique courses required for women students majoring in physical education at the State University of Iowa. The following activities are included: badminton, basketball, body mechanics, canoeing, field hockey, folk dancing, golf, recreational sports (aerial darts, bowling, deck tennis, handball, shuffleboard, table tennis, tetherball), rhythms, soccer, softball, stunts and tumbling, swimming, tennis, track and field, and volley ball. The technique courses in general are of two types: basic or preliminary, and teaching. These tests will be used as a partial basis for determining the needs of entering students.

PROCEDURE

The procedure followed in attempting to solve this problem consisted of the following steps:

(1) Clarification of objectives
(2) Outlining of courses
(3) Planning the distribution of items by content classification
(4) Preparation of items
(5) Critical evaluation of items
(6) Selecting the subjects
(7) Administering the tests

Clarification of Objectives

For this particular situation, it was decided that the ultimate objective was to build items suitable for a diagnostic test (battery of achievement tests). The problem is measuring the knowledge that the student possesses and how well she understands what she has learned. The tests should measure the student's ability to use her knowledge, to generalize, to make applications; therefore, they should not be built as learning-teaching devices. The items should discriminate between students of all levels of ability but particularly between those students above the lowest quartile of the ability distribution.

It was decided to experiment with the use of diagrams wherever they appeared advantageous in furnishing a compact device for getting at complex ideas or spatial relations. The questions on rules were to be limited to those rules essential to intelligent play.

It was decided to build tests for the experimental battery of such length that the majority of students could complete each one in fifty minutes; also, to make them "power tests," with gradation of items according to predicted difficulty.

Each test was to be weighted with items in proportion to the emphasis given in the technique course itself. Less emphasis will be placed in the test on analyzation of skills than is true in the course, since individual skills will be measured by skill tests. "When to use what" will be tested in the knowledge test, as will team tactics and strategy, in the courses in which they are a part.

Outlining of Courses

Following a meeting of the staff of the Department of Physical Education for Women, at which time the objectives of the study were discussed and the plans outlined, each instructor was given a form for outlining her course(s) and a sample outline. She was also asked to list the reading assignments, table of contents for required notebooks, and to include copies of mimeographed materials given out in class and any examination questions which she had found through experience to be good.

Planning the Distribution of Items by Content Classification

The content of each course was reviewed for skills, generalizations, ideational content, and specific information. Questions were raised such as "Why is this worth learning?" and "To what objectives does this contribute?" The usual procedure included submitting a tentative distribution to the instructor of the course and then discussing it with her and revising it until the optimum distribution appeared to have been reached. The distribution which is included here is typical of the team sports included in the battery.[1]

Distribution of items by content classification in Soccer Knowledge Test (Number of Questions):

Analysis of Individual Techniques (How to do in good form)
Analysis of Game Situations and Use of Skills
General Knowledge (History, Selection and Care of Equipment, Safety Precautions, Differences between Field Ball, Speedball, Soccer)
How to Avoid Fouling
Placement of Passes, Throw-Ins, Kicks for Tactics and Areas of Play
Rules Essential to Intelligent Play
Terminology

%

5

5

8

15

5 3

5 5

Z ZZ 15 4

4 56 25 6

Total number of questions: 60

[1] The distributions of items by content classification of the other tests will be found in the appendix.

Preparation of Items

It was decided to build multiple choice items for the following reasons:

(1) They can be adapted to test for any depth of understanding.
(2) They can be made completely objective in scoring and are easily adapted to answer sheets.
(3) It is possible to detect readily any non-functional material in the responses, thus facilitating revision of the question for the final battery by elimination of responses.
(4) They test for the student's ability to eliminate incorrect responses as well as to select the correct response directly.
(5) They do not require correction for guessing.
(6) They seem best adapted to the type of material involved.
(7) They seem to have more advantages and fewer disadvantages than the other commonly used forms: true-false, multiple responses, matching forms, and completion.

In preparing the items for each test, the course outline was studied, the notebook table of contents and the distribution of items were consulted, the reading assignments and the mimeographed materials were read, along with standard reference books and articles, until familiarity with the content of the course and the objectives of the course was obtained. (Six of these courses are taught by the writer, and she has had experience in all of the activities included as a participant and has taught the majority of them.) The suggestions concerning the construction of multiple choice items from Hawkes, Lindquist, and Mann (15), pages 139-147, were observed.

Following the preparation of tentative items, the usual procedure included examination of the items by the thesis director and the instructor of the course and one or more other teachers familiar with the activity or familiar with testing. Suggestions for revisions and for additional items were obtained in this manner.

Critical Evaluation of Items

After the items were prepared, they were reviewed one by one in considerable detail, making use of the following check list:[2]

(1) Exactly what is this item intended to measure?
(2) Is the intended purpose of the item acceptable? Is it important that the item be included; does it test for something significant?
(3) Is there any ambiguity in the item? Will the student recognize the purpose of the item? Can it be made more clear? Are there any qualifying phrases that might start the student to thinking along an irrelevant line?
(4) Does the item contain any unintentional clues to the correct response?
(5) Will authorities agree on the correct response? Are the intended wrong responses less acceptable than the correct one?
(6) Are any of the wrong responses likely to appear more plausible than the correct response to the best of the students to be tested? Is the item too difficult for the best students in the group?
(7) Is the item phrased as economically as possible? Is it straightforward, direct?
(8) Is the form of the item as well adapted as any to its intended purpose? Would a diagram help?
(9) Would the rote learner have any undue advantage in responding to the item? Has "text book" language been avoided?

[2] Check list adapted from class notes: Improvement of Written Examination, Dr. E. F. Lindquist. (Not quoted verbatim.)

In setting up the tests, the following technical problems received consideration:

(10) Is the provision for the student's response as economical of his time as possible?
(11) Are the directions to the student as simple and understandable as they can be?
(12) Does the provision for the student's response provide for convenient, accurate, and economical scoring?

Other technical problems that were considered for each test individually were as follows:

(13) Can the typographical arrangement of the items be improved?
(14) Is the spread of estimated difficulty of items adapted to the estimated spread of ability of the group to be tested?
(15) Are the time limits adequate?
(16) Are the questions placed in the test in an order progressing from easy to difficult, as estimated? Will the slow student be prevented from spending an undue amount of time on items that are too difficult for him?

Each test was checked in view of validity:

(17) Are any important objectives or outcomes of instruction seriously neglected in the test as a whole?
(18) Is the emphasis on functional value and not on content objectives?
(19) Is there any undue testing of isolated detail or unimportant items of information, such as terminology or definitions, for their own sake?
(20) Are test situations suggestive of the life situations in which the student may make actual use of what he has learned?

(21) Would this test be less satisfactory if used as an "open book" test? Are there a sufficient number of questions which require drawing of inferences and the making of applications?

Obviously, these are rigid criteria, and no claim is made that they have been satisfied in every respect. Some have been sacrificed for the sake of others, for example, number 10: the answer sheet requires more of the student's time; it has been estimated that students can answer about ten per cent fewer items when using an answer sheet than when writing directly on the test forms. The students soon became accustomed to using the answer sheets in successive testing sessions. Number 10 was sacrificed somewhat, then, for number 12. Convenient, accurate, and economical scoring was essential in this study. Number 5, "Will authorities agree on the correct response?" is another item about which there can be some doubt. It has been impossible in education to secure complete agreement on any complex mental ability. Even in a field as objective as English correctness, you cannot always secure complete agreement as to when it is proper to use a comma and when improper. The questions on rules and other such factual material offer less opportunity for disagreement, which probably partially accounts for the tendency noted among the test constructors of knowledge tests to over-weigh these items.

If we took criterion 5 too seriously and tried to secure 100 per cent agreement from a large number of authorities, we would probably do it at the expense of discrimination. One has only to consult the text books in physical education to note the wide variance in opinion of authorities, particularly on performance of skills and on strategy. This variance of opinion occurs with greater frequency in tennis and in golf, fields in which the professional players tend to dominate the literature. Their mechanical analysis is sometimes faulty.

Selecting the Subjects

All women students majoring in physical education were asked to list the activities in which they had received instruction in high school, in college elsewhere, or in the State University of Iowa. The records of the activity courses that they had taken here were also checked. The subjects included freshmen, sophomores, juniors, seniors, and graduates. The numbers writing each test were as follows: badminton, 74; basketball, 93; body mechanics, 54; canoeing, 63; field hockey, 82; folk dancing, 75; golf, 65; recreational sports, 87; rhythms, 80; soccer, 48; softball, 69; stunts and tumbling, 69; swimming, 80; tennis, 84; track and field, 52; and volley ball, 86. The tests were given before the season that the activity was taught and again at the end to the group that had just completed the course.

Administration of Tests

The tests were administered to all, except to a few absentees, during regularly scheduled class periods.

Each girl was given a page of directions and an answer sheet as she entered the room.[3] A check was made to see that all were supplied with pencils and that no books were brought into the room. After the blanks at the top of the answer sheet had been filled, the test forms were distributed, face down, one copy to each girl. The signal to start writing was given and the time was recorded.

[3] Copies of the directions and an answer sheet appear in the appendix. The answer sheets were "blue lined" for each test, indicating to the student the number of items in each question.

The tests were proctored by one or more members of the teaching staff, and the chairs were six or more feet apart. These factors, combined with the use of an answer sheet, placed the danger of copying at a minimum. No help was given in interpreting the questions.

Records were kept on the order of finishing for the first four girls and the time of finishing, as well as the number and names of those still writing when time was called. Care was taken to see that each girl received but one copy of the test and that all copies were collected at the end of the period.

DATA AND INTERPRETATIONS

Preparation

The answer sheets for each test were scored by superimposing on them an especially prepared key, with holes punched where the correct answer should appear. A red pencil mark was placed wherever blank spaces appeared through the key, and the total of questions answered correctly was recorded on the answer sheet. The answer sheets were then sorted according to scores and the data were transferred to tabulation sheets.[4] All omissions were treated as errors.

For example, the record of a girl with a criterion score of 30 would be recorded in the step interval 30-31, illustrated below in the portion of the tabulation sheet. If she selected the correct response for question 17, a tally mark was placed in the column headed "R", or rights.

  R       17       W, O
  /      30-31
         28-29
         26-27      4
         24-25
         22-23      0
          (3)

The record for a girl with a criterion score of 27 would be entered opposite the step interval 26-27. If 3 is the correct response and she selected 4, her choice was recorded in the column headed "W, O", or wrongs, omits. Note that the correct response is shown in brackets at the bottom of the paper. The record of a person with a criterion score of 22 who omitted the question is also shown in the illustration. The method is described in detail because of its simplicity and usefulness for teachers who wish to check on the effectiveness of their examinations.

[4] A copy of the tabulation sheet is included in the appendix. Each sheet accommodates five questions.

The number who succeeded on the question was obtained by adding the tally marks in the column headed "R"; the number failing by counting each score appearing in the "W, O" column as one.

An item analysis was made to determine the number selecting each response for each question. This was taken directly from the tabulation sheet.

Selection of Items

It was arbitrarily decided to drop any question from further analysis if less than three responses "functioned" in the test.

Functioning of an item or response at this stage was defined as being selected by three per cent or more of the total number of persons taking the test.[5] If the number of cases was 84 or above, an item had to be selected by at least three persons, while if N was 50-83, the minimum was two. The number of questions discarded in this manner varied from one to eleven per test.

The difficulty, or percentage of incorrect responses, was obtained by dividing the number of errors by N. Thus, a question with a difficulty rating of 40 is less difficult than one with a difficulty rating of 45; the higher the difficulty rating, the more difficult the question.
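The two screening quantities just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample data are hypothetical, and the rounding rule (three per cent of N rounded to the nearest whole person) is inferred from the thresholds quoted above (three persons at N = 84, two at N = 50) rather than stated in the study.

```python
from collections import Counter


def min_functioning_count(n_examinees, percent=3):
    """Number of examinees equal to `percent` per cent of the group,
    rounded to the nearest whole person (integer arithmetic only).
    The rounding rule is an assumption consistent with the text."""
    return (percent * n_examinees + 50) // 100


def item_summary(responses, correct, min_functioning=3):
    """Summarise one multiple choice question.

    responses: one entry per examinee, the option chosen or None for an omit.
    correct:   the keyed option.
    """
    n = len(responses)
    counts = Counter(r for r in responses if r is not None)
    needed = min_functioning_count(n)
    # A response "functions" when enough examinees selected it.
    functioning = sorted(opt for opt, c in counts.items() if c >= needed)
    # Omissions are treated as errors, as in the tabulation procedure.
    errors = sum(1 for r in responses if r != correct)
    return {
        "counts": dict(counts),
        "functioning": functioning,
        "keep_for_analysis": len(functioning) >= min_functioning,
        "difficulty": 100 * errors / n,  # percentage of incorrect responses
    }


# Hypothetical data: 60 examinees, option 3 keyed as correct.
example = [3] * 36 + [4] * 20 + [1] * 1 + [None] * 3
summary = item_summary(example, correct=3)
```

In this made-up example only two responses function, so the question would be dropped from further analysis even though its difficulty rating falls in a convenient range.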

It was arbitrarily decided that few questions would be retained with difficulty ratings below 10 or above 90 per cent, and that preference would be given to those questions with difficulty ratings between 40 and 60 per cent.

[5] The questions with fewer than three items functioning at the three per cent level are identified in the tables appearing in the appendix.

"The worth of a test item depends not only upon its desirability for inclusion in the curriculum and upon its difficulty, but also on its power to discriminate between pupils of high and low levels of general achievement in the field involved." (15)

A question is said to be perfect in discriminating power when every student who answers it successfully ranks higher in the scale than any student who answers it incorrectly. A question may be said to have zero discriminating power when there is no systematic difference between the ability of the students who succeed on it and those who fail. A question in which more students of low ability succeed than students of high ability is said to have minus discriminating power. Between the extremes of perfect and minus discriminating powers, questions of all degrees of discrimination are found.

In large scale test construction, time does not permit the construction of a curve for each question on the basis of preliminary experimentation. "If discriminations are to be made between the items of various degrees of goodness, they must be made on the basis of a single quantitative measure, or index of discrimination, that can be conveniently computed and readily tabulated and compared for a large number of items." (25)

Various indices of discrimination have been studied and compared by research workers, but no one index is an infallible measure of the validity of an item for all situations, for all groups of students, and for all levels of ability. "The problem of securing a quantitative measure of the validity of a single item has not been satisfactorily solved." (25)

The index selected for this study is the difference between the mean criterion scores of those persons succeeding on the item and of those persons failing on the item, or as expressed in formula form, $M_S - M_F$. This index was recommended by Swineford (5) for a heterogeneous group of subjects and was selected in preference to the other indices for the following reasons:

1. It is easy to calculate.
2. It is based on all the data secured.
3. It was apparently satisfactory when used in the Central Research Committee tests in physical education activities. (43, 44, 45)

According to Long and Sandiford (26), "The better techniques differ so little in effectiveness that ease of computation may usually be accepted as a legitimate consideration in determining which techniques to adopt."

The index was determined for each question in the experimental battery, with the exception of those few questions dropped from further analysis because of having less than three parts "functioning."

The index of discrimination and the difficulty rating were used as guides in deciding which questions to retain. The purpose of the test was to determine which entering students possess knowledge equivalent to the knowledge of the average student who received instruction in the particular activity at the State University of Iowa. For that reason, questions of around 40 to 60 per cent difficulty were favored, providing that they had discriminating value.

No questions were retained that had minus discriminating power; very few of these[6] appeared in the tests. The size of the index was arbitrarily set at one probable error, but for content reasons, some deviations were made in selecting questions. In two of the tests, one half of a standard deviation was used as the size of the index.

[6] See tables in appendix.

An inspection was made of the tabulation of each question to see if the wrong responses were chosen more often by low ability persons than by those in the upper half of the scale. Questions which deviated markedly from this standard were rejected regardless of their index of discrimination or difficulty rating.

The criterion score for each student on each test was the total of right responses. It would have been preferable to validate the tests on the basis of tests known to be valid, constructed for the same purpose, and covering the same content, but no such tests were available.

"The use of the total score on a test as a basis for thus evaluating its constituent items is therefore defensible to the degree that the original score provides a valid measure of the ability to be tested.... If all of the original items satisfy the subjective requirements for validity, if their content belongs in the subject matter being tested and if they have been well selected and distributed with reference to the objectives of instruction, then the index of discrimination will prove extremely valuable for identifying or drawing attention to those items which are functionally weak because of structural imperfections, ambiguities, inclusion of irrelevant clues,...." (25)

It is recognized that when the total score on the test itself is used as a criterion, the index of discrimination becomes more a measure of reliability than of validity. However, an index so obtained enables one to select those questions within the test that are individually most effective in measuring whatever the test as a whole happens to be measuring.

Another factor in selecting the questions to be retained was the time available for testing. Since for the purposes of guidance it is advantageous to do the testing in the short span of time before actual class work starts, and since the testing program includes other tests, such as motor ability and various skill tests, the time is quite limited. The time records taken on the experimental battery served as a rough guide in estimating the amount of time needed for the majority to complete the tests. It was decided to attempt to limit the tests to such length that three could be administered each fifty minute period, which gave an approximate range in material of twenty to twenty-six questions. The entering student rarely has had instruction in all of the sixteen activities; three or four periods will probably be sufficient. The optimum amount of material to administer in a given time limit is obviously that which will yield the maximum validity in that time limit; the data collected in this study are not sufficient for determining that. It is thought that more material than the optimum would result in a lower validity because of the emphasis on the speed factor, while less than the optimum would result in a lower validity due to the more limited sampling.

The determination of the optimum time is a suggestion for further study. The distribution of content in the short tests approximates the proportions of content in the original tests.

Reliabilities

The reliabilities were computed by the odd-even method and corrected to actual length by the Spearman-Brown prophecy formula. The reliability coefficients for the original full length tests and for the shorter tests are given in Table I. Table I also shows the number of questions in each form of the test and the number of subjects on which the reliability was based.

Table I
Reliability Coefficients for Full Length Forms and for Short Length Forms of the Test

                                 Full Length Form         Short Form
      Test                        r       n      q        r       n      q
  1.  Badminton                 .856     74     58      .761     46     26
  2.  Basketball                .854   illeg.   54      .700     55     24
  3.  Body Mechanics            .702     64     35      .724     40     20
  4.  Canoeing                  .750   illeg.   60      .700     49     26
  5.  Field Hockey              .865     82     64      .878     71     25
  6.  Folk Dancing              .735     75     56      .855     58     25
  7.  Golf                      .857     65     60      .794     47     26
  8.  Recreational Sports       .802     87     63      .710     55     20
  9.  Rhythms                   .871     80     35      .780     50     26
 10.  Soccer                    .825     48     60      .824     38     26
 11.  Softball                  .810     69     54      .634     35     26
 12.  Stunts and Tumbling       .752     69     55      .619     49     26
 13.  Swimming                  .816     80     65      .845     55     22
 14.  Tennis                    .824     84     60      .826     59     26
 15.  Track and Field           .780     52     60      .750     39     26
 16.  Volley Ball               .884     86     57      .752     34     26

q refers to total number of questions in the test
n refers to total number of subjects in the test

The reliabilities of the long forms ranged from .702 to .884, which indicates that the criterion was adequate. In most of the tests, the reliability for the shorter form was lower than for the longer form; this is to be expected due to fewer questions, the smaller number of cases, and a narrower range in ability.
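The odd-even procedure and the Spearman-Brown correction named above can be sketched as follows. The function names and the small set of item scores are hypothetical; the only figures taken from the study are those for Basketball (a full length reliability of .854 on 54 questions, and a short form of 24 questions). The sketch also applies the same formula in the other direction, to estimate what shortening alone would be expected to do to a reliability coefficient.

```python
def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r, k):
    """Reliability expected when a test is made k times as long (k < 1 shortens it)."""
    return k * r / (1 + (k - 1) * r)


def odd_even_reliability(item_scores):
    """Correlate odd and even half scores, then correct to full length (k = 2).

    item_scores: one list of 0/1 question scores per examinee.
    """
    odd = [sum(row[0::2]) for row in item_scores]   # questions 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # questions 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even), 2)


# Hypothetical scores for four examinees on a four-question test.
demo_scores = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0], [1, 0, 1, 1]]
demo = odd_even_reliability(demo_scores)

# Basketball: .854 on 54 questions, shortened to 24 questions.
predicted = spearman_brown(0.854, 24 / 54)
print(round(predicted, 3))  # 0.722
```

Shortening alone would be expected to lower the Basketball coefficient to about .72; the short form coefficient actually obtained was .700. The remaining difference is in the direction the text anticipates, since the short form figures rest on fewer cases and a narrower range of ability.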

The reliabilities for the long forms were computed on the total number of cases; for the short forms, they were computed only on those subjects who had had their previous instruction at the State University of Iowa.

Mean Scores and Standard Deviations for Full Length Tests

Means were computed for all the tests in their full length form for two groups:

Group A, including all subjects who had had instruction in the activity previous to the first testing, and Group B, including the students completing the course at the State University of Iowa during the year 1941-42. Standard deviations were computed for the combined groups for the full length tests, to give an indication of the range of scores. These data are presented in Table II, which also gives the total number of questions in each test.

Table II
Mean Scores and Standard Deviations for Full Length Tests: Group A, All Who Had Instruction Previous to First Testing, and Group B, the Students Completing the Course During the Year 1941-1942

                               Number of              Group A         Group B
      Test                     Questions    S.D.     Mean     n      Mean     n
  1.  Badminton                    58       3.16     27.1    66      51.8    18
  2.  Basketball                   54       3.66     51.5    76      57.7    17
  3.  Body Mechanics               35       4.70     19.8    45      19.1    11
  4.  Canoeing                     60       7.14     52.0    59      55.4    24
  5.  Field Hockey                 64       8.10     54.0    48      39.6    34
  6.  Folk Dancing                 56       5.86     17.5    57      19.7    18
  7.  Golf                         60       3.08     24.5    48      31.3    15
  8.  Recreational Sports          63       8.28     28.4    62      55.9    25
  9.  Rhythms                      35       5.92     17.0    57      20.0    23
 10.  Soccer                       60       8.60     51.2    29      57.8    19
 11.  Softball                     54       6.36     25.7    56      29.7    15
 12.  Stunts and Tumbling          55       4.66     18.9    48      22.5    21
 13.  Swimming                     65       7.70     46.5    65      47.5    17
 14.  Tennis                       60       7.96     55.5    57      40.8    27
 15.  Track and Field              60       6.86     29.4    51      55.7    21
 16.  Volley Ball                  57       8.64     28.6    71      57.4    15

Mean Scores for Short Tests

After the questions were selected, all the papers of students who had had their instruction in the activity at the State University of Iowa were rescored, and the papers were separated into two groups: A, instruction at some previous time, and B, those just completing the course.

Mean scores were computed for each group as well as for the total number of subjects. These data are presented in Table III, along with the number of subjects and the number of questions.

In both the long and short forms, with a very few exceptions, the "B" group, composed of students completing the courses during the year 1941-1942, was superior to the "A" group, composed of those students receiving instruction prior to 1941-1942. This is an indication that the questions are based upon materials taught in the courses. That the differences are no greater appears to be an indication that the students have retained a considerable portion of the knowledge gained earlier or that the questions are concerned with the information that was important enough to be retained.

Norms

Norms were established for the short tests on the basis of performance of students who had their instruction at the State University of Iowa. This group included students who had one or more years intervening between instruction and testing as well as those who had just completed the courses, a situation parallel to that of the entering students. The norms are as follows:

  Badminton              17        Rhythms                16
  Basketball             16        Soccer                 14
  Body Mechanics         11        Softball               13
  Canoeing               15        Stunts and Tumbling    15
  Field Hockey           15        Swimming               16
  Folk Dancing           12        Tennis                 17
  Golf                   10        Track and Field        14
  Recreational Sports    11        Volley Ball            17

The number of cases upon which they are based is given in Table III.

Table III
Mean Scores on Short Tests for All Students Receiving Instruction at State University of Iowa: Group A, All Who Had Instruction Previous to First Testing, and Group B, the Students Completing the Course During the Year 1941-1942

                               Number of      Group A         Group B          Total
      Test                     Questions     Mean     n      Mean     n      Mean     n
  1.  Badminton                    26        16.2    29      18.7    17      17.1    46
  2.  Basketball                   24        15.9    39      17.7    16      16.4    55
  3.  Body Mechanics               20        11.5    29      10.4    11      11.0    40
  4.  Canoeing                     26        14.2    25      15.5    24      14.9    49
  5.  Field Hockey                 25        12.9    37      16.2    34      14.5    71
  6.  Folk Dancing                 25        12.0    40      12.8    18      12.5    58
  7.  Golf                         26         7.5    32      14.5    15       9.7    47
  8.  Recreational Sports          20         9.8    31      13.2    24      11.3    55
  9.  Rhythms                      26        15.5    27      16.7    23      16.0    50
 10.  Soccer                       26        13.1    19      15.6    19      14.3    38
 11.  Softball                     26        12.1    21      13.6    12      12.7    35
 12.  Stunts and Tumbling          26        14.4    28      16.6    21      15.4    49
 13.  Swimming                     22        14.3    38      15.0    17      14.5    55
 14.  Tennis                       26        15.6    32      18.3    27      16.9    59
 15.  Track and Field              26        10.7    18      15.6    21      15.7    39
 16.  Volley Ball                  26        16.3    18      18.4    16      17.3    34
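The Total column of Table III can be related to the two group columns by a short calculation. The study does not spell out how the total means were obtained; the sketch below assumes each is the mean of Groups A and B weighted by the number of subjects, and checks that assumption against the Badminton row only. The function name is hypothetical.

```python
def pooled_mean(groups):
    """Combine (mean, n) pairs into one n-weighted mean and a total n."""
    total_n = sum(n for _, n in groups)
    weighted_sum = sum(mean * n for mean, n in groups)
    return weighted_sum / total_n, total_n


# Badminton short test: Group A mean 16.2 (n = 29), Group B mean 18.7 (n = 17).
badminton = [(16.2, 29), (18.7, 17)]
mean, n = pooled_mean(badminton)
print(round(mean, 1), n)  # 17.1 46
```

For Badminton the weighted mean agrees with the total of 17.1 on 46 subjects given in the table, and rounds to the norm of 17 listed for that test.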

Comparison of Tests

Considering the tests in their original full length form, all had reliabilities above .700; eleven of the sixteen had reliabilities above .800, with five of these eleven exceeding .850. The tests falling below .800 in reliability are as follows: Body Mechanics, .702; Folk Dance, .735; Canoeing, .750; Stunts and Tumbling, .752; and Track and Field, .780. The tests exceeding .850 were Basketball, .854; Badminton, .856; Field Hockey, .865; Rhythms, .871; and Volley Ball, .884.

Since the length of tests is known to affect reliability, the number of questions for each test is given below, in alphabetical order:

  Badminton              58        Rhythms                35
  Basketball             54        Soccer                 60
  Body Mechanics         35        Softball               54
  Canoeing               60        Stunts and Tumbling    55
  Field Hockey           64        Swimming               65
  Folk Dancing           56        Tennis                 60
  Golf                   60        Track and Field        60
  Recreational Sports    63        Volley Ball            57

The average number of questions was 53. Of the five tests falling below .800 in reliability, three had well below the average number of questions; the average for the five tests was 45 questions. Of the five tests with reliabilities above .850, four exceeded the average number of questions; the average for the five tests was 53 questions.

The reliabilities of the short form tests are of more concern in this study.

Ten of the tests have lower reliabilities in their shortened forms than they had in the full length form; six of the tests have higher reliabilities. The reliabilities vary from .619 to .878, with two falling below .700 and five exceeding .800. The two tests with reliabilities under .700 are Stunts and Tumbling, .619, and Softball, .634. Both of these tests had higher reliabilities in their complete forms, so the addition of a few questions may be advisable.

Other tests in which the reliability was lowered considerably in the short form were Basketball, from .854 to .700; Badminton, from .856 to .761; Recreational Sports, from .802 to .710; Rhythms, .871 to .780; and Volley Ball, .884 to .752.

The five tests having the highest inner consistence in the short form are Soccer, .824; Tennis, .826; Swimming, .845; Folk Dance, .855; and Field Hockey, .878.

The number of questions discarded because of having too little spread of choices is a rough indication of the amount of non functioning material. No question was retained if it had less than three parts each of which was selected by three percent or more of the persons taking the test. The total number of questions discarded for this reason was 85, approximately 10 percent.

By tests, in rank order:

  Body Mechanics          17.1 per cent
  Swimming                16.9 per cent
  Tennis                  15.0 per cent
  Canoeing                13.3 per cent
  Track and Field         11.7 per cent
  Stunts and Tumbling     11.4 per cent
  Softball                11.1 per cent
  Volley Ball             10.5 per cent