Logical Aspects of Educational Measurement 9780231885805



English Pages 186 [196] Year 2019


Table of contents :
Preface
Contents
I. The Meaning of Measurement
II. Intellectual Antecedents of Educational Measurement
III. Early Development and Classification of Instruments
IV. The Logical Foundations of Measurement
V. Logical Aspects of Validity
VI. Performance and Validity
VII. Addition and Performance
VIII. Addition and Quality Scales
IX. The Outlook of Educational Measurement
Bibliography
Index
Vita


Logical Aspects of Educational Measurement

By B. OTHANEL SMITH

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, in the Faculty of Philosophy, Columbia University

New York: Morningside Heights
COLUMBIA UNIVERSITY PRESS
1938

Copyright 1938
COLUMBIA UNIVERSITY PRESS, NEW YORK

Foreign agents: OXFORD UNIVERSITY PRESS, Humphrey Milford, Amen House, London, E.C. 4, England, and B. I. Building, Nicol Road, Bombay, India; KWANG HSUEH PUBLISHING HOUSE, 140 Peking Road, Shanghai, China; MARUZEN COMPANY, LTD., 6 Nihonbashi, Tori-Nichome, Tokyo, Japan

Manufactured in the United States of America

Preface

THIS book has grown out of efforts to clarify the meaning of measurement for myself. As far as my knowledge goes, discussions of the fundamentals of measurement in education are rare and the few that exist are meager. Books on educational measurement describe the experimental and statistical procedures of constructing tests and scales, but they seldom discuss the fundamental ideas that lie behind these procedures. Even when they do discuss them, they do so only in a brief and superficial manner, usually confining the treatment to the meaning of a series.

Since my purpose was to clarify the meaning of measurement, my procedure was to explore the general ideas upon which measurement is based, by an examination of measurement in the fields in which it has been the chief instrument of research. By such an examination I attempted to reach the deep-lying ideas upon which measurement in any field rests. These ideas are discussed in Chapter IV. In one sense, therefore, this is not a book on educational measurement, but a book on what lies behind measurement.

As I began to discover the fundamental ideas underlying measurement, two facts became evident. The first was that behind the operation of these ideas were certain preconceptions of the nature of mind and learning, which prepared the way for an experimental interpretation of them. Chapters II and III contain a brief discussion of these preconceptions, together with a description of the early instruments of educational measurement and the types which are to be considered in this essay. The second fact was that these fundamental ideas afforded a good standpoint from which to examine educational measurement with respect to its preciseness. This I have attempted to do in Chapters V to VIII inclusive. Whatever criticism of educational measurement I have ventured to make, therefore, has been recorded for the sake of logical rigor and clarity, and not from a desire to depreciate measurement in education. I make this observation because we have grown into the habit of assuming that when an individual criticizes a part or aspect of anything, he thereby condemns the whole of it.

Educational measurement has been considered from the standpoint of precision such as that required in exact experimental work. Little, if any, reference is made to it as an instrument used in instruction. Educational experimentation can be no more accurate than the instruments of measurement upon which it rests. I therefore hope that a consideration of educational measurement from the standpoint of its fundamental ideas will help to throw light on some of the problems of educational science. Whether it does or not, the reader will judge for himself.

Neither time nor ability permitted me to deal with all the kinds of instruments of measurement that have been developed in education. I have omitted diagnostic instruments altogether and have barely touched upon the subject of intelligence tests and scales. Since I had to limit my observations in some way, I chose to omit diagnostic tests because they emphasize analytical rather than quantitative descriptions of behavior. Intelligence tests and scales were omitted largely because they are not solely educational, belonging partly to the field of psychology. I have therefore confined my work to what might be called developmental tests and scales and quality scales. The fundamental principles of measurement, however, are applicable to all instruments of measurement and the interested reader may make his own applications.

As is often true of an author, I have incurred more obligations in writing this book than I can fully acknowledge.
The sources of many of the ideas of which I have availed myself have been lost in the texture of my thought so that I cannot now retrieve them. To those individuals to whom I am obligated, but whose individual contributions I cannot at the present moment recall, I am indeed grateful. I must, however, make particular mention of my obligations to the works of Campbell, especially his Measurement and Calculation. My indebtedness to him extends far beyond the meager citations which I have made in the text.

Expression must be given of my gratitude to Professors Harold Rugg, Bruce Raup, and Herbert Bruner for their criticism, encouragement, and counsel. Much criticism and help was also gained from Professor Ernest Nagel, of Columbia University, both through his writings and conferences. Finally, I wish to acknowledge my obligations to Professors Helen Walker, William A. McCall, George Hartmann, Warren G. Findley, Arthur I. Gates, and William H. Kilpatrick, the first four of whom read the manuscript in part or in whole. From each of them I gained many helpful suggestions, especially at points where my mind had slurred over significant factors.

Whatever be the shortcomings of this essay, they are, of course, my own responsibility. If my capacities had permitted me to take advantage of all the suggestions which I received, the book would have been much improved.

B. OTHANEL SMITH
Urbana, Illinois
July, 1938

Contents

I. The Meaning of Measurement
MEASUREMENT AND ENUMERATION; MEASUREMENT AND JUDGMENTS; MEASUREMENT AND INSTRUMENTS; GENERAL CHARACTERISTICS OF INSTRUMENTS; EDUCATIONAL MEASUREMENT

II. Intellectual Antecedents of Educational Measurement
THE EXPANSION OF PHYSICS; "THE DYNAMICS OF ORGANIC FORMS"; THE APPLICATION AND EXTENSION OF MATHEMATICAL STATISTICS; THE RISE OF PRECISION PSYCHOLOGY; SUMMARY

III. Early Development and Classification of Instruments
THE BEGINNINGS OF EDUCATIONAL MEASUREMENT; THE FIRST INSTRUMENTS OF EDUCATIONAL MEASUREMENT; GENERAL CLASSES OF INSTRUMENTS; CRITICISMS OF INSTRUMENTS; SUMMARY

IV. The Logical Foundations of Measurement
MEASUREMENT AS A SEARCH FOR STRUCTURE; THE AXIOMATIC CONDITIONS OF MEASUREMENT; INTERPRETATIONS OF THE CONDITIONS OF MEASUREMENT; THE TYPES OF MEASUREMENT; SUMMARY

V. Logical Aspects of Validity
PROCEDURE OF ESTABLISHING VALIDITY; THE POLAR NATURE OF VALIDITY; VALIDITY AND ITS FIRST CONDITION; VALIDITY AND ITS SECOND CONDITION; SUMMARY

VI. Performance and Validity
THE OUTER AND INNER ASPECTS OF BEHAVIOR; THE SELF AS A STRUCTURE OF ATTITUDES; THE SELF AND OUTCOMES OF LEARNING; IMPLICATIONS FOR VALIDITY OF INSTRUMENTS; SUMMARY

VII. Addition and Performance
THE PATTERN OF ACHIEVEMENT INSTRUMENTS; THE QUEST FOR EQUAL UNITS; A CRITIQUE OF THE PROCEDURES OF OBTAINING UNITS; THE TYPES OF UNITS; SUMMARY

VIII. Addition and Quality Scales
THE THEORY OF QUALITY SCALES; AN ANALYSIS OF JUDGMENTS OF MERIT; UNEQUAL UNITS ON THE NORMAL CURVE; SUMMARY

IX. The Outlook of Educational Measurement
TOWARD SIMPLICITY AND ABSTRACTNESS; TOWARD INTEGRATION AND COMPLEXITY; SUMMARY

Bibliography

Index

I

The Meaning of Measurement

MEASUREMENT is the principal implement of science, changing that field of human endeavor from medieval gropings to a modern exactitude. The scientific thought of the Middle Ages was chiefly founded upon Aristotelian science, which was largely classificatory, consisting of enumeration and classification of qualities. Modern science, on the other hand, is essentially a science of control, its method being to analyze phenomena into their elements, to describe the elements quantitatively, and to establish relations among them. This methodology is clearly seen in the first significant experiments of modern science. For example, in the determination of the laws of falling bodies, Galileo reduced the phenomenon of falling bodies to the elements of time, distance, acceleration, and velocity. By the symbolic manipulation of these factors, he was able to formulate the laws of falling bodies, and then to establish them experimentally by measuring time and distance, and calculating relationships.

In the experimental verification of the laws of falling bodies, Galileo measured time by a water clock and length by a convenient linear scale. At that time there were, in addition to these crude instruments, a few other important measuring devices such as the thermoscope, which was the beginning of our modern thermometer; balance scales for the measurement of weight; and wheel clocks for the measurement of time. The pendulum clock, which greatly improved the measurement of small intervals of time, the barometer, and Boyle's hydrometer had not yet been contrived. These instruments, however, were introduced in the seventeenth century, shortly after Galileo's epoch-making experiments on falling bodies, and they led immediately to the refinement of observations, the formulation and verification of new laws, and finally to the further improvement and invention of measuring instruments.

Each extension of science into new areas has brought about the development of new instruments of measurement or some new device to aid our senses. Indeed, the last three centuries have witnessed such a rapid development of instruments that today one can hardly think of a field of intellectual endeavor into which measurement has not crept, and surely there is none in which its influence has not been felt. Beginning with comparatively simple phenomena, such as falling bodies, measurement has advanced into such diverse fields as electricity, light, biology, medicine, sociology, psychology, politics, economics, and education; and some people have claimed that its applicability is limited only by the capacity of man to devise techniques of adapting it to new areas. Such diverse problems as the determination of the most efficient conductor of electric current, the best antiseptic in the treatment of wounds, the most effective method of teaching subtraction, and the effect of age upon the ability to learn are attacked nowadays by means of measurement.

Measurement has become so woven into the fabric of our culture that were we suddenly to lose this particular capacity, our whole social organization would collapse and almost overnight we would be thrown back to a primitive culture. Almost every aspect of our daily life is touched by measurement of one kind or another.
Our daily work is scheduled by a watch; the rate of our travel is measured by a speedometer; weather forecasts are made possible by barometers, thermometers, and telegraphic systems that depend upon delicate instruments which regulate and control the flow of electricity; food is bought by weight; electricity is paid for by the kilowatt hour; and the status of our health is ascertained by employing a large number of measuring devices. In fact, the more complex and accelerated relationships become among individuals, the more necessary it is that they arise and develop with the aid of exact measurement. A savage has no need of exact time, but without watches we could hardly regulate our lives. Yet so accustomed to measurement have we become that very few of us are even conscious of its significance, to say nothing of having even a superficial acquaintance with its fundamental principles.

This lack of understanding of measurement is due in part to the fact that measurement, like the techniques of science in general, developed empirically. As new problems arose and as science invaded new territory, instruments of measurement were devised to meet the peculiar needs that developed. There were, however, no general principles to guide the makers of instruments. It should not be supposed from this statement that instruments were not based upon general principles; the principles were there, but they were not consciously recognized nor formulated. In a word, until comparatively recent times there was no rationale of measurement, just as there was no grammar of languages until long after their development. It is therefore only natural that measurement be so little understood: its general principles were not formulated, and the constructing of instruments was confined to specialists.

During the last hundred years, however, scientists, logicians, and mathematicians have been slowly teasing out the theoretical foundations of measurement. As a result, the concept of measurement has become more and more clearly defined, so that it is now possible to set it forth without the entanglements of the experimental work actually involved in instrument construction.
In the present chapter the theoretical foundations of measurement cannot be developed, but we shall make certain elementary distinctions between measurement and other functions which are occasionally confused with it. Furthermore, we shall call attention to the nature of the basic ideas underlying measurement, and make clear the meaning of instruments and the dependence of measurement upon them.

MEASUREMENT AND ENUMERATION

Measurement is inevitably tied up with numbers, and, as we shall see in a later chapter, is one of the ways in which numbers can be introduced into the treatment of problems. It is true that enumeration also is dependent upon numbers, and measurement cannot therefore be distinguished from enumeration on the assumption that one depends upon numbers while the other does not. We enumerate objects and we measure them. What is the difference?

Suppose we have a pile of boards and we wish to know how many boards there are in the pile. We enumerate them and discover that there are thirty-two. Enumeration requires that some class of objects be defined. The definition does not require that the objects have identical properties nor that they possess the properties in the same degree. The boards in the pile may vary in color, temperature, length, weight, density, hardness, and so on, but they must be objects near enough alike to constitute a single class. Each board is then a separate and distinct unit of a general class of objects which we call boards. Thus by enumeration we can answer the question: "How many boards?"

If we ask, however, whether one board is longer or heavier than another, obviously we have raised a problem that enumeration cannot answer. To answer this question requires measurement, because enumeration cannot be used to describe the different degrees of a property such as, for example, the lengths and weights of boards. We can enumerate the boards, but we cannot enumerate their lengths and weights. It is precisely the variation of the properties which, ignored by enumeration, constitutes the area dealt with by measurement. Measurement is not concerned with a class of objects nor with the question of how many objects of a particular class are present at a given time and place. It aims to determine the amount of such properties as length, weight, temperature, intelligence, electrical resistance, achievement, illumination, and so forth, irrespective of the class of objects to which they may belong. It attempts to determine, for example, the temperature of various objects, and whether the objects be the human body, a vessel of water, or the atmosphere is irrelevant, so far as measurement is concerned.

The variation of the amount of a property, such as temperature, is a continuous increase or decrease, as the case may be, and it does not take place by jumps. When an elevator, for example, moves from one floor to another it moves continuously through the intervening space, that is, it does not begin at the first floor, and jump to the second without traversing the distance between the floors. The elevator acquires the property of being higher or at a higher floor, not all at once, but by an infinite number of small amounts, so that we can think of the elevator as passing through an infinite number of small distances on its way from one floor to another. In a word, we are dealing here with a property which can be divided into segments, so that the sum of the segments can be taken as a continuous whole. It is the determination of different amounts of such continuous property that constitutes the problem of measurement. Not all measurable properties, of course, are continuous in the sense that they can be cut into segments, as with height and weight, but they are all continuous in the sense that each exists in different degrees in different objects.

The ideas which we associate with measuring are therefore those which have to do with determining the amount of a property. Thus, while it is true that both enumeration and measurement involve numbers, they employ them in quite different senses.
The first uses them to answer the question of how many; the second employs them to answer the question of how much.
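
The distinction between "how many" and "how much" may be illustrated by a short sketch in Python. It is offered only as an illustration under stated assumptions: the `Board` class, its fields, the helper functions, and the sample values are all hypothetical and are not drawn from the text.

```python
from dataclasses import dataclass


@dataclass
class Board:
    """A hypothetical board; field names and units are illustrative assumptions."""
    length_ft: float
    weight_lb: float


def enumerate_boards(pile):
    # Enumeration: count the members of a defined class,
    # ignoring every respect in which the members differ.
    return len(pile)


def longer_board(a, b):
    # Measurement: compare the amounts of a single property (length).
    return a if a.length_ft >= b.length_ft else b


# Sample values are made up for the illustration.
pile = [Board(8.0, 12.5), Board(10.0, 15.0), Board(6.5, 9.0)]

how_many = enumerate_boards(pile)            # answers "how many boards?"
how_much = sum(b.length_ft for b in pile)    # answers "how much length?"
longest = max(pile, key=lambda b: b.length_ft)
```

Counting treats each board as an interchangeable unit of its class, whereas the comparison and the sum operate on the amount of a property, which is the question that enumeration cannot answer.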

MEASUREMENT AND JUDGMENTS

We are constantly making quantitative judgments. We say, for example, that this flower is more beautiful than another, this book is more scholarly than another book, this man is more friendly than other men, and so on. Is the process of making such judgments the same as measurement, differing from it only in accuracy? Or is there a more fundamental difference between these ordinary judgments and measurement?

Measurement is a way of automatically rendering judgments which would otherwise require mature experience and deliberation, and it is this automatic factor which distinguishes measurement from ordinary observations. In rendering a judgment without measurement, a person takes certain things into his experience and weighs them against one another. He analyzes them and reconstructs them into a pattern which he expresses as his judgment. Thus, for example, a superintendent of schools in selecting a teacher studies available facts about the various applicants, such as their ages, teaching experience, training, health, and past success. The superintendent will weigh these factors in the light of his experience, and after much deliberation will render a judgment that discriminates among the applicants, some being judged more suitable than others; and the richer and more extended his experience, the more likely he is to make a wise selection among those who are applying for the position.

The way in which judgments are rendered automatic by measurement can be seen if we consider how teachers might be selected by an instrument which exactly determined teaching ability. If such an instrument existed, the selection of teachers would then no longer depend upon the experience of the superintendent. The instrument could be applied by any novice, and the resulting selection of teachers would be satisfactory. For measurement involves a standard against which to evaluate any property or characteristic like the one upon which the standard is based. This standard is an external and public thing which, when it is applied, commands more or less universal agreement as to the findings, and, being external and public, it can be manipulated by anyone in possession of a minimum of experience. Even a child can read a speedometer, but only an experienced motorist can judge the velocity of a car without the aid of instruments.

We may thus sum up the distinction by saying that a judgment is an intellectual process that depends upon the present sensitivity and the past experience of the individual. Measurement, on the other hand, in its most elementary sense is a matter of properly manipulating some physical instrument from which readings are recorded. In a fundamental sense, as we shall see, it involves far more than this; it involves the actual construction of the instruments which we use in practical work. In a figurative sense, an instrument of measurement is a set of frozen judgments, for in the process of construction many judgments have been incorporated into the structure of the instrument itself, so that those who use it get the benefit of the judgments without actually having to make or discover them.

MEASUREMENT AND INSTRUMENTS

Measurement is so dependent upon instruments that it might even be defined as a method of constructing instruments. If we would gain an insight into the meaning of measurement, some idea of the nature of instruments is therefore essential at the outset. Perhaps the simplest way to demonstrate the nature of instruments is to show how we deal with things without them. Let us take a specific case. Suppose we have no instrument with which to measure the electrical resistance of materials, that is, their capacity to resist the flow of electric current; how can we deal with this property? We might follow the procedure which Cavendish is reported to have devised in the eighteenth century. He detected the resistance of different materials by comparing the shocks which he received from them when he connected them successively to some known source of electric current at one end and to his own arm at the other. The material having the least resistance was, of course, the one that produced the greatest shock. By following this procedure, we could obtain a general consensus as to the electrical resistance of materials; we would be dealing with the property directly, evaluating it in terms of our direct experience with electrical phenomena.

When we use an instrument to determine electrical resistance, however, we no longer deal with the immediate property but with its effect upon the physical instrument, that is, we understand and evaluate the property through the behavior of an instrument, rather than through our own experience. Thus, for example, if we insert a Wheatstone bridge into an electric circuit, we can see what resistance a material has by making certain objective observations. When this process is followed, we have no direct experience of the electric current at all. We do not feel an electric shock, nor do we feel the flow of electricity nor the resistance of the materials. In a word, the direct experience of electricity, as such, has been supplanted by a direct experience of the behavior of a physical apparatus, and this behavior is attributed to something which lies behind it, and to which we refer as electricity.

From what has been said, it is obvious that an instrument is something that connects an observer with a phenomenon, and we can therefore say, with Whitehead, that "an instrument can be regarded as a chain of related parts, stretching from some object to an observer."¹ Our knowledge of objects is largely, if not completely, limited to the effects which they produce; and the more instruments which can be devised for detecting and describing these effects, the more precise is our knowledge. Indeed, the control which man has been able to exercise over nature is largely due to his capacity to substitute the manipulation and observation of instruments for his direct experience of objects.

¹ T. N. Whitehead, The Design and Use of Instruments and Accurate Mechanism, p. 2, by permission of The Macmillan Company, publishers.

Instruments, however, are not all alike, even in their fundamental principles. There are two general classes: first, those that extend and increase the power of our senses; and, second, those that enable us to render more precise quantitative descriptions of properties. The first class of instruments includes such devices as the telescope, X-ray, microscope, telephone, radiophone, and dentiphone. These devices simply extend the range and capacity of our senses, enlarging and enriching our experience by bringing a larger portion of the world into it. Instruments of the second class are those which are ordinarily called instruments of measurement. They include such instruments as meter sticks, thermometers, voltmeters, radiometers, chronometers, achievement tests, and psychological examinations. While these two classes of instruments are distinct, it is true nevertheless that instruments of both types may occasionally be combined to form one instrument, such as, for example, the theodolite, which consists of a telescope, together with devices for measuring angles, and is used to determine distances and heights.

For our purposes it is important that the second class of instruments, that is, the instruments of measurement, be understood clearly. A thorough insight into their fundamental nature cannot be developed in the present chapter, but certain general characteristics will be pointed out in the next few pages. Before going on, however, it is important to note that devices, such as the rating scales developed in education and psychology, should not be confused with instruments of measurement.
Rating scales, valuable as they are for some purposes, cannot be classed as measuring instruments, for in the last analysis such scales are only devices for recording the judgments of observers. As has already been observed, an instrument of measurement is a physical thing that connects an observer with some phenomenon. The behavior of the instrument gives the observer an idea of the phenomenon's quantitative aspects. The instrument is therefore not something upon which judgments are recorded; but this is precisely what a rating scale is. We rate the socio-economic status of people, for example, with a scale devised for that purpose, but the rating is only the direct judgment of observers and will therefore vary with the range and depth of each observer's experience.

GENERAL CHARACTERISTICS OF INSTRUMENTS

While all instruments of measurement are different in their more or less obvious features, they are nevertheless based upon the same general ideas. Thus far we have been treating measurement as though it were limited to the application of some instrument. As a matter of fact, however, measurement involves much more than the mere use of an instrument, for it also includes the process of construction. Behind the practical use of any instrument of measurement, there is a long trail of experimental steps leading up to its completion. Hence the practical use of an instrument takes into account only a small part of what is actually implied in any operation of quantitative evaluation.

If we go behind the scenes of instrument construction, into its theoretical aspects, we shall find that all instruments of measurement, when analyzed, show certain general characteristics. There are first of all several broad and sweeping conceptions, out of which arise the notion that a particular property is subject to measurement. These conceptions lie behind the work of the instrument-maker and constitute the frame of conventions within which he formulates experimental procedures and constructs instruments of measurement. They relate to the scheme of nature itself and to the character and place of properties, as they exist within this scheme of things. In short, the objective exploration of any subject matter must always wait upon the development of appropriate patterns of thought, and upon their organization into a framework of guiding conceptions and perspectives.

This is clearly seen in educational measurement, for the introduction of measurement into the field of education awaited and depended upon the rise and development of conceptions that brought mind within the realm of natural events. Since the Middle Ages, mind had been conceived as something imported from outside the domain of nature. It was something spiritual and hence not subject to law and order. So long as this view prevailed, there was little incentive to study mind by means of measurement, and the formulation of experimental procedures by which to study its nature lacked guiding ideas. It was therefore necessary that the supernatural conceptions of mind be supplanted by a naturalistic view, before measurement could make much headway in the field of education.

Another general characteristic of measurement, and likewise of instruments, is that it is based on preconceived abstract conditions. By abstract we mean that they are formal, that they hold good for any field into which measurement may be extended successfully, and that they therefore presuppose no particular experimental procedures in order to be satisfied. In order to be completely measurable, a property must possess certain characteristics: first, it must be capable of being described as more or less in amount; second, it must be such that a property of like kind can be found equal to it; and third, it must be capable of addition. These characteristics must conform to certain axiomatic conditions. For example, the addition of two lengths must yield a length greater than either of the lengths taken separately. These preconceived conditions of measurement, when properly formulated, constitute its logical foundations, and all instruments of measurement must conform to these presuppositions.

EDUCATIONAL MEASUREMENT

In the light of the foregoing discussion, it should now be obvious that measurement is a very definite concept and that the term can be used only in a particular sense. It is the quantitative evaluation of a property by means of an instrument which is constructed in accordance with certain general principles. These principles have been formulated only within comparatively recent years. This formulation is a very significant achievement, however, for these principles permit an examination of measurement in any field of study, from the standpoint of whether or not the fundamentals of measurement have been adhered to. Such an examination is especially desirable when the results of experimental work continue to be disappointing, which is precisely the condition that now exists in the field of educational study. The method of experimental science has been applied to educational problems for almost three decades, and the results are far from what might reasonably have been anticipated. In far too many instances the results of experimental studies are conflicting and unconvincing, indicating that the sterility of much of educational research is perhaps due not to lack of precaution on the part of the experimenters but to something more basic and subtle. The use of measurement in the study of education has been the occasion of controversy and the source of much confusion on the part of both the defenders of measurement and its critics. Educational measurers have proceeded, in some cases, as though numbers in themselves had some inherent power to solve problems. Individuals having the same scores on a test have been treated as though they were equal in respect to whatever the scores represent. It has been claimed that equal

units of intelligence, achievement, merit of composition, merit of handwriting, and so on, have been obtained. And the readiness with which we add scores representing different things must arouse in the mind of the critical observer a feeling somewhat akin to that which Alice must have felt when the White Queen asked, "Can you do Division? Divide a loaf by a knife—What's the answer to that?"
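
The three conditions of complete measurability stated earlier in this chapter (more or less, equality, and addition) can be illustrated by a small sketch. The Python fragment below is only an illustration under assumed names: `Quantity`, `is_less`, and `satisfies_addition_axiom` are hypothetical and do not come from the text. It checks the example of two lengths given above, and it refuses to add amounts of unlike kind, which is the difficulty the White Queen's question points to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """An amount of one kind of property, e.g. 2.0 of 'length' (hypothetical type)."""
    kind: str
    amount: float

    def __add__(self, other: "Quantity") -> "Quantity":
        # Addition is defined only between properties of like kind.
        if self.kind != other.kind:
            raise TypeError(f"cannot add {self.kind} to {other.kind}")
        return Quantity(self.kind, self.amount + other.amount)


def is_less(a: Quantity, b: Quantity) -> bool:
    """'More or less in amount': an order relation within one kind."""
    if a.kind != b.kind:
        raise TypeError(f"cannot compare {a.kind} with {b.kind}")
    return a.amount < b.amount


def satisfies_addition_axiom(a: Quantity, b: Quantity) -> bool:
    """The sum of two positive amounts must exceed each amount taken separately."""
    total = a + b
    return is_less(a, total) and is_less(b, total)


# Two lengths: equality is the dataclass's generated comparison of kind and amount.
short_rod = Quantity("length", 2.0)
long_rod = Quantity("length", 3.0)
joined = short_rod + long_rod
```

Here `joined` equals `Quantity("length", 5.0)` and the addition axiom holds, whereas adding a `"length"` to, say, a `"handwriting merit"` raises `TypeError`. The sketch assumes positive amounts; it does not show that any educational score actually satisfies these conditions.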

The present study is attempting to throw light on educational measurement from the standpoint of the general ideas that underlie the construction of its instruments. More specifically, an attempt is made to set forth the thought patterns that gave rise to an objective study of learning and its outcomes. These ideas are advanced in Chapter II, and the general types of instruments based upon them are described in Chapter III. Then in Chapter IV the logical foundations, or general principles, underlying instrument construction are set forth, as a basis for the criticism contained in the remaining chapters.

II

Intellectual Antecedents of Educational Measurement

The designing of procedures and techniques to reduce the products of instruction to the logical conditions of measurement is largely a development of the last twenty-five years, but despite its youth, its roots are embedded deep in the intellectual life of the nineteenth century. Among the many channels into which the thought of this prolific century flowed there are four whose convergence led to the emergence of educational measurement: first, the expansion of physics into chemistry, biology, and psychology; second, the introduction of the dynamics of organic forms by Darwin; third, the application and expansion of mathematical statistics to the study of living things; fourth, the rise of precision psychology. The function of this chapter is to examine these four lines of intellectual development at the points where they bear most significantly on the rise of educational measurement.

THE EXPANSION OF PHYSICS

The quantitative conquest of nature became conscious of itself in the mechanics of Galileo (1564-1642) and Newton (1642-1727).¹ Gradually mechanics expanded into physics until the latter became such an inclusive discipline that the story of modern science is largely contained in the subsumption of other sciences under physics. During the eighteenth and nineteenth centuries the methods and basic concepts of mechanics came to be extended to the study of elasticity, hydrodynamics, electricity, heat, and other branches of physics. It became increasingly evident that formal methods of mechanics were applicable to these quite different aspects of the physical world and that a mathematical operation in one branch or aspect yielded results when applied in another. Thus physics built up a kind of internal consistency and unity such that one problem could be cleared up by expressing it in terms of another problem.

¹ For the material in this chapter I have depended to a large extent upon the extensive work of Merz, History of European Thought in the Nineteenth Century; Mead, Movement of Thought in the Nineteenth Century; Boring, A History of Experimental Psychology; and Murphy, An Historical Introduction to Modern Psychology.

Just as physical research gradually drew the many branches of physical inquiry into the fold of one methodological design, tending to reduce them ultimately to a few fundamental principles, so the fields of research which qualitatively lie wholly outside of physics came to be seen as basically physical, subject to the same kind of treatment as that meted out in Newtonian mechanics. For example, in the closing years of the eighteenth century Lavoisier (1743-94), working on the assumption of the constancy of mass that had guided Newton in his study of mechanics, repeated the chemical experiments of Priestley (1733-1804) and Cavendish (1731-1810). Carefully weighing the materials with which he worked, he discovered that in chemical reactions matter may change its form but remains unaltered in amount, and in this way he linked chemistry with the principles of mechanics and gave further scientific credence to the belief in the ultimacy of matter. From this time chemistry came more and more to appear as a branch of physics, until today the line between the two is increasingly tenuous and indistinct. Building upon the work of Lavoisier, Dalton (1766-1844), in the first decade of the nineteenth century, succeeded at last in elevating the ancient atomic hypothesis to the status of an accepted scientific theory.
Advancing the theory that the weights of elements, as they combine in compounds, give their relative atomic weights, he rendered the atoms capable of more rigorous description than ever before. In the same decade water was decomposed by means of a voltaic cell, and thereby a definite connection was established between chemistry and electricity. The concordant relation between chemistry and physics, which these few fundamental examples demonstrate, was further substantiated and extended in the decades that followed. Subsequent research has served to bring these two apparently diverse subject matters within a single methodology, until, in our own time, the reduction of atoms to electrical units has brought chemistry completely within the fold of physics, further emphasizing the basic unity of nature, which the thought of the nineteenth century more or less tacitly assumed.

This engulfing expansion of physics into the world of chemistry, startling as it was, still dealt with the physical, inorganic world. There was another realm, the organic world. In this realm were products and processes which could not be duplicated by physical means—vital processes whose products it was considered futile for science to attempt to obtain synthetically. It might be all right for science to make excursions into the physical world and return with laws descriptive thereof; but certainly the organic world could not be reduced to the gross materials and laws of the physical. Thus it was that the limitations of science were expressed when the expanding science of physics reached the nineteenth century.

Before the nineteenth century Galvani (1737-98) had already discovered that an electric current was generated by the sciatic nerve of a frog, and in 1801 Wollaston (1766-1828) showed that the electrical effect of this nerve was identical to that furnished by a voltaic cell. A little more than twenty-five years later Wöhler (1800-82) succeeded in preparing urea, an organic compound, out of inorganic materials.
Other fabrications of organic compounds continued until in 1887 Fischer (1852-1919) built up fructose and glucose from their elements. Much earlier, in 1840, Schwann (1810-82) reached the conclusion that the body was composed of minute particles called cells. If we add to these disclosures the further fact that the chemical analysis of organisms showed them to be reducible to chemical elements, the bulk of which were carbon, hydrogen, oxygen, and nitrogen, there was little room for doubting that the organic world was a physicochemical world. Alas! Even the organic realm was dissolving into the gross materials of physics, and the principles and methods of Newtonian mechanics were invading the territory of vital materials and processes.

In the light of these disclosures, and the concordant relation of physics and chemistry, the nineteenth century had scientific validation of the idea long before advanced by Descartes (1596-1650) when he wrote, "I conceive that the body is nothing else than a statue or machine made of clay." But Descartes had reservations. Mind, he thought, could not be accounted for on a naturalistic basis. He thus gave credence to the dualism that had dominated the life and thought of the Middle Ages: the body was physical and ephemeral, the soul was spiritual and eternal. As in the century and a half that ran immediately before it, nineteenth-century thought continued to waver between theories that attempted to bridge the gap between mind and body and those that removed the dichotomy by denying one or the other. There was no firm and consistent answer to the problem, but just as organic phenomena had been the unconquerable frontier before the successful expansion of physics into the realm of biology, so now the conscious inner life became a new boundary setting the limits of mechanical science. Until the advent of behaviorism in the twentieth century, progress toward the subjugation of this conscious inner life to quantitative methods was only partial.
The major step in this direction during the nineteenth century was made by the physicist, metaphysician, and student of mathematics, Gustave Theodore Fechner (1801-87). Concerned as we are with the expansion of physics into the field of psychology, a knowledge of the educational background of Fechner will help us to understand his contribution. At the age of twenty-one Fechner had completed his doctorate in medicine at Leipzig, but, finding mathematics more to his liking, he continued his work in that science rather than in medicine. He maintained himself, meanwhile, by translating books on physics and chemistry, gaining some reputation in those fields. Finally he became professor of physics at Leipzig, in which position he carried on experimental work, publishing a paper on the measurement of galvanic cells which made him known among the physicists of his time. On account of ill health, however, he was forced to resign his professorship soon after he had acquired it. It was during this period of ill health that he turned to metaphysical speculations, partly as a result of his remarkable recovery and partly in protest against the materialistic philosophy of his day. Out of these speculations grew his experimental work in psychology, a report of which constitutes the content of his treatise on psychophysics published in 1860.

Out of this experience and training, Fechner brought to the study of psychology the quantitative and experimental spirit of physics, the rigorous and systematic methods of mathematics, and an extensive knowledge of the human organism. Interesting and significant as is the fact that a physicist and mathematician was the first to conduct an extensive piece of experimental research in psychology, it is exactly what might have been expected, for only one already disciplined in the methods of mechanics and mathematics can adapt them to new subject matter. In this well-equipped scientist physics found an entree to psychology, and, while it was not immediately successful in invading all aspects of that field, it made through him the first irrepressible attack.
Herbart (1776-1841), as early as 1811, had introduced the term limen or threshold, meaning that a liminal stimulus was one which had just enough magnitude to lift an idea over the threshold of consciousness. That is to say, unless a stimulus is large enough, the idea which it effects will not enter consciousness. Then, in the 1830's, E. H. Weber (1795-1878) made some experiments on sense discrimination in touch and in sight. In 1834 he advanced in a rough form the generalization that discrimination between stimuli depends upon the relative magnitude of their differences, not upon their absolute differences. This generalization was the first quantitative uniformity established in psychology. It was the contribution of Fechner to give a more rigorous experimental and mathematical treatment to this whole problem of the relation of stimulus magnitude to changes in sensation. The hypothesis upon which his experimental work rested was that, although sensations cannot be quantitatively described, the stimuli which effect them can be so described. Then, as the stimuli and the sensations are functionally related, a measure of stimuli is indirectly a measure of sensations, much in the same sense as the expansion of a thread of mercury is taken as a measure of temperature.

The meaning of this hypothesis may be more clearly seen by an illustration. Consider two sources of brightness, one so dim that it can barely be detected, the other so bright that the eye can hardly bear it. These sources would constitute the terminals of a scale of brightness intensities. Correlative to this scale is a scale of brightness sensations, the point at which the dimmest light is just noticeable constituting one end of the scale and the point at which the eye can hardly bear the source being the upper limit. Between the extremities of the scale of brightness intensities there may be placed other sources of brightness of graduated intensity.
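
Weber's generalization, as stated above, is often written in symbols as ΔI / I = k: two stimuli are told apart when their difference, taken relative to the weaker stimulus, reaches some constant fraction. The sketch below is a minimal illustration and not Weber's or Fechner's own procedure. The fraction `k = 0.1` is an assumed value (the text gives none), and the function names are hypothetical. The second function builds a series of graduated intensities between a dimmest and a brightest source, each a fixed relative step above the one before.

```python
def is_discriminable(i1: float, i2: float, k: float = 0.1) -> bool:
    """True when the relative difference of two positive intensities reaches k."""
    low, high = sorted((i1, i2))
    return (high - low) / low >= k


def graduated_series(dimmest: float, brightest: float, k: float = 0.1) -> list[float]:
    """Intensities from `dimmest` upward, each (1 + k) times the previous one,
    stopping before the next step would exceed `brightest`."""
    series = [dimmest]
    while series[-1] * (1 + k) <= brightest:
        series.append(series[-1] * (1 + k))
    return series


# The same absolute difference of 5 units matters at a low level but not at a high one.
low_level = is_discriminable(10.0, 15.0)     # relative difference 0.5
high_level = is_discriminable(100.0, 105.0)  # relative difference 0.05
steps = graduated_series(1.0, 2.0)
```

Under the assumed fraction, `low_level` is true and `high_level` is false, which is the sense in which discrimination depends on relative rather than absolute differences.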
Now, if proper conditions of control be exercised, it should be possible to find sources that effect a just-noticeable increase in the correlative sensation scale. These just-noticeable increases would constitute the units of a sensation scale. A series of brightness intensities selected on the criterion of effecting just-noticeable differences in sensation would thus be an indirect measure of brightness sensation. This is essentially the pattern of psychophysical measurement which Fechner developed out of his knowledge of physics, mathematics, and experimental techniques and procedures in a decade of laborious years from 1850 to 1860.

The work of Fechner is important not only because it marked distinctly the expansion of physics and mathematics into psychology, but also because it laid down the pattern by which one phase of educational measurement has developed. Following the publication of Fechner's experimental studies, many similar studies were conducted, confirming and reconstructing his conclusions and extending his method into new contexts. Moreover, as his basic method was applied in new territory, new techniques and procedures were invented. In the closing decade of the century two Americans, Fullerton (1859-1925) and Cattell² (1860), performed elaborate experiments designed from the pattern laid down by Fechner, resulting in a clarification of moot points and advancing the basis of the theory that differences noted equally often are equal. Upon this theory Thorndike³ (1874) built his handwriting and drawing scales in the first part of our own century, setting the pattern of what has come to be called quality scales in educational measurement.

² Fullerton and Cattell, On the Perception of Small Differences.
³ Thorndike, "Handwriting."

Despite this application of the methods of physics to some branches of psychology, the realm of inner life still did not surrender to mechanics. After all, in the thought of the century Fechner had merely measured the relation between physical stimuli and sensations. There was still the world of private consciousness, reflective thought, and affection which the determination of stimulus thresholds and measures of sensations did not seriously invade. Consciousness, mind, soul, could not yet be sacrificed at the altar of physics.

"THE DYNAMICS OF ORGANIC FORMS"

At the time that physics was expanding into psychology, there was emerging in quite a different field of intellectual endeavor a new conception of nature, a conception which was ultimately to interpret the inner world of experience mechanically; and, mechanically interpreted, the inner world, as well as the inorganic and organic worlds, became subject to quantitative description. This was the evolutionary conception prepared by Lamarck (1744-1829), Spencer (1820-1903), and others, and centered in Charles Darwin (1809-82). Darwin's Origin of Species, which came from the press in 1859, and Fechner's Elements of Psychophysics, which followed it the next year, taken together, mark the turning point in the methodology of studying human nature. The era that followed these two great works saw the definite advent of quantitative and mechanical methods in the exploration and description of the subject matter of biology, anthropology, psychology, and their related disciplines.

The concept of evolution was by no means new, even in Darwin's time. It was an old idea, dating back to Greek thought, but Darwin succeeded in elevating it from the level of speculation to the status of scientific theory, supported by an elaborate collection of data. Before Darwin's epochal generalization was advanced, however, nature was accepted more or less as a fixed and unalterable scheme. Newton's mechanics and its expansion into other branches of physics, and finally into chemical, organic, and psychological phenomena, gave no explanation of how the forms with which it dealt came into existence. Its methods limited it to the exploration and description of the relations existing among the forms, by analysis, control, measurement, and mathematical calculation.
The design of the world which Newtonian mechanics assumed was one of motion among immutable forms laid down in the beginning. A counterpart of this static conception of nature is seen in the theory of ideal forms in biological studies. This study of ideal forms was the prevailing approach of morphologists before Darwin. A species was the object of study only when represented by a true form of that species. Variations in a species were more or less pushed aside as a "dust-cloud of exceptional observations," on the theory that nature in struggling to produce a perfect form must have fallen short of its mark, letting loose in the world many individuals who unfortunately varied from the ideal type. Variations were, so to speak, mistakes of nature. They were not considered significant, because they did not fit into the frame of conventions that encircled the intellectual speculations of the days before Darwin.

Before Darwin's theory had been advanced, variations had of course been studied, notably by Quetelet (1796-1874), one of the first social statisticians. He pointed to the extreme variations from the mean or ideal form as phenomena worthy of study; but the fact of variation did not stand out as a significant factor in his thinking, because the concept of uniformity and the metaphysical conception of ideal forms overshadowed the idea of variation.⁴ It was Darwin's contribution to have enlivened the thought of his century, by showing that the organic forms of nature were mutable and that the process of their change was subject to naturalistic interpretation. Darwin's point of departure was precisely these differences or variations which had been neglected. To him variations, instead of being the mistakes of nature, were the very principle on which nature operated in reconstructing her forms.

⁴ This is implicit in Quetelet's idea of the "average man"—"l'homme moyen."

There were two significant and far-reaching implications of his theory which fired the imaginations and directed the energies of his scientific disciples: first, the quantitative theory implicit in his conception of variation; and, second, the mechanical interpretation of human behavior implicit in his theory of natural selection. Variation and natural selection accounted for the dynamics of forms, that is, for the modifications or changes which take place in all organic species. It was the quantitative and mechanical implications of this theory of variation and selection that supplied the intellectual tools, which, under Francis Galton (1822-1911), were used to explore almost every aspect of human nature.

Variation was thus one of the most significant conceptions that came out of the studies of Darwin. For educational measurement, it is the most far-reaching and basic of all the conceptions which the last century bequeathed to our times. While the mechanical interpretation of evolution, as we shall presently see, placed the naturalistic stamp upon conscious life and opened it up to exploration by the methods of mechanics, the concept of variation went still further and effectively introduced the idea of quantity. In the last analysis, all educational measurement and studies of individual differences rest upon the fact that individuals do vary from one another. As we shall see, this fact has made it possible for the mathematics of statistics to extend into the study of educational problems. The conception of variation carries with it the idea that whatever it is that varies must exist in different amounts. In considering any quality of the organism, the whole species therefore becomes the unit, instead of the individual, as was the case when the spell of ideal forms held scientific thought captive. When the species is the unit, it is significant to know how much and in what direction one individual of the species differs from another.
An answer to this problem can be adequately framed only by recourse to measurement and calculation. This is the conclusion which was reached by the disciples of Darwin, notably Francis Galton,⁵ and the schools of biometry and anthropometry which he did so much to establish. Among the individuals of these schools, variation was such an intriguing idea that much of their intellectual labor was given to ascertaining, tabulating, classifying, and relating data bearing upon a host of characteristics such as heights, weights, and natural abilities of individuals. Like a child who wants to keep repeating a trick he has just learned, students of variation seemed tirelessly to persist at this self-appointed task. The methods and spirit of biometry and anthropometry filtered through to educational study at the turn of the nineteenth century and helped mold the quantitative approach to educational problems.

⁵ Galton, Hereditary Genius.

Let us turn now to a consideration of the theory of natural selection as it bears on a mechanical view of the conscious life. Before we turn to the main point of consideration, it is well to recall that the advance of physics had practically reached its limits with the psychophysical method of Fechner. This method, properly speaking, left most mental events in a realm outside the subject matter of mechanics, in the supernatural category where Descartes had placed them. Such a theory naturally tended to paralyze efforts to study mental processes objectively. It was at this point in the mechanical conquest of the mental life that Darwin's theory of natural selection entered.

The theory of natural selection emphasized the relation of the organism to its environment. As the character of the environment shifted, the organism had to become modified so as to adjust itself to the new conditions, or it had to perish. Implicit in this view of the relation of the organism to its environment is the idea that natural forces, such as climatic and geological changes, have from time to time in the history of the world produced new environments to which the organic forms either adjusted themselves or failed to survive. The transformation of species was therefore mechanical, in the sense that it was instigated and validated by conditions that lay outside the organisms themselves. These conditions were natural, and the theory of natural selection thus brought the organic forms within the processes of nature. Therefore, since natural processes were themselves already held to be mechanical, it was only a logical deduction that the transformation of species should likewise be of that character. Whatever an organism had been, whatever it had come to be, and whatever it was destined to be in the future, was by implication written in environmental changes, past, present, and yet to be. It was not a long step from this deduction to the idea that mind and the conscious life had come into existence through natural forces and that they were subject to natural laws. For the first time in the history of modern thought, the evidence and rigorous methods of science supported the conclusion that mind was capable of reduction to the level of the materials of Newtonian mechanics and subject to the same formal methods of investigation. At last the mental processes had been taken captive by the weapons they had forged.

The conception of adjustment came to be very popular in the study of psychological phenomena. If Darwin advanced a theory of dynamics of forms, he also assumed a changing environment to account for the change in forms. Students of psychology saw in this explanation the foundations of mental processes. They saw that the organic form, in meeting new conditions, reacts to the environment in such a way as to bring about a new adjustment of the organism, and that action was therefore a primary factor in the process of adaptation. Given environmental changes, and adjustments of the organism to these changes, human behavior could be reduced to the pattern of the Newtonian principle of action and reaction.
On this hypothesis psychologists came more and more, in the closing years of the last century, to express consciousness in terms of motor action. In our own day the ultrabehaviorist takes the position that mind, consciousness, self, and other such introspective terms have no place in a science of psychology, and even for the less extreme behaviorists the proper datum of psychological study is behavior. The doctrine that psychology is concerned with the behavior of the organism is thus a direct deduction from the idea of the survival of the species.

The impact of the Darwinian theory of selection is seen in the pioneer work of Thorndike on animal intelligence. In 1872 Darwin published his study of the expression of emotions in man and animals.⁶ Following this publication there grew up an interest in comparative psychology, of which Thorndike's pioneer experiments on animal learning were a manifestation. The point that interests us here is not the facts which Thorndike collected on animal learning, but rather the theory which he adopted to explain his facts. His explanation was that of trial and error, of learning reduced to mechanical terms. In this respect it was identical with that of the theory of natural selection. His experiments were simply miniatures of the vast operations involved in the suppression and survival of species, carried on in the laboratory of nature. The stimulus-response hypothesis which Thorndike used, destined to become the ruling psychological concept of educational thought in the first quarter of our century, is a mixture of Newtonian mechanics and the concept of natural selection by mechanical processes.

⁶ Darwin, The Expression of the Emotions in Man and Animals.

It is perhaps too much to say that Thorndike, or, for that matter, any psychologist of the late nineteenth century, always consciously thought out his concepts and principles in keeping with the theory of natural selection. There were many other concepts, theories, and principles of those days which impinged upon the sensitive mind. But the doctrine of evolution was no doubt the most pervasive and stimulating.
It was hardly necessary for one to take it consciously into account. In this connection it is significant to find the following statement from the pen of Thorndike. Writing in 1909, he said,

To all human thinking and conduct Darwin taught two great principles. The first is the principle of evolution, of continuity—that each succeeding segment of the stream of life, each successive act in the world drama, is the outcome of all that have gone before and the cause of all that are to come. The second is the principle of naturalism,—that in life and in mind the same cause will always produce the same effect, that the bodies and minds of men are a part of nature, that their history is as natural as the history of the stars, their behavior as natural as the behavior of an atom of hydrogen.⁷

At the close of the century the psychological world was rapidly losing faith in the idea of mind as something supernatural, and the old dualism between mind and body was dissolving, making way for the advent of a physics of behavior. The dynamics of forms advanced by Darwin in time came to be shot through and through with the analytical, quantitative, and mechanical conceptions of physics. The way was thus prepared for the application of mathematics to the properties and traits of living things. It was a triumph of mechanics.

THE APPLICATION AND EXTENSION OF MATHEMATICAL STATISTICS

Since Newton's time mathematics had increasingly become an indispensable tool of science, so that, by the early part of the nineteenth century, science had come to be closely identified with mathematical rigor and precision. As the method of Newtonian mechanics persistently extended itself, mathematics accompanied it as its chief instrument of conquest. First there had been the branches of physics, then the chemical and organic world, and finally, under Fechner and his successors, sensations, each of which had wholly or in part shown itself amenable to the rigorous treatment of mathematics.

⁷ Thorndike, "Darwin's Contribution to Psychology," p. 79.

At the very beginning of the century, Herbart had conceived of a mechanics of mind based upon the conceptions of equilibrium of motions and composition of forces, as applied to ideas. He attempted to work out these ideas mathematically,⁸ but, lacking instruments of measurement, his results were empty symbols, a standing warning to those who would apply mathematics to a subject matter without reliable data. After Herbart, others were more successful. Fechner had used the instrument of mathematics in the study of sensations with remarkable success, and, as we shall presently see, in the closing quarter of the century, mathematics was improved so as to safeguard and advance the quantitative and mechanical exploration of the realm of life and mind.

Darwin was in no sense a mathematician and even found it difficult to think at all in the mathematics of probability.⁹ Nevertheless, his conceptions, as we have seen, were basically quantitative. On this point Galton's remarks are very pertinent. "The first condition necessary," says Galton, "in order that any process of Natural Selection may begin among a race, or species, is the existence of differences among its members; and the first step in an inquiry into the possible effects of a selective process upon any character of a race must be an estimate of the frequency with which the various kinds of individuals composing the race occur."¹⁰

⁸ Merz, History of European Thought in the Nineteenth Century, II, 498.
⁹ In a letter to Lubbock he wrote, "You have done me the greatest possible service in helping me to clarify my brains. If I am as muzzy on all subjects as I am on proportion and chance,—what a book I shall produce!" Life and Letters of Charles Darwin, edited by Francis Darwin, I, 461, by permission of D. Appleton-Century Company, publishers.
¹⁰ Galton, "Scope of Biometrika," pp. 1-2.

In the mathematical treatment of data the followers of Darwin found it necessary to borrow their basic mathematical structure, and the necessary foundations were found in the mathematics of probability. During the eighteenth century probability had received the attention of mathematicians, largely at the stimulation of professional gamblers. As early as 1733 De Moivre (1667-1754), working on the theory of chance, had formulated a mathematical statement of the normal law, but it was lost in the decades that followed and had to be rediscovered by Laplace (1749-1827) and by Gauss (1777-1855). During the eighteenth century little attention had been given to the mathematics of probability, as an instrument for the treatment of data in the fields of social relationships and human properties and traits. In the closing decade of the century, however, Maskelyne (1732-1811), the British Astronomer-Royal, noticed that his assistant observed the time of the stellar transit to be almost a second later than he did. His assistant, who, for a period of almost two years, had been growing more and more lax in this respect, was unable to improve, even under the prompting of Maskelyne. This led to his dismissal, because in astronomical observations a tenth of a second is an error over which to be concerned. Bessel (1784-1846), a German astronomer, came across an account of Maskelyne's experience in 1816, and, becoming much interested in the problem, undertook a long line of comparative observations, in which he attempted to account for the errors of Maskelyne's assistant. From these studies he discovered the "personal equation" and also its variability. From such studies the mathematical theory of errors found objective validation, and in the first few decades of the nineteenth century a wide interest was evinced in the theory of errors, as applied to astronomical and physical observations.
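
The "personal equation" described above amounts to the average difference between two observers' recorded times for the same events, and its variability to the spread of those differences. A minimal sketch follows. The transit times are invented purely for illustration, and the function name is hypothetical; nothing here reproduces Bessel's actual observations or method.

```python
from statistics import mean, stdev


def personal_equation(observer_a: list[float], observer_b: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of b's recorded times minus a's."""
    differences = [b - a for a, b in zip(observer_a, observer_b)]
    return mean(differences), stdev(differences)


# Invented transit times in seconds for four stars, for illustration only.
astronomer = [10.0, 20.0, 30.0, 40.0]
assistant = [10.7, 20.9, 30.8, 40.8]

offset, spread = personal_equation(astronomer, assistant)
```

With these made-up figures the assistant records each transit about 0.8 second late on average, and `spread` expresses how much that lag itself varies from one observation to the next.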
As we shall presently see, these investigations of observations led to the study of reaction time, which came to be one of the most important problems of psychology. It is to Quetelet, however, that the world is indebted for the first clear insight into the application of the normal probability law to human traits and social phenomena. It is of course true that social data had been collected and tabulated before Quetelet, even so far back as biblical times, but Quetelet was the first to grasp clearly the fact that the errors of observation in the physical sciences and the variation of human traits were due alike to chance causes, and that they may therefore be described by the same mathematical formula. As early as 1831 Quetelet had made studies on the relation between criminal tendencies and such factors as education, sex, age, climate, and seasonal changes. He was a pioneer in the field of anthropometric measurements, and had a far-reaching influence on the study of growth and stature and the measurement of physical traits of school children.11 He held firmly to the belief that mental and moral qualities would, when sufficient data had been accumulated, be found to be distributed according to the normal law of error. Had he lived until our time, he would perhaps have felt that his anticipations had been fulfilled.

In Quetelet's work, too, are the sources of Galton's mathematical methods of treating the data of living creatures. Galton refers frequently to Quetelet in the most complimentary terms. In his first extended treatise on heredity, published just ten years after the appearance of Darwin's Origin of Species, is a paragraph that indicates to some extent his indebtedness to Quetelet. After pointing out his dependence upon the law of "deviation from an average," as he termed the normal law, he says,

The law is an exceedingly general one. M. Quetelet, the Astronomer-Royal of Belgium, and the greatest authority on vital and social statistics, has largely used it in his inquiries. He has also constructed numerical tables, by which the necessary calculations can be easily made, whenever it is desired to have recourse to the law. Those who wish to learn more than I have space to relate, should consult his work, which is a very readable octavo volume, and deserves to be far better known to statisticians than it appears to be.12

The basic contribution of Quetelet of which Galton made great use was the application of the normal law to human properties and relationships. We have already indicated the quantitative implications of the concept of variation which Darwin advanced in his dynamics of forms. Variations have reference to differences among individuals, in respect to some quality or property. Teachers of biology used to take special delight in pointing out to their students that no two things in nature were precisely alike, an idea that called up in most of the students the thrill of the mysteries of nature. And yet this very mysteriousness was found to be orderly and subject to formal description. Galton, disciple of Darwin and student of the statistical works of Quetelet, saw that the law of errors was applicable to variations of organic forms. He saw the structural identity between the distribution of variation of the forms and the distribution of errors, and he made use of this identity. On it he based his study of genius as a hereditary character, advancing for the first time objective evidence in support of the hypothesis of normal distribution of mental ability, and, on the basis of the law of errors, setting forth the idea of equal units or degrees of merit of that character. Thus by way of a study of inheritance, it came about that the second measure of mathematical rigor was introduced into the study of psychological phenomena, the first having been introduced by Fechner.

Galton did not rest content with the application of the law of errors or the "very curious theoretical law of 'deviation from an average,'" as he characterized it. In dealing with measures of characteristics, such as height and weight,

11 Walker, Studies in the History of Statistical Method, p. 41.
12 Galton, Hereditary Genius, p. 23, by permission of The Macmillan Company, publishers.


among a very large number of individuals, he came upon the problem of expressing the relation between any two such characters. The magnitude of the difficulty is seen if one ponders the problem of showing the relation between height and weight in a large number of individuals. It is readily recognized, for example, that as a rule the average weight of a number of men six feet tall is more than that of an equal number who measure only five feet and nine inches. But it does not follow that all men who measure six feet are heavier than those who are only five feet and nine inches. Some of the shorter men will be found to be heavier than the taller ones, and some of the taller men heavier than the shorter ones. If they were directly and consistently related, any man taller than another would always be heavier. Such is not the case, however, and it was the absence of this direct and consistent relation that created the problem of discovering and exhibiting the relationships in data of large numbers. It is our good fortune that Galton hit upon the idea of freezing his data into the pattern which every student of elementary statistics knows today as a scatter diagram. The scheme is simply that of a system of coordinates, worked out many years before in the field of geometry, in which the abscissas and the ordinates were the two properties whose relationship was sought. Galton, lacking the mathematical skill necessary to the refinement of his procedure, depended upon others for the more exact mathematical treatment. Somewhat later Karl Pearson (1857-1936), one of Galton's friends and colleagues, explored the problem still further and, using the method of least squares which Legendre and Gauss had formulated almost a century before, worked out the correlation formula which bears his name. This was the third step in the reduction of the properties of living things to the rigor and exactitude of mathematics.
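The height and weight argument above can be illustrated with a small numerical sketch. The six pairs below are invented for illustration and are not data from Galton or Pearson; the names `men`, `pearson_r`, and `mean_weight` are likewise assumptions of this example. The sketch computes the product-moment coefficient now associated with Pearson's name and checks the two observations made in the text: the taller group is heavier on the average, yet some shorter man outweighs some taller one.

```python
from math import sqrt

# Hypothetical (height in inches, weight in pounds) pairs, invented for illustration.
# 69 inches is five feet nine; 72 inches is six feet.
men = [(69, 150), (69, 172), (69, 160), (72, 165), (72, 185), (72, 178)]


def pearson_r(pairs):
    """Product-moment correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sum_xx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sum_yy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sum_xy / sqrt(sum_xx * sum_yy)


def mean_weight(pairs, height):
    """Average weight of the men of one given height."""
    weights = [w for h, w in pairs if h == height]
    return sum(weights) / len(weights)


r = pearson_r(men)

# The taller group is heavier on the average ...
taller_heavier_on_average = mean_weight(men, 72) > mean_weight(men, 69)

# ... yet at least one shorter man is heavier than at least one taller man.
some_shorter_man_heavier = any(
    w_short > w_tall
    for h_short, w_short in men if h_short == 69
    for h_tall, w_tall in men if h_tall == 72
)

print(round(r, 2), taller_heavier_on_average, some_shorter_man_heavier)
```

With these made-up figures the coefficient falls strictly between 0 and 1, which is the numerical counterpart of a relation that holds on the average but not for every pair of men.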
First there had been the work of Fechner, which had reduced sensations to mathematical statement. Then followed the two


great mathematical applications of Galton, the first, to the description of variations of forms, and the second, to the relationship between any two variable qualities among a large number of cases. These three steps in the subjection of the qualities of life and mental processes to mathematical treatment, and especially the last two, form the groundwork of the application of mathematics to the phenomena with which we deal in educational study. In the last analysis, the mathematical treatment of reliability, validity, correlation, and prediction, as we have come to know them in educational measurement, goes back to Galton and finally to the sweeping speculations and generalizations of Darwin.

THE RISE OF PRECISION PSYCHOLOGY

We have seen how physics expanded into psychology, producing a hybrid, psychophysics; how the Darwinian theory of evolution, based upon variation and natural selection, gave rise to a quantitative and mechanical approach to the study of life and mind; and how Galton appropriated the statistical method of Quetelet and extended its powers and its application. We now come to the last significant factor in the intellectual background out of which emerged educational measurement. This factor is the rise of precision psychology. Precision psychology grew out of two sets of data: first, the problem set by variations in astronomical observations; and second, the quantitative study of sensation originated by Weber and Fechner. The problem arising out of astronomical observations gave rise in psychology to the study of reaction time, which, in the last half of the nineteenth century, was one of the chief problems of psychology. The work of Weber and Fechner led to the significant study of sensations of touch, taste, vision, and audition. The study of these two fields of investigation came to a common focus in the person of Wilhelm Wundt (1832-1920), who, at Leipzig in 1879, established the first psychological laboratory in the world.


Most of the studies carried on in this laboratory by Wundt and his students, for the first twenty years of its existence, were concerned in one way or another with the problems of reaction time and sensations. Before going on, we should consider the background of these two fields of study. Bessel, astronomer at Königsberg, it will be recalled, had made extensive investigations into individual differences of observations. In 1822 he published a report of his comparative tests, which established persistent individual differences in recording times of stellar transits. Bessel's work attracted wide attention and many astronomers throughout the next fifty years made studies of the error of observations. In time, these studies led to the conclusion that the personal equation varies with the nature of the stimulus. That is, the magnitude of a star, its direction, and rate of motion, and so forth, are factors that affect the observations. These conclusions gradually led to an abandonment of the problem by astronomers, because it was evident that it was a psychological problem involving the quantitative factor of brightness of the visual sensation, and so on. So in the 1860's the study of the personal variations of observations became definitely a psychological problem. Fechner, as we have noted earlier in this chapter, in the decade from 1850 to 1860 had put the study of sensations upon an objective and quantitative basis, comparable to that of the study of physical phenomena, although in so doing he had no intention of establishing an experimental psychology. In the first place, he was a physicist, and, in the second place, he was interested in sensations because of his own metaphysical speculations. It was also true that the astronomers who worked on the problem of the personal equation were not interested in psychology per se, and, as we saw, finally gave up the study of the problem because it led them into the field of psychology.
Thus it was that the amassing of psychological data, theories, and methods continued, with no one directly


interested in the field and no one available to weld the works together into an intellectual discipline on its own account. Psychological study, the stepchild of medicine, physiology, philosophy, and physics, was thus the privilege of all and the responsibility of none. This was the condition in which Wundt found the study of psychology when he came definitely into the field and established his laboratory at Leipzig in 1879. Wundt gave his life to the founding of psychology on a sound scientific basis, making it a respectable intellectual discipline. The spirit that permeated Wundt's laboratory was that of the exact scientist. Time, distance, and frequencies were the primary quantities which lay at the basis of the observations and experiments made there. Conditions were rigorously controlled, and exact quantitative descriptions of the experimental results were made. From this laboratory, in the first twenty years after its establishment, issued almost a hundred experimental studies, and the spirit that dominated each study was that of rigorous intellectual analysis, persistent control of conditions, and quantification of results. Most of the experiments were on reaction time and sensation and perception. Wundt attracted students from all over the world, and a number of American students came under his tutelage. Among these were Judd, Hall, Angell, and Cattell. It is the last of these with whom we are most concerned, for despite the fact that Cattell's research work was never extensive and that his writings were confined to reports of his own studies, he has done more to shape the course of American psychology than perhaps any other single individual, and, through Thorndike, to mold the form of educational measurement. His influence was made possible through a long line of brilliant students, including such men as Thorndike, Woodworth, Franz, Dearborn, and Hollingworth. In the 1880's Cattell was a student of Wundt, and for three years he was Wundt's assistant.
In 1888 he was lecturer at




Cambridge University, where he came into contact with Sir Francis Galton and helped him to set up his Anthropometric Laboratory at South Kensington. Cattell was thus a product of the Wundtian scientific psychology and the Galtonian statistical school, which advocated the handling of massed data by distribution and statistical measures. Contrary to the Wundtian tradition, he held firmly to his preferences for a study of individual differences, which he found partly satisfied in the statistical methods of Galton, and for a purely objective psychology. It was largely through Cattell that the mathematical and statistical approach of the anthropometric school and the psychological approach of the Wundtian scientific psychology reached American psychological study and gave it its quantitative flavor, its passion for mathematics, and its experimental techniques. In 1891 Cattell opened a psychological laboratory at Columbia University, from which for twenty-five years issued experiment after experiment on reaction time, free association, method of order of merit, time of mental processes, accuracy of perception and movement, memory, perception of space, color vision, preferences, judgments, individual differences, fatigue and practice, behavior of animals and children, and reflective thinking.13 Of these studies Cattell could say as early as 1904 "that most of the research work that has been done by me or in my laboratory is nearly as independent of introspection as work in physics or zoology."14 Cattell was perhaps our first objective psychologist. His work on small noticeable differences, done in 1892, was in part a protest against the introspective method of establishing just-noticeable differences, which had been used since the time of Fechner. He chose the simple reflex theory of reaction, in preference to the more subjective apperception theory

13 See Archives of Psychology, No. 30, April, 1914, for a review of the fundamental works of Cattell by his former students.
14 Cattell, "The Conceptions and Methods of Psychology," p. 30.


of Wundt. He held firmly to a quantitative description of reaction, rather than to a qualitative account. He held to the belief that the ultimate value of reaction-time measurements was to establish psychological constants, and not to arrive at results through introspection. And his treatment of data was almost invariably graphic and statistical. Cattell's belief in individual differences as a fundamental problem of psychology, which he reflected in his doctor's work under Wundt, ultimately led him to a study of mental measurement. In 1890 appeared a paper on mental tests and measurements,15 and in 1896 he published a work, with Farrand, on the measurement of physical and mental traits of students of Columbia University.16 It was furthermore reported that he encouraged Thorndike to extend to the study of the intelligence of human beings the method which Thorndike had already used in his study of the intelligence of animals. Cattell, however, held to the belief that the higher mental functions could be measured indirectly by measuring the more simple functions, such as reaction time, association time, accuracy of perception, and so on, a reflection, no doubt, of the Wundtian influence. The development of mental measurement did not follow this theory, but rather developed along the lines advanced by Binet (1857-1911) and his followers, who worked on the theory that the more complex mental processes could be measured by direct approach. Nevertheless, it was Cattell who established in America the statistical approach to the treatment of behavior, without which the more direct measurement of the higher processes could not have been so easily consummated.17

15 Cattell, "Mental Tests and Measurements."
16 Cattell and Farrand, "Physical and Mental Measurements of the Students of Columbia University."
17 Walker, Studies in the History of Statistical Method, pp. 156-57.

SUMMARY

We have seen how precision psychology arose, and that its spirit and method were predominately quantitative and experimental. We have also observed that through Cattell American psychological thought was influenced by the anthropometric school of Galton and the scientific school of Wundt, and that Cattell, through Thorndike, passed on the spirit and methods of these schools to educational study. Let us now sum up the whole movement, in so far as it influenced the rise of educational measurement. What do we have as a residue, after all the quantitative and experimental studies that issued from the psychological laboratories in the last quarter of the nineteenth century are analyzed? It is fairly obvious that the chief contribution of precision psychology was not the multiplicity of experiments; it was not the laying of a methodological foundation, for in a fundamental sense it added little to the work of Fechner, Galton, and the early students of reaction time; and it was not the building up of a mathematical structure essential to the treatment of its data. The residue was none of these. It was the raising of psychological study to the level of an organized and systematic science; it was the binding together of the loose ends of a number of disciplines into a new field of intellectual exploration; it was the focusing of the interests and activities of a large number of capable men upon the problems of a neglected field; it was the solidifying and crystallizing of a number of methodological patterns that had been established on the fringe of other disciplines; and most of all it was the awakening of a passionate interest in the study of mental processes by quantitative means. It was thus the spirit and the faith animating precision psychology that radiated to educational study and that gave it courage to proceed along the lines laid down by Quetelet, Fechner, Darwin, and Galton. And it was through the rise of precision psychology that the methods


and ambitions of these men influenced educational study at the beginning of our century. We have thus followed the quantitative and mechanical conquest of organic and mental processes during the nineteenth century. In doing this it has become obvious that the century laid down the outlines which the quantitative conquest of mental life was to follow in our own time. The basic elements of that design are: first, that the realm of mind and consciousness is natural; second, that the events of this realm are due to the mechanical forces of nature, just as is true of other events; third, that these events are variables and obedient to law and order; and fourth, that the structure of their order is the same as that of certain mathematical formulations, rendering them subject to measurement and mathematical treatment. In the closing years of the nineteenth century the intellectual design of educational measurement was sketched. Lacking the experimental techniques necessary to advance the implications of such design, the nineteenth century left to the technicians of our times the task of constructing the measuring instruments with which to erect an educational technology. In a very true sense, however, the advancement of educational measurement in our own century has been confined to filling in the details of the design boldly sketched by the illumined and generalizing minds of the last century.

III

Early Development and Classification of Instruments

By 1900 the intellectual antecedents of educational measurement were fully developed, and the first decade of the new century witnessed the birth of a scientific spirit in educational study and the rise of a passionate effort to design measuring instruments for an educational science. So successful was the effort that today there are literally hundreds of different instruments of measurement. Some of these instruments have been adapted from fields such as psychology and sociology. The great majority of them, however, are the result of research in the field of educational measurement itself. Indeed, there was perhaps never a field in which more effort in devising instruments of measurement was expended in the same length of time than in the field of education. The only possible exceptions are psychophysical measurement, following the work of Fechner, and the recent efforts to measure intelligence. It is the purpose of this chapter to recount some of the outstanding steps in the development of these instruments, to present a general classification of them, and to review some of the ways in which they have been criticized.

THE BEGINNINGS OF EDUCATIONAL MEASUREMENT

The quantitative and mechanical techniques of studying life and mental processes came into the field of education largely through the work of Edward Lee Thorndike. In 1897 Thorndike, a student under Cattell and Boas at Columbia, was studying statistical method and "finding it new and very hard for me to learn."1 In 1899 he became connected with Teachers College, Columbia University, where for almost four decades he has rigidly and consistently advocated and practiced the quantitative study of educational problems. As early as 1902, Thorndike offered a course in education which dealt specifically with the problem of measurement. The course was entitled "Education (or Psychology) 108—Practicium. The Application of Psychological and Statistical Methods to Education." According to the announcement of the course, it dealt with "means of measurement of physical, mental, and moral qualities, including the abilities involved in the school subjects and rates of progress in various functions; the treatment of averages; measurement of relationships; etc."2 Thorndike had been giving a course in child study since 1899. This course also contained considerable statistical work, but "Education 108—Practicium" was the first formal course ever given in what has come to be known as educational measurement. Two years later, in 1904, appeared his treatise on mental and social measurement, which laid down the fundamentals of educational measurement.3 Except for the novelty of the quantitative and mathematical approach to educational study, there was little in the treatise that was essentially new, for its basic content had already been developed in psychology, biometry, and anthropometry. Nevertheless, it was a pioneer work in education and established a rationale for an educational science. But even before Thorndike entered the field of education, and while he was yet a student, a movement was started to evaluate the effectiveness of instruction in terms of the products of pupils. This movement was begun by J. M. Rice

1 Ayres, "History and Present Status of Educational Measurement," National Society for the Study of Education, Seventeenth Yearbook, Part 2, Chapter I, p. 12, by permission of the Society.
2 Teachers College, Columbia University, Announcement, 1902-3, p. 54.
3 Thorndike, An Introduction to the Theory of Mental and Social Measurements.


(1857-1934), physician, editor, student of psychology and pedagogy, advocate of the "new education," and crusader for the scientific study of educational problems. During 1892 and 1893, Rice published in The Forum a series of articles describing school systems of thirty-six cities in the United States. These articles were the result of some eight years of pedagogical study in search of some means whereby children might be taught more efficiently. He had visited and studied the school systems of a hundred or more cities, both in the United States and abroad, and part of these years he had spent studying psychology and pedagogy at the Universities of Jena and Leipzig, where he had come into contact with the leading psychological and educational thinkers of the century. He was therefore imbued with the educational and psychological theories of his day, as well as with the practices. He also acquired, no doubt, at these universities, part of the scientific attitude toward educational problems that characterized his studies. The educational problem with which Rice was immediately concerned was that of enlarging the curriculum without sacrificing the tools of learning. His study and travel abroad gave him an unusually complete vision of the "new education" of his time. In Europe he saw schools in which formal methods were in part abandoned and in which the curriculum was enlarged. And these schools pleased him. His criticism of the school systems in American cities, as reported in The Forum, was based, therefore, "not so much on results as on the contrast in the classroom spirit that existed between the old-fashioned, mechanical schools, with their narrow curriculum, on the one hand, and the modernized schools, with their extended curriculum, on the other."4 In the 1890's, as in our own day, the educational profession had its conservative and its progressive leaders. The

4 Rice, Scientific Management in Education, p. vi, by permission of Noble and Noble.


former held the view that an expansion of the curriculum would ultimately prove to be detrimental to the fundamentals, while the progressives held the theory, so often advanced nowadays, that the fundamentals were merely the tools of learning, and that, as such, they were of much less value than the outcomes, on which progressive education laid stress. This conflict between the conservatives and the progressives led Rice to seek a solution based upon facts, rather than upon speculation and opinion. Summing up the status of the educational thought of his time, he said,

Everything is speculative; nothing is positive. "I think" and "I believe" are the stereotyped expressions of the educational world: "I know" has not yet been admitted.5

Rice revolted against this speculative and conjectural manner of dealing with educational problems and thought that only by seeking the facts could such problems be solved satisfactorily.

Until the truth is known concerning the possibility of broadening the curriculum without detriment to the three R's [he said], educational conflict will not abate, and the road to progress will continue to be barred. Therefore, the work which, above all others, ought now to engage our people . . . is to undertake measures that will lead to the positive discovery as to how much time is actually required to secure satisfactory results in reading, writing, and arithmetic.6

In an effort to answer this question, he carried on the research that placed him among the first, if not the first, actually to introduce an objective study of educational problems. In 1896 Rice began a second series of articles in The Forum. The articles appeared at intervals extending over a period of 8 years, the last one appearing in 1904. They were

5 Rice, Scientific Management in Education, p. 22, by permission of Noble and Noble.
6 Ibid., p. 24, by permission of Noble and Noble.


reports of results of tests in spelling, arithmetic, and language, given to almost 100,000 school children in many different cities. The tests were designed and given to the children in an effort to answer two questions: first, what should be accomplished in these subjects? and, second, how much time should be taken to do it? The tests were crude, compared to modern instruments, but they were identical with present-day tests in one essential respect. They evaluated learning in terms of what children could do. Rice saw clearly that it was necessary to exclude the assumed results of formal discipline in evaluating achievement in spelling, arithmetic, and language. On this point he was challenged when he appeared before the Department of Superintendence at its annual meeting in 1897. At this meeting Rice proposed for discussion the question of determining which group of children at the end of an 8-year period would be better spellers, a group which had studied spelling 10 minutes per day, or a group which had studied it 40 minutes per day. He was severely rebuked by the superintendents for proposing such a question, for they believed that the main purpose of study was to discipline the mental powers and not to modify behavior. Consequently, on their theory the determination of the better spellers left out of account the most significant factor, mental discipline. But Rice held to the theory of evaluating the results of instruction in terms of the things which children could do, and the experience of subsequent years upheld his position. While Rice pioneered the movement for the scientific study of education in America, it is also true that his contribution was chiefly that of a crusader. The second series of his articles was not only a report of his research, but also an open plea for the scientific study of education and the scientific management of the schools.
His techniques of research were crude, the validity of his tests was assumed, and their reliability was not established. Furthermore, his techniques of administering tests were not refined, and his treatment of


results, while analytical, was not seriously mathematical. Despite all these technical defects, Rice carried on his studies in a truly scientific spirit. And the spirit was the essential thing. Later workers criticized and improved upon his techniques, they applied mathematics where he had been content with simple arithmetic, but the power and sincerity of the spirit which animated his work no one could gainsay. When the scientific study of education came into its own in the first and second decades of our century, the part of Rice's work that lived was the problems which he set and the faith in the quest for facts as the method of solving them. The procedures and mathematical foundations of the rising educational science came, as we have observed, from other sources, but the scientific spirit which radiated from the studies of Rice must have won many who otherwise would have been turned against the invasion of science. We shall now turn to an account of the work of those who followed more rigorous methods.

THE FIRST INSTRUMENTS OF EDUCATIONAL MEASUREMENT

We have already had occasion to note that Thorndike was one of the leaders in the movement to establish an educational science based upon exact measurement and calculation. The phase of this movement in which we are particularly interested is the development of instruments of measurement. The first instruments, with few exceptions, were products of Thorndike and his students. The first significant instrument of educational measurement was devised in 1908 by Stone7 (1874- ), a student of Thorndike's. This instrument consisted of two tests in arithmetic. One of the tests was on reasoning in arithmetic, and the other one dealt with the fundamentals. In the construction of these tests, Stone emphasized systematic and uniform procedures of control and standard achievement. The tests were primarily for the sixth grade, and the items

7 Stone, Arithmetical Abilities and Some Factors Determining Them.

φ

Development of Instruments

which comprised them were weighted in such a way as to make allowance for differences in difficulty.

The work of Stone stimulated Courtis (1874) to venture the construction of another test.⁸ The arithmetic tests of Courtis appeared in 1909 and were revised in 1911. Courtis' tests were standardized for each grade and they measured both rate and accuracy. His techniques were refinements of those of Stone, and his main contribution was an extension of the idea of standardization below and above the sixth grade. The tests of Courtis, as well as those of Stone, gained wide recognition, the former having been used in a survey of the schools of New York City as early as 1911.

At the same time that Stone and Courtis were preparing their tests in arithmetic, Thorndike was working on a scale for the measurement of the quality of handwriting. This handwriting scale was described before the American Association for the Advancement of Science in 1909, and the following year it appeared in the Teachers College Record. The construction of the scale was based upon the theorem that differences noted equally often are equal. This theorem, as we have already observed, came from the formulation which Fullerton and Cattell gave to psychophysical measurement, originated by Weber and Fechner. The scale consisted of samples of handwriting which were arranged in order of merit and separated by equal units, extending upward from a zero point.

Immediately after the appearance of Thorndike's handwriting scale, Ayres (1876) set to work to devise another scale based upon the theory that samples of handwriting could be arranged in terms of the criterion of legibility.⁹ Samples of handwriting were read under controlled conditions, and from the average time taken to read a given sample and the

8 Courtis, "Measurement of Growth in Efficiency in Arithmetic."
9 Ayres, A Scale for Measuring the Quality of Handwriting of School Children.

number of words it contained, the rate, in words per minute, at which reading had been done, was calculated. From the reading rate the samples were arranged and spaced at equal intervals on a scale. The essential difference between the procedures of Ayres and Thorndike is that the former based his scale upon the average reading rate of a number of persons, while Thorndike depended upon the immediate judgments of persons as to the relative merit of samples of handwriting. They both used the normal curve as a basis of deriving equal units on their scales.

In 1913 Buckingham (1876), another student of Thorndike's, published the results of his work on the construction of a spelling scale.¹⁰ This scale was different, in one important respect, from the arithmetic scales of Stone and Courtis and the handwriting scales of Ayres and Thorndike. Buckingham attempted to set up a scale of difficulty in spelling. That is to say, by using as a criterion of difficulty the percentage of pupils spelling a word correctly, he was able to rank words from very easy to very difficult. This procedure enabled him to construct an instrument which began with words which nearly all pupils could spell, and increased in difficulty, so that by the upper end of the scale very few pupils could spell the words correctly. This procedure was of no great service in measuring achievement in spelling, but it established a pattern of instrument construction applicable to other fields. Upon this procedure Woody¹¹ built his instrument for the measurement of achievement in arithmetic, and Trabue¹² followed it in designing his language scales.

From these early beginnings educational measurement expanded rapidly so that within a few years it had equaled the fondest expectations of its advocates. Beginning with the measurement of achievement in the tool subjects, within a

10 Buckingham, Spelling Ability: Its Measurement and Distribution.
11 Woody, Measurements of Some Achievements in Arithmetic.
12 Trabue, Completion-test Language Scales.

few years the students of measurement perfected their techniques to such an extent that their methods were advanced into the more baffling and elusive areas of understanding and appreciation. Barely two decades after the appearance of the first instruments of educational measurement, a survey of secondary-school tests and scales included almost 1,400 standardized or near-standardized instruments in the various school subjects.¹³ Indeed, the dictum of Thorndike that "whatever exists at all exists in some amount,"¹⁴ with the implication that it can be measured, had apparently taken deep root.

With this early development of instruments of measurement before us, we are now able to turn to a consideration of the types of instruments, without extending the discussion to later developments, for these early instruments set the pattern of those that have been contrived more recently.

GENERAL CLASSES OF INSTRUMENTS

The way in which things are classified depends upon the purpose of the classification. Instruments of measurement, for example, may be classified according to school subjects (spelling, history, mathematics, reading, and so on) when their practical use is being considered. In the present discourse, however, we are interested in the structure of the instruments, and we shall therefore adopt a classification that conforms best to this particular feature.

Educational measurement, as was shown in the last chapter, developed along the logical lines laid down in psychophysics and in anthropometric and biometric measurements. Instruments such as Thorndike's handwriting scale are as a rule patterned after psychophysical measurement, and are generally referred to as quality scales. Performance tests and scales

13 Odell, Educational Tests for Use in High Schools, p. 6.
14 Thorndike, "The Nature, Purposes and General Methods of Measurements of Educational Products," National Society for the Study of Education, Seventeenth Yearbook, Part 2, Chapter II, p. 16, by permission of the Society.

such as the Courtis Arithmetic Tests and the Buckingham Spelling Scale, while not based upon the experimental procedures of anthropometric and biometric measurements, nevertheless are grounded in the fundamental logical and mathematical conditions developed in anthropometry and biometry by Galton and his students.

It should be noted, however, that many features of these instruments were gathered from different sources. The techniques of control were taken over from psychology, especially from the works of Ebbinghaus, Cattell, and Binet. The completion-type questions were first devised by Ebbinghaus in his mental test, prepared from his experiments on memory.¹⁵ The true-false form of item is suggested in the memory tests of Ebbinghaus, the mental tests of Cattell, and the psychological tests of Binet. Furthermore, as early as 1904, Spearman suggested the necessity of determining the reliability of measures of a property and discussed the mathematical considerations basic to the determination of the relation between successive measures of the same property.¹⁶ Suggestions for validating instruments came from Galton, as the following quotation clearly indicates:

One of the most important objects of measurement is to obtain a general knowledge of the capacities of a man by sinking shafts, as it were, at a few critical points. In order to ascertain the best points for the purpose, the sets of measures should be compared with an independent estimate of the man's powers. We thus may learn which of the measures are the most instructive.¹⁷

15 Murphy, Historical Introduction to Modern Psychology, p. 347.
16 Spearman, "The Proof and Measurement of Association between Two Things."
17 Galton, "Remarks," on an article by Cattell, p. 380.

In the evaluation of qualities of handwriting, composition, and so on, wherein it is not convenient to hold that one specimen is right and another is wrong, as might be done in the case of solutions to arithmetic problems, but where it is possible to distinguish among the specimens in such a way as to assert that one is better than another, a quality scale is applicable. Quality scales have not been used extensively in educational measurement and comparatively few of them have been developed. Hillegas' composition scale and Thorndike's handwriting scale are examples of this type of instrument. In these examples the psychophysical pattern of instrument construction is followed. However, the construction of quality scales does not rest necessarily upon psychophysical principles, as the Ayres handwriting scale amply demonstrates. Nevertheless, the construction of quality scales is usually based upon the psychophysical pattern, and it appears that the construction of instruments on this pattern may be more and more prevalent in the future. It now appears that such qualities as pacifism, militarism, and the like, are reducible to quality scales.¹⁸

Performance tests or scales form a large group of measuring instruments. They include almost all of the instruments for the measurement of skills, abilities, and understandings in the various subjects of instruction. There are two general classes of performance tests: first, instruments that determine the level of achievement at which persons can perform tasks; and second, instruments that ascertain the rates at which tasks can be performed.

The first of these classes of instruments is spoken of variously as achievement, developmental, scaled, or power tests, because they tend to measure growth or development by determining the level of achievement of individuals. They consist of a series of tasks, or items, selected from a given subject-matter field and arranged in order of difficulty, as determined by the percentage of individuals who respond correctly to each task or item. Buckingham's spelling scale, as we noted before, was the first instrument of this type to appear, and shortly after the publication of this test appeared the instruments of Woody and Trabue for the measurement of arithmetic and language abilities respectively.
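The arrangement of items in order of difficulty, as determined by the percentage of correct responses, may be illustrated by a brief sketch in Python. It is offered only as an illustration under stated assumptions: the words, the pupils' marks, and the function names `percent_correct` and `order_by_difficulty` are hypothetical and are not taken from Buckingham's scale or from any other instrument named here.

```python
from typing import Dict, List


def percent_correct(responses: Dict[str, List[bool]]) -> Dict[str, float]:
    """Return, for each item, the percentage of pupils who answered it correctly."""
    return {item: 100.0 * sum(marks) / len(marks) for item, marks in responses.items()}


def order_by_difficulty(responses: Dict[str, List[bool]]) -> List[str]:
    """Arrange items from easiest (highest percent correct) to hardest (lowest)."""
    pct = percent_correct(responses)
    return sorted(pct, key=lambda item: pct[item], reverse=True)


# Hypothetical marks: one True/False entry per pupil for each spelling word.
responses = {
    "cat": [True, True, True, True, True],
    "receive": [True, False, True, False, False],
    "friend": [True, True, True, False, True],
    "rhythm": [False, False, True, False, False],
}

scale = order_by_difficulty(responses)
print(scale)
```

The sketch yields only a rank order of items. It makes no claim that the intervals between adjacent items are equal, which is a separate question.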
In recent years a modified form of this

18 Thurstone, "Attitudes Can Be Measured," pp. 529 ff.

type of instrument has come to be accepted, on the assumption that the validity of an instrument is indicated by the capacity of its items to discriminate among individuals. Developmental or scaled instruments are illustrated by the Holtz First-Year Algebra Scale, Thorndike-McCall Reading Scale, Henmon Latin Test, and Van Wagenen American History Scale.

The second type of performance tests is referred to as a rate test. It seeks to determine how many tasks or items can be correctly responded to in a given period of time. In the construction of rate tests, the difficulty and homogeneity of the items are taken into account. As a rate test is designed to determine the rate at which some particular function is performed, it is necessary that the tasks be more or less homogeneous in respect to that function. For example, a measure of rate of performance in arithmetic is frequently based upon performance in the separate functions, such as addition, subtraction, multiplication, and division of whole numbers, or some other defined set of homogeneous functions. If the items or tasks are not homogeneous, the rate may still be determined, but of course there would be no point in ascertaining it unless the rate of performing mixed tasks is the specified purpose of the instrument. In determining rate of performance, it is usually desirable to have tasks of equal difficulty. These tasks are selected as equally difficult when the percentages of correct responses to them are equal. On the basis of this criterion, it is possible to select homogeneous tasks, the percentages of correct responses to which are equal, thereby bringing the factor of difficulty under control. The Courtis Standard Tests in arithmetic were the first rate tests. They consisted of tests in the fundamentals of arithmetic, the tasks being homogeneous and arranged in groups of equal difficulty.
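The selection of equally difficult tasks and the computation of a rate may be sketched as follows in Python. The sketch rests on assumptions of its own. The text speaks of percentages that are equal, whereas the hypothetical `tolerance` parameter treats percentages within a small margin as equal, since observed percentages seldom coincide exactly. The item labels, the figures, and the function names are likewise invented for illustration and do not reproduce the Courtis tests.

```python
from typing import Dict, List


def equally_difficult(pct_correct: Dict[str, float], target: float,
                      tolerance: float = 2.0) -> List[str]:
    """Select items whose percent correct lies within `tolerance` of `target`.

    The tolerance is an assumption of this sketch, not a figure from the text.
    """
    return [item for item, pct in pct_correct.items()
            if abs(pct - target) <= tolerance]


def rate_per_minute(correct_count: int, seconds_allowed: float) -> float:
    """Number of tasks correctly completed per minute of working time."""
    return 60.0 * correct_count / seconds_allowed


# Hypothetical percentages of correct responses for four addition tasks.
pct_correct = {"7+8": 91.0, "6+9": 90.0, "8+5": 92.5, "47+38": 71.0}

items = equally_difficult(pct_correct, target=91.0)
rate = rate_per_minute(correct_count=24, seconds_allowed=480)
print(items, rate)
```

The harder task is excluded because its percentage of correct responses differs widely from the rest, which is the sense in which difficulty is brought under control before the rate is taken.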
Following Courtis' work, many other tests for the measurement of rate of performance were devised, especially in the field of reading. However, with the development of diagnostic instruments and the gradual expansion of power or developmental tests, rate tests have ceased to occupy as much attention as they did in the early years of educational measurement. Nevertheless, there are still many rate tests, and they serve a useful purpose, especially in the measurement of skillfulness in such abilities as reading, typing, and calculation.

From the foregoing discussion it is clear that there are three general classes of instruments, devised to measure achievement. These are rate tests, developmental tests, and quality scales. In the present study, we shall be concerned primarily with developmental tests. For, in the first place, they are most widely used in the measurement of learning and in experimental research; and, in the second place, they have been subjected to the most rigorous and prolonged experimentation. However, quality scales will be examined, especially in regard to claims that equal units have been obtained.

CRITICISMS OF INSTRUMENTS

Instruments of educational measurement have been criticized from several standpoints. The students of measurement have been critical of the procedures and techniques of instrument construction. While the procedures and devices of measurement were being made more methodical and precise, however, other critics centered their opposition on the theory of education and of society which appeared to dominate the construction and use of instruments. The progressive school of educational thought opposed measurement, on the ground that it is based upon an outworn theory of learning. It claimed that measurement developed under the guidance of the idea that education is erudition, and that instruments of measurement consequently reflect the theory that learning is the acquisition of information. Progressive educational thought, having rejected the idea of learning as merely the acquisition of information, placed little reliance


on instruments designed to evaluate learning by such a criterion. It also called attention to the fact that there were many results of the educative process which had not been measured, and that they were in a true sense more significant than the simple results which apparently had been reduced to quantitative terms.

Furthermore, it was asserted by the progressive school of thought that the underlying psychological theory of measurement is mechanistic. It claimed that there is a direct connection between the mechanistic theory of learning and the scientific methods of measuring the results. That is, the psychological theory underlying measurement is that learning is a mechanical process of stamping in atoms of skills, abilities, understandings, attitudes, and the like; and consequently instruments are designed to measure isolated habits and skills and disconnected items of information, rather than the integrated results of the learning process.

Measurement has also been criticized from the standpoint of the creative artist.¹⁹ Rugg has pointed out that to the artist the test of value is found in his own subjective estimation of his work. The product of the artist, that is, a poem, a painting, or a symphony, is an "objectification of self." To him it is unique and personal, its own standard of worth and excellence. In the language of Rugg,

The goal of creative production must be a unique thing. It is a painting, a poem, a tone poem, an oration, that is an objective portrait of an inner personality. Hence the impossibility of confirmation or of refutation of the product of such a personality by another. By what standards shall its confirmation be measured? By whom is it to be confirmed or refuted? The product is the artist's personal record of self. At any given moment no two human beings in the world are even approximately alike. Thus it is inconceivable that, except by the remotest operation of chance alone, the peculiar fusion of feeling-import, meaning, and bodily

19 Rugg, Culture and Education in America, pp. 375 ff.

understanding achieved by an artist could ever be achieved by another. And if it were so achieved, it would not be a "confirmation" of the original artist's "generalization." It would either be sheer imitation or a new original product of the second artist. If it were the latter, it must be measured against his personal vision, not against that of the first artist.²⁰

Students of the social order have also criticized measurement, pointing out that it supports the status quo. They argue that measurement is in direct accord with the prevailing autocratic system of educational administration and supervision, which so proudly reflects the industrial and commercial system of organization and control. In defense of this criticism, they point out that measurement came into education under the stimulation of the efficiency movement that swept through industry in the closing decades of the last century. With the idea of increasing the efficiency of the schools, instruments of measurement were designed and used as tools for classifying and regimenting pupils, determining progress, and carrying on studies to increase educational efficiency. As Beard so aptly put it, "the spirit of science so vital to industry crept steadily through the whole structure of learning, spreading its passion for measurement, standards and precision."²¹ Furthermore, it is claimed that efforts to reorganize the curriculum, and to make it more in harmony with the demands and needs of an interdependent machine age, are seriously handicapped by instruments of evaluation, much in the same fashion that college-entrance requirements have impeded the reconstruction of the high-school curriculum. In other words, it is held that there is a tendency for the objectives of instruction to be determined by the instruments used to evaluate them, thus cramping the spontaneity and originality of both teacher

20 Rugg, Culture and Education in America, pp. 376-77, by permission of Harcourt, Brace, publishers.
21 Beard, The Rise of American Civilization, II, 792, by permission of The Macmillan Company, publishers.

and pupil, and curbing the sensitiveness of the school to social change.

These, then, are the criticisms that have been made of educational measurement: adherence to an outworn educational theory; reliance upon a mechanistic psychology of learning; insistence upon an external standard of evaluation in the arts; support of an autocratic and industrial-like scheme of school administration; and a tendency to prevent adjustment of the curriculum to social change.

The proponents of measurement have defended themselves against these criticisms largely by defining the work of the instrument constructor. The makers of instruments, recognizing a division of labor, seek to shift much of the criticism back to the critics themselves. "If the educational theorists or the teachers will state the changes they wish to make in their pupils so they are identifiable in reality," says Thorndike, "the science of educational measurement can find some means to measure those changes."²² The implication of this statement is that whatever criticism is made of measurement, on the side of its educational theory, should not be leveled against the science of measurement, but should be attributed to the lack of clarity on the part of educational theorists and teachers. Recently, however, the students of measurement have given considerable attention to the criticisms which have been directed against measurement on the side of its basic educational and social theory. They have thus attempted to push their procedures and techniques into the areas which educational theorists and teachers of the progressive school have long held to be of primary importance.²³

A modicum of reflection will show that the foregoing controversies are centered largely upon the particular educational

22 Monroe (ed.), Conference on Examinations. Eastbourne, England, p. 29.
23 Wrightstone, Appraisal of Experimental High School Practices; Tyler, Constructing Achievement Tests.

values which are to animate and to guide the construction and use of instruments of measurement. So much attention has been given to this aspect that the logical foundations of measurement have been overlooked as a point from which to direct an examination of educational measurement. It is from this standpoint that the analysis developed in this book is directed.

SUMMARY

In this chapter we have discussed the early development and classification of instruments and pointed out some of the criticisms directed against them. It has been shown that two general classes of instruments of educational measurement have been developed: quality scales and performance tests or scales. We shall turn now from our consideration of the development of instruments and direct our attention in the next chapter to those general principles which any property must satisfy before we can answer the questions which measurement asks concerning it.

IV

The Logical Foundations of Measurement

Despite the fact that instruments of educational measurement are diversified and incapable of reduction to a single type, in the final analysis they rest upon identical general principles. The characteristics that differentiate one instrument from another are for the most part fixed by the nature of the subject matter and the purpose of the instrument. While these characteristics are used to classify instruments, they are simply the result of different procedures of satisfying the same underlying principles. These principles are descriptive statements of the kind of structure which any measurable property must be capable of exhibiting. In measurement we are therefore seeking to discover whether or not properties are amenable to a special structure.

MEASUREMENT AS A SEARCH FOR STRUCTURE

Since measurement is a search for a special kind of structure, it is necessary that we make inquiry into the meaning and nature of the structure required. Fortunately, the mathematical operations in which the findings of measurement are utilized afford a clue to our problem. But before we advance further, let us turn aside for a moment to examine the meaning of the term structure.

Illustrations may perhaps supply meaning to the word "structure." To a motorist in a strange region a map is a necessity, if he is to travel intelligently. Now a map is obviously not the territory, for it is stripped of all the countless details of the beauty and glory of a countryside. It is simply a composition of signs and symbols, which give an account of

certain relations of selected features. Some symbols denote cities and villages; others denote roads, monuments, and parks; while still others indicate rivers, lakes, and mountains. The facts important in a road map are locations, distances, and directions; and the test of the adequacy of the map is the degree to which its own relations are consonant with those of the region involved. When adherence to a map results in directed travel over analogous territory, the structure of the map is similar to that of the region. If the motorist is misled and confused, the structure is not similar. In the same sense a blue print is similar to the house it represents, and the order of words in a dictionary is similar to that of the letters of the alphabet.

The notion of similarity of structure may also be applied to a series. Thus two series are said to be similar when their terms are perfectly correlated. If on our map a series of symbols indicating cities A', B', C', D', ..., n' lie on a line, and if the order of such symbols is exactly the same as cities A, B, C, D, ..., n of the territory to which the map corresponds, the relation is a one-one correspondence and the structures of the two series are similar. The relation involved may be "to the left of," "to the right of," or "to the north of," and the like. As will be noted later, the structure of a series plays an important rôle in certain stages of measurement.

In general, it may be said that any two things have similar structures when the members of one are shown to have a one-one correspondence to the members of the other, and when every statement which is valid for a set of members of one thing is true for the corresponding set in the other thing.

To return to the main question, if inquiry is made into the object of measurement, we shall find a direct connection between measurement and the aim of science. It is the ambition of science to establish a unified body of principles by which

the uncertain, abrupt, and obscure may be reduced to control, unity, and clarity. Since mathematics is the medium by which science establishes uniformities among the data derived from measurement, the logical conditions, that is, the axioms, of the mathematical operations become the connecting link between measurement and the objectives of science. Measurement is a method by which mathematics is shown to be relevant to parts or aspects of nature; and the structure which it seeks is similar to that implied in certain mathematical operations.¹ The readings of instruments are thus capable of numerical expression, and mathematics is enabled so to deal with the properties, traits, and qualities of things as to establish the uniformities which science seeks.

If we select one set of conditions, that is, underlying principles, as necessary in the construction of instruments of measurement, to the exclusion of other sets of conditions, it is because the set chosen most precisely fulfills the requirements of algebraic equations. In a word, the conditions of measurement depend upon the use to which we put its results. Hence no set of conditions can be set forth as the only possible foundation of measurement. If we are to employ the calculus of science, however, in the treatment of the results of measurement, then its axioms fix as exactly as possible the conditions under which qualities can be measured. Thus, if mathematics is to be used successfully in the treatment of qualities dealt with in educational measurement (and it is constantly being used), the axiomatic conditions of the mathematical operations must be expressive of those properties. These formal conditions, when translated into proper experimental operations, fulfill the requirements of measurement; that is, the process of measurement is a physical interpretation of the axioms of the mathematics we employ. The degree, then, to which any

1 Nagel, On the Logic of Measurement, p. 18. Cf. Mead, The Philosophy of the Present, p. 148.

property can be measured will depend upon the discovery of experimental operations by which these axiomatic conditions can be satisfied with respect to that property.

The use of numbers in educational measurement requires careful examination, if we are not to be led into error and absurdity. The assumption that we can do and say the same things about the properties as we can do and say about the numbers representing them, will lead us into all kinds of nonsense. It is an assumption too often left unquestioned that because such numbers satisfy the axioms of algebraic operations, the same thing is true of the qualities they represent. If the findings of educational measurement are to mean anything of significance to educational study as a technological discipline, properties of educational phenomena must be empirically shown to be as they are predicated in the axiomatic conditions of the mathematical operations.

It should not be necessary at this late date to point out that mathematics is not dependent on any specific set of symbols,² but the failure to recognize the implications of this point in educational measurement makes it necessary that attention be directed to it. The essence of mathematics is relations, not numbers and symbols. Such relations might be verbally expressed, but the use of numbers and symbols frees manipulations of the cumbersomeness and inefficiency of verbal discourse. While numerical evaluation is only one among other ways of expressing relations, it is the only one that offers escape from the engrossing and emotional details of verbal expression. When it is recognized that relationships are the essence of mathematics, it will be seen that if the values in an operation are not empirically vouched for, the results of applied mathematics may be fruitless.

Were it not for the regularities and consistencies of nature, numbers would be of little value in the exploration and comprehension of our environment. But just because it has been demonstrated that there is a similarity between numerical processes and certain observed properties, we should not fall into the mistake of assuming that all of nature has a structure consonant to that implied in the calculus of science, or into the even more serious folly of substituting mathematical manipulations for the act of observation. The only way in which a particular structure of any character of nature can be ascertained is by careful observation and experimentation. No amount of mathematical subtlety or statistical manipulation will render unreliable data trustworthy or establish a structure in the empirical world. We are therefore obliged to set forth the axiomatic conditions of the mathematical operations which we use in the manipulation of the findings of educational measurement; for, if we know these conditions, then we have the specifications to which a property must conform, if it is to be measured.

2 It is worthy of note, however, that the adoption of certain sets of symbols, over other sets, has enabled mathematics to develop more rapidly.

THE AXIOMATIC CONDITIONS OF MEASUREMENT

There are two conditions which the axioms of the calculus of science predicate for measurement: the first is order and the second is addition.³ At this point only a brief and somewhat formal discussion of these will be given, discussion of the efforts to satisfy them experimentally having been reserved for a later section.

Order is based upon the relations of things and may be observed and classified when one has familiarized oneself with its forms, for example, symmetrical, asymmetrical, or non-symmetrical, and it may also be transitive, intransitive, or non-transitive. Furthermore, if it were appropriate, it could be shown that since symmetry and transitivity are independent, nine different types of order could be formulated.⁴ Illustrations will clarify the meaning of both order and relations.

3 Nagel, On the Logic of Measurement, p. 18.
4 Campbell, Measurement and Calculation, p. 4. See also Russell, Principles of Mathematics.

A symmetrical order is an arrangement of things such that when a relation holds between A and B it holds also between B and A. Thus if A is similar to B, B is similar to A. If A is neighbor to B, in the sense of being next door, B is also neighbor to A. That is to say, if a relation is such that it is the same as its converse, it is symmetrical, and we speak of the arrangement of things having such relation as a symmetrical order. Some relations are said to be non-symmetrical when they hold between A and B, but not necessarily between B and A. Thus if A is angry at B, B may not be angry at A. A relation such that if A bears it to B, B never bears it to A, is said to be asymmetrical. That is, for example, if A is greater, lighter, or taller than B, then B cannot bear any of these relations to A. Things so arranged constitute an asymmetrical order and this order is one of the primary conditions of a series. In such a relation the reversal of order is impossible. It would be repudiating the structural facts to assert, for example, that B is greater than A, when the converse is known to be true. An asymmetrical relation is one that is always incompatible with its converse.

If three or more terms are concerned, relations may be transitive or intransitive. When a relation is such that A bears it to B, B bears it to C, but A never bears it to C, the order of things so arranged is intransitive. Thus if A is the father of B, and B is the father of C, A cannot be the father of C. Similarly, it is true of such relations as "one foot to the left of" or "one hour earlier than," and the like. A transitive order is such that when a relation holds between A and B and likewise between B and C, it also holds between A and C. For example, if A is greater than B, and B is greater than C, then A is greater than C.
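The distinctions drawn above can be made concrete by a small sketch in Python, in which a relation is represented as a set of ordered pairs. The representation, the function names, and the sample relations are assumptions introduced for illustration. The sketch tests only the finite set of pairs supplied to it, so it shows how the definitions apply and does not establish that any empirical property has a given structure.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def symmetry_type(pairs: Iterable[Pair]) -> str:
    """Classify a relation as symmetrical, asymmetrical, or non-symmetrical."""
    rel: Set[Pair] = set(pairs)
    # For every pair (a, b), record whether the converse pair (b, a) also holds.
    converse_holds = [(b, a) in rel for (a, b) in rel]
    if all(converse_holds):
        return "symmetrical"
    if not any(converse_holds):
        return "asymmetrical"
    return "non-symmetrical"


def chained_results(pairs: Iterable[Pair]) -> List[bool]:
    """For each chain a-b, b-c in the relation, record whether a-c also holds."""
    rel: Set[Pair] = set(pairs)
    return [(a, c) in rel for (a, b) in rel for (b2, c) in rel if b == b2]


def is_transitive(pairs: Iterable[Pair]) -> bool:
    """True when the relation carries across every chain found in the pairs."""
    return all(chained_results(pairs))


def is_intransitive(pairs: Iterable[Pair]) -> bool:
    """True when at least one chain exists and the relation never carries across."""
    results = chained_results(pairs)
    return bool(results) and not any(results)


# Hypothetical relations among three terms A, B, and C.
greater_than = {("A", "B"), ("B", "C"), ("A", "C")}
father_of = {("A", "B"), ("B", "C")}
neighbor_of = {("A", "B"), ("B", "A")}
angry_at = {("A", "B"), ("B", "A"), ("A", "C")}

print(symmetry_type(greater_than), is_transitive(greater_than))
print(symmetry_type(father_of), is_intransitive(father_of))
print(symmetry_type(neighbor_of), symmetry_type(angry_at))
```

Of the four sample relations, only "greater than" is both asymmetrical and transitive.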
If a relation is sometimes transitive and sometimes intransitive, such as "in love with," it is said to be non-transitive.

The transitive and asymmetrical relations are very important in the science of measurement. For our purpose, a combination of these two relations, that is, a transitive asymmetrical relation, is of special significance, for it establishes a series. The first condition of measurement, therefore, is that any property to be measured must be capable of satisfying a series, that is, a transitive asymmetrical order. Every measuring instrument, if it has any validity, has behind its calibration a transitive asymmetrical arrangement of things in respect to the quality which is being studied. The transitive asymmetrical relation must also be consistently true of any property, if a measure of it is to be taken as unqualifiedly reliable; coefficients of reliability are measures of this consistency, as well as of the inherent defects of the instrument.

The transitive symmetrical relation is no less important. It is this relation that establishes equality,⁵ and without equality measurement is impossible. For in the practical application of instruments, the aim is to establish the relation of equality or of difference. The transitive symmetrical relation means that A equals B, provided a symmetrical relation holds between A and B, such, for example, that A is not greater than B and B is not greater than A, while at the same time any transitive relation which holds for A holds also for B; for example, if A is greater than C, B is greater than C.

⁵ Campbell, Measurement and Calculation, p. 5.

Stated somewhat formally, the minimum conditions which any property must satisfy if it is to be measured may be summed up as follows:

1. The property must be such that of things having it in common, it can be shown that either A > B, or A < B, or A = B.
2. The property must also be capable of an order such that if A > B, and B > C, then A > C, that is, a transitive asymmetrical order.
3. Furthermore, the property must be amenable to the following conditions of equality: A = B; provided, first, A ≯ B and A ≮ B; and, second, if A > C, B > C, or if A < C, B < C, for any C whatever.

In a limited sense, measurement is possible when an instrument of measurement is grounded in a structure coincident with the axioms above stated. Such measurement, however, can give no solution to problems involving the question of quantity, that is, questions of "how much" or "how many times." Before these questions can be answered, there are additional relations which require empirical validation. These relations are implied in the operation of addition. Measurement in the complete sense therefore requires that the axioms of addition, as well as those of order and equality, be satisfactorily fulfilled. The minimum requirements of addition are that the quality concerned be shown by experimentation to be amenable to the following axioms.⁶

1. If A = B, then A + C > B for C > 0 ⁷
2. If A + B = C, then B + A = C
3. If A = B and D = C, then A + D = B + C (commutative principle)
4. (A + B) + C = A + (B + C) (associative principle)

⁶ Campbell, Measurement and Calculation, p. 15. Cf. Nagel, On the Logic of Measurement, p. 18.
⁷ In dealing with negative magnitudes, Campbell has suggested "that if a system B' exists such that (A + B) + B' = A, then to B' is to be assigned the numeral -B." He suggests also that "we should define a system B as having a magnitude 0 if A + B = A'" when A = A' (Measurement and Calculation, pp. 25, 26, by permission of Longmans, Green, publishers).

The symbols + and = in the physical operation of addition apply when the sum of the values is different from the separate values only in quantity. That is, the values in the operation have the same characteristics in summation as they possess separately, and the algebraic total must contain only those effects that were present prior to the process. Concurrent changes within the same phenomenon tend to obscure these essential requirements of the process of addition. For example, hydrogen and oxygen under proper conditions combine to form water. In the process of combining, certain properties are amenable to quantitative treatment, while others cannot be so manipulated. In the chemical action represented by the equation 2H₂ + O₂ = 2H₂O, the properties of weight and volume can be measured completely, that is, they can be made to satisfy the conditions of both order and addition. The first of these properties manifests the nonqualitative character of a quantitative change. But there are properties, such as density, temperature, and viscosity, for example, which emerge in the process, that cannot be added. They are subject to the conditions of order, but there is no known way by which the axioms of addition can be experimentally interpreted in terms of them.

Again, the axioms of addition predicate that the sum of the properties is dependent only on the amount of the properties and not on the method or order of the additive process. This is implied in the commutative and associative axioms of addition. In the case of length, for example, it makes no difference whether length A is added to length B, or vice versa, for within the limits of the errors of measurement the results will always be the same. It does not follow, however, that all properties are commutative. If the attempt is made to add illuminations of different colors, for example, it will be found that it is not possible to do so, because if two illuminations, A and B, from red sources, are equal respectively to two illuminations, A' and B', from blue sources, it does not follow that the sum of the illuminations projected from the blue sources is equal to the sum of the illuminations generated by the red sources; that is, the commutative principle of addition does not hold good experimentally in this instance.⁸ Similarly, the axiom of association is not descriptive of all phenomena.
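The experimental character of the four axioms of addition can be sketched in code. This is a minimal illustration under stated assumptions, not a procedure found in Campbell or Nagel: bodies are modelled as hypothetical (mass, volume) pairs, the physical operation of putting two bodies together is modelled by a function `join`, and a property is any function that reads a magnitude off a body. The keys `axiom_1` to `axiom_4` follow the numbering of the list of addition axioms given earlier; all names are invented for the example.

```python
import math
from itertools import permutations


def check_addition_axioms(bodies, combine, magnitude, tol=1e-9):
    """Test each addition axiom on every arrangement of the sample bodies."""
    def eq(x, y):
        return math.isclose(x, y, rel_tol=tol, abs_tol=tol)

    def mag_sum(x, y):
        return magnitude(combine(x, y))

    report = {"axiom_1": True, "axiom_2": True, "axiom_3": True, "axiom_4": True}
    for a, c in permutations(bodies, 2):
        # Axiom 1 (taking B as A itself): adding a positive C must increase A.
        if magnitude(c) > 0 and not mag_sum(a, c) > magnitude(a):
            report["axiom_1"] = False
        # Axiom 2: the order of combination must not matter.
        if not eq(mag_sum(a, c), mag_sum(c, a)):
            report["axiom_2"] = False
    for a, b, c, d in permutations(bodies, 4):
        # Axiom 3: equals added to equals must give equals.
        if eq(magnitude(a), magnitude(b)) and eq(magnitude(d), magnitude(c)):
            if not eq(mag_sum(a, d), mag_sum(b, c)):
                report["axiom_3"] = False
    for a, b, c in permutations(bodies, 3):
        # Axiom 4: the grouping of successive combinations must not matter.
        left = magnitude(combine(combine(a, b), c))
        right = magnitude(combine(a, combine(b, c)))
        if not eq(left, right):
            report["axiom_4"] = False
    return report


def join(x, y):
    """Hypothetical physical combination: masses and volumes both accumulate."""
    return (x[0] + y[0], x[1] + y[1])


def mass(body):
    return body[0]


def density(body):
    return body[0] / body[1]


# Hypothetical sample bodies as (mass, volume) pairs.
sample_bodies = [(2.0, 1.0), (2.0, 2.0), (4.0, 2.0), (4.0, 4.0)]

mass_report = check_addition_axioms(sample_bodies, join, mass)
density_report = check_addition_axioms(sample_bodies, join, density)
print("mass:", mass_report)
print("density:", density_report)
```

Under these assumptions mass passes all four checks, while density fails the first and third: joining a denser body to a less dense one lowers the density of the whole, so no reading of "sum" is available for it.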
For example, the effect of simultaneous blows with hammers on a surface is not the same as the effect when the blows are administered separately. Nor is the behavior of a person in the company of two or more persons the sum of his behaviors manifested in the presence of the same persons separately. In general, it may be said that properties which appear as though they should be subject to the principles of commutation and association seldom manifest such structure experimentally. This is generally true of things which interact, and is the despair of those who would reduce all qualities to quantitative terms. It should be remembered, however, that measurement is experimental and an a priori assertion that a property can or cannot be measured in a complete sense can be made only at the risk of intellectual embarrassment.

⁸ Campbell, Measurement and Calculation, p. 44.

Addition is an experimental concept, as Nagel, referring to a similar conclusion advanced by Helmholtz long ago, points out, and it is capable of more than one experimental interpretation. "Undoubtedly," says Nagel, "spatial juxtaposition is the primitive meaning in the addition of lengths; but in the example used [monochromatic illumination] addition involves conjoint activity of sources; in measurement of weights, it means the establishment of rigid connections between solids, in the estimation of time periods, it requires the temporal repetition of certain rhythms; in the evaluation of volumes it signifies the discovery of liquids which fill containers without implying spatial contiguity."⁹ The fact that addition is not dependent on a particular set of experimental procedures and techniques is of much significance to the science of measurement, for it permits a wider range and a greater variety of experiments in the establishment of physical correlatives for the axioms of addition than would otherwise be possible. Consequently, a wider range of properties can be brought within the reach of quantitative evaluation. Furthermore, as Nagel points out, the complete measurement of such properties as illumination helps to remove "the belief that addition is exclusively spatial juxtaposition and division."¹⁰

⁹ Nagel, On the Logic of Measurement, p. 24.

The kind of structure, that is, relations, which any property must exhibit in order to fulfill the conditions of measurement should now be obvious. But in order that the connection between this structure and measurement be clearly understood, we shall now give illustrations of some of the experimental procedures involved in educational measurement.

INTERPRETATIONS OF THE CONDITIONS OF MEASUREMENT

It must be observed at the outset that a transitive asymmetrical order of objects does not necessarily constitute measurement. For example, houses on a street are ordered in a transitive asymmetrical fashion and the relation is "to the left of" or "to the right of," but no one would argue that the houses were thereby measured, or that the elementary conditions of measurement had been satisfied, although a high degree of reliability might be shown for successive determinations of the order of the houses. It could, of course, be held that the houses are measured in terms of distance from a selected point, just as it might be argued that they are ordered in terms of weight. But if this were true, the relation would be more than "to the left of," or "to the right of," as the case may be. The point here is that we can have a transitive asymmetrical order without involving measurable properties. In order, therefore, for the arrangement of objects to be considered of value in measurement, there must be some qualitative continuity running through the transitive asymmetrical order. It is hence necessary in measurement that some qualitative field be selected and the particular property to be measured be discriminated as carefully as possible. The property thus selected must then be shown experimentally to be amenable to a transitive asymmetrical order. When this has been done, the first step in the process of measurement has been taken.

¹⁰ On the Logic of Measurement, p. 24.


As an illustration of such a property, we shall use the character of learning, which persons exhibit in their relations with one another, and, with far less care than is actually used in experimental work, attempt to portray some of the procedures by which a transitive asymmetrical order is established. We shall not raise any question as to the nature of learning at this point, since our present purpose is merely to illustrate the experimental character of measurement. We shall simply define learning in the usual way and say that it is the capacity to do tasks selected from a given subject-matter field. On the basis of this definition of learning, the transitive asymmetrical relation may be established among individuals within any specific subject-matter field, by following certain experimental techniques.¹¹

¹¹ The selection of the appropriate technique requires careful consideration on the part of the instrument maker, since educational measurement is still in its youth and the determination of the best technique is still a problem of investigation. A treatment of these techniques lies outside of this discussion, but the reader may gain some understanding of their controversial points by referring to studies of them. Some of these studies are: (1) Smith, The Relation between Item Validity and Test Validity; (2) Vincent, A Study of Intelligence Test Elements; (3) Lee and Symonds, "New Type or Objective Tests: A Summary of Investigations"; (4) Lindquist and Cook, "Experimental Techniques in Test Evaluation," I, 163 ff.; (5) Horst, "Increasing the Efficiency of Selection Tests," pp. 254 ff.

The following is, in general, the pattern for establishing a transitive asymmetrical order in constructing developmental scales. Items or elements are selected from a particular subject-matter field, and from these items are selected those which best discriminate among individuals in respect to achievement within that field. The discriminative capacity of the items may be determined by the way in which the percentage of correct responses to them is related to the achievement of the individuals, as ascertained by some independent measure of learning. This independent measure may be some instrument of measurement already designed for that particular field, or it may be all the items out of which the one to be tested was selected, or it may be the academic marks of pupils, or some other criterion. Whatever criterion is ultimately selected, an item has discriminative capacity if the pupils who respond correctly to it are on the average higher in achievement, as determined by the independent measure, than those who respond incorrectly to it. Conversely, if pupils who respond correctly to the item are on the average equal in achievement to those who respond incorrectly, the item has no discriminative capacity.

Now by selecting items that discriminate among pupils at all levels of achievement, it is possible to establish a series of items capable of generating a transitive asymmetrical order of pupils for whom the items were selected. That is to say, when the series is administered, it is found that some pupils are able to respond correctly to practically none of the items, some to very few, some to many, and some to almost all of them. It is to be noted that the items will ultimately constitute the instrument of measurement and that they have now been arranged in a transitive asymmetrical order by an experimental operation, that is, in terms of the performance of persons and the criterion of discriminative capacity, as operationally defined. If there be items A, B, C, D, E, ... n, for example, in which A discriminates among persons who are on the average higher in achievement than those discriminated among by B, and if B discriminates among persons who are on the average higher in achievement than those discriminated among by C, then A will also discriminate among persons who are on the average higher in achievement than those discriminated among by C, and so on for the entire series of items; and the discriminative capacity of the items will be limited to the population from which the individuals used in the experimental work were selected.

If it be asked how stable is the order of individuals generated by the items, it must be answered that it is not constant.
That is to say, if the same series of items, or comparable sets of items, be administered to the same individuals on two successive occasions, individual A, for example, may be higher than B on the first occasion and lower than B on the second occasion. If the order were identical from one occasion to another, the series of items would be perfectly reliable measures. But in educational measurement perfect reliability has not been attained. On the other hand, the order of the items is more constant than the order of individuals, for it is the result of the average performance of large numbers of persons. In educational measurement we therefore have a set of operations that generate a more or less stable order of items composing a scale, but when these items are administered to individuals, the order of individuals fluctuates to some extent from one performance to another.

We must consider the problem of equality. If repeated performances of two persons place them at the same point on an achievement scale, they are not necessarily equal in any sense other than that they stand at the same point. All that we know concerning the equivalence of the two persons is that the symmetrical relation seems to hold, that is, A ≯ B and A ≮ B, and that if A > C, B > C, and if A < C, B < C, on successive performances. The risk that is taken when equality is assumed on the basis of a symmetrical relation, without a sufficient confirmation of transitivity, is clearly seen in the measurement of hardness. For example, in the case of Mohs's scale of hardness, in which the relation is "scratches," it was discovered that two minerals, neither of which scratches the other, had different capacities of scratching some third mineral. The condition of symmetry was satisfied, but the transitive relation, which is also essential to equality, was not operationally established and hence the minerals were not equal. In educational measurement no attempt has been made to satisfy the relation of transitivity. A cursory analysis of test scores will show that even symmetry has not been established exactly, and the relation of transitivity is even more uncertain. We cannot, therefore, be sure that learning outcomes represented by equivalent scores are equal, until there have been adequate demonstrations that these conditions of equality are satisfied.

The experimental operations involved in establishing a transitive asymmetrical order in measurement are necessary for the purpose of guaranteeing that the order is not an arbitrary arrangement. There must be an actual demonstration that the quality to be measured has satisfied the axioms of order. It is this requirement that distinguishes the ordering of persons by scores on a test from the mere arranging of them, as in the example of houses on a street mentioned above. When the order has been empirically established, it is then permissible to introduce numbers as a factor in measurement. Suppose that the series of items is administered to persons A, B, C, D, E, F, ... n, and that these persons, arranged in terms of scores, form a transitive asymmetrical order. So arranged, A is higher in the gradient of tasks than B, B is higher than C, and A is also higher than C, and so on for D, E, F, ... n. To this series of individuals we may assign, for example, any of the following sets of numbers:

(a) 1, 2, 3, 4, 5, 6, ... n;
(b) 75, 73, 66, 60, 59, 45, ... n;
(c) 10.1, 15.5, 16.1, 18.2, 23.9, 25.4, ... n;
(d) 90.1, 81.2, 78.3, 73.4, 65.5, 59.1, ... n.

If, however, we specify that the numbers assigned be in the same order of decreasing magnitude as the individuals on the gradient of achievement, sets (a) and (c) will be eliminated. It will then be an arbitrary matter whether we select set (b) or set (d) to designate the individuals. In actual practice, however, the arbitrariness of choice between sets (b) and (d) is removed by assigning numerical values to the items.
When this is done, the numbers representing the individuals are no longer determined arbitrarily, but by the responses which the individuals make to the items. Nevertheless, there is an unavoidable element of arbitrariness in the numbers used to designate individuals, since the size of the numerical values assigned to the items is arbitrarily determined, and consequently the scores of individuals could have been made different by assigning larger or smaller values.

It may appear that this somewhat naïve way of examining the introduction of numbers is an unnecessary and devious discussion of the obvious. But the use of numbers in educational measurement, as has already been noted, is the source of much confusion, and a clear recognition of the conditions under which numbers can be introduced should clear up much of the difficulty. If, after the axioms of order have been empirically satisfied and numbers have been introduced, we conclude that because X has a numerical score twice as large as Y he has two times as much achievement, or that if the difference between the numerical scores of X and Y is 10 and that between Y and Z is 10, the differences are equal, we court error and invite nonsense. Our experimental operations have merely established the relation of order, and, since the magnitude of numbers is also an asymmetric, transitive relation, we are able to employ them to indicate the relation of transitivity and asymmetry among the individuals, with respect to the property involved. That individuals have been ordered in respect to learning within a given field is no indication that an individual higher in the scale than another possesses more unit quantities of the something called learning. It may be that he does, but such a conclusion cannot be deduced from the experimental operations by which order is established. Before such a proposition can be safely asserted, our experimental operations must not only generate order but also demonstrate that the property satisfies the axioms of addition.
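The point can be checked numerically with the two admissible number sets, (b) and (d), from the example of persons A to F. The sketch below is only an illustration; the helper names are hypothetical. It shows that both sets agree on the order of the individuals while disagreeing on differences and ratios, which is what one would expect if order alone has been established.

```python
persons = ["A", "B", "C", "D", "E", "F"]
set_b = [75, 73, 66, 60, 59, 45]
set_d = [90.1, 81.2, 78.3, 73.4, 65.5, 59.1]


def ranking(names, scores):
    """Names arranged from the highest score to the lowest."""
    paired = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in paired]


def adjacent_differences(scores):
    """Score difference between each person and the next one down."""
    return [round(hi - lo, 1) for hi, lo in zip(scores, scores[1:])]


def ratio(scores, i, j):
    """Ratio of the score in position i to the score in position j."""
    return scores[i] / scores[j]


# Both number sets carry the same information about order ...
print(ranking(persons, set_b) == ranking(persons, set_d))

# ... but they disagree about "how much" and "how many times".
print(adjacent_differences(set_b))
print(adjacent_differences(set_d))
print(ratio(set_b, 0, 5), ratio(set_d, 0, 5))
```

Since either set is an equally legitimate record of the established order, any statement that holds for one set and fails for the other, such as a claim about equal differences or about A having so many times the achievement of F, has no operational support.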
No physical operations have been developed in educational measurement to guarantee that the qualities dealt with are endowed with an additive structure. Indeed, we have failed to recognize that addition is an experimental concept, and, as we shall see in a later chapter, this oversight has resulted in much confusion in both educational and psychological measurement. In the search for equal units, therefore, assumptions and mathematical operations have been substituted for experimental interpretations of the axioms of addition. The general procedure has been to seek for equal units, on the theory that equal segments of the base line of a normal probability curve mark off unit quantities of the quality, if the quality is normally distributed; or, in the case of instruments patterned after psychophysical measurement, to assume that differences noted equally often are equal, an assumption which in the last analysis rests also upon the normal law of error. The efforts to establish equal units will not be dealt with at this point, since they in no way illustrate the connection between the axioms of addition and measurement. In a later chapter, however, they will be discussed on their own account, with a view to determining the fallacies which underlie them.

For our present purposes, we shall have to borrow an illustration from the field of physical measurement, and we shall choose the measurement of mass. In mass, the conditions of measurement may be satisfied by a set of physical operations performed by means of a balance. The balance must of course be constructed in a certain way; for example, the knife edges on which the pans rest must be parallel to the knife edge upon which the beam is balanced. The asymmetrical relation has been satisfied, if, when an object, A, is placed in one pan of the balance, the other pan, containing B, rises. Then, if the pan in which is placed the object, B, sinks when C is placed in the pan from which A has been removed, and if it also sinks when A is substituted for B, the relation is transitive as well as asymmetrical.
By extending this procedure, a transitive asymmetrical order of objects is established.
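The balance procedure just described can be imitated in a short simulation. It is a sketch under stated assumptions: the `Balance` class and the hidden masses are hypothetical stand-ins for the physical apparatus, and the masses are used only to decide which pan sinks, never read directly when the order is built. The order of objects is derived from pairwise comparisons alone.

```python
from functools import cmp_to_key
from itertools import permutations


class Balance:
    """Simulated balance; the masses are hidden from the experimenter."""

    def __init__(self, hidden_masses):
        self._masses = dict(hidden_masses)

    def sinks(self, a, b):
        """True if the pan holding `a` sinks when `b` is in the other pan."""
        return self._masses[a] > self._masses[b]


def is_asymmetrical(balance, objects):
    """No pair may be found in which each object outweighs the other."""
    return all(not (balance.sinks(a, b) and balance.sinks(b, a))
               for a, b in permutations(objects, 2))


def is_transitive(balance, objects):
    """Whenever a outweighs b and b outweighs c, a must outweigh c."""
    return all(balance.sinks(a, c)
               for a, b, c in permutations(objects, 3)
               if balance.sinks(a, b) and balance.sinks(b, c))


def establish_order(balance, objects):
    """Arrange objects from heaviest to lightest by comparisons alone."""
    def compare(a, b):
        if balance.sinks(a, b):
            return -1
        if balance.sinks(b, a):
            return 1
        return 0
    return sorted(objects, key=cmp_to_key(compare))


# Hypothetical objects; the numbers exist only to drive the simulation.
balance = Balance({"A": 5.0, "B": 3.2, "C": 1.1})
objects = ["C", "A", "B"]

print(is_asymmetrical(balance, objects), is_transitive(balance, objects))
print(establish_order(balance, objects))
```

The resulting arrangement answers only the question of "more or less." Nothing in these comparisons licenses a statement that one object is so many times as heavy as another.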


The conditions of equality are also satisfied when the symmetrical and transitive relations are confirmed experimentally. The relation of symmetry is confirmed when two objects, A and A', just balance each other when placed in opposite pans, that is, A ≯ A' and A ≮ A'; the relation of transitivity is confirmed when, for any third object C, if A > C, A' > C, or if A < C, A' < C. By these operations, mass is measured in an elementary sense. They give us no right, however, to speak of one object as being twice as heavy as another, or of the weight of two objects being equal to that of a third. We can only assert that an object, for example, is heavier than, lighter than, or equal to another. In short, we have no operational ground for asserting that the quality is additive, and so long as this condition remains we court error when we use language that implies equal units, or the "times as much" judgment.

When these elementary conditions of measurement are out of the way, it is then in order to devise operations that will exhibit the additive character of mass. Let us find two objects, A and A', equally heavy, and place them in one pan of the balance. Then some third object, B, may be found that just exactly balances the beam when placed in the other pan. When this has been done, it can then be said that the object, B, is two times as heavy as A. By extending this procedure, a standard series of objects in respect to weight may be established, and the relations between the weights of the series determine the equality of the units. A reference to the axioms of addition will disclose that at this stage of measurement they have been experimentally confirmed. Numbers can now be introduced and used in a quantitative sense, that is, to designate "how much" or "how many times" one object is more or less than another, for we have now not only established the relation of order but also the relation of difference. For example, some standard body selected, at least temporarily, and another body equal to it are placed in one pan of the balance. If the number 1 is assigned to this standard object, then the object which balances the beam when placed in the opposite pan will be assigned the number 2, since operationally it is twice as heavy as the standard body. An extension of this procedure will assign numbers to represent the weights of objects in a whole series.

These illustrations of the connection between measurement and the axioms of the calculus of science should serve to make us wary of claims that a quality has been measured, unless these axioms are confirmed by experimentation. It should also be evident from these illustrations that the use of numbers to represent qualities is misleading, unless the condition of their introduction is clearly known. The fact that numbers can be properly introduced into measurement only by confirming the conditions of order and addition causes some students to define measurement as the assigning of numbers to represent qualities. If, by the assignment of numbers, they mean the carrying out of a physical interpretation of the conditions of order and addition, this definition is sufficient.

THE TYPES OF MEASUREMENT

On the basis of the foregoing discussion, it is possible to distinguish two types of measurement. Qualities can be divided into two categories on the criterion of addition; just as there are two sets of axioms, one of order and the other of addition, so there are two kinds of measurable qualities, those which are capable of order only and those capable of both order and addition. Qualities which are capable of both order and addition are completely measurable and their measurement is spoken of as fundamental measurement. Such qualities as length, time, mass, volume, and electrical resistance are measurable fundamentally. There are indeed very few such qualities. In fact, among all of the qualities known, barely a score of them have been measured in a complete sense.

Those properties which are not subject to fundamental measurement may in some cases, however, be shown to conform to the conditions of order and equality. The properties that satisfy these conditions may be measured in an elementary sense, that is, in terms of "more or less," but the question of "how much" cannot be answered. The measurement of these qualities constitutes the second type of measurement. This type is illustrated by the measurement of temperature and density, for in these cases only the axioms of order and equality have been satisfied. In the case of density, for example, there is no known procedure by which two solids can be put together in such a way as to produce a third body having the sum of their separate densities. The measurement of qualities such as density and temperature is usually based upon the numerical ratio between two different magnitudes. For example, density is the ratio between mass and volume, each being different magnitudes and measurable independently of their career in the numerical law. This type of measurement is called derived.

Measurement in education and psychology is a derived type. Thus, for example, we may conceive of achievement as a function of the number of tasks properly performed, when the time is held constant or neglected. In this sense, the measurement of achievement and intelligence approaches the form of a numerical law: I = NT, where I is intelligence, N the number of tasks performed correctly, and T the time held constant or neglected. Achievement might also be represented as A = NT, where the items comprising N are of a different character than those in tests of intelligence.

In a recent book by Brown¹² psychometrics is criticized on two counts: first, that there can be no measurement worthy of the name save that which is fundamental; second, that a knowledge of a property, as a factor in some law, is essential to the measurement of it.
Since this position is contrary to the one here advanced, it seems desirable to examine it at some length.

¹² Psychology and the Social Order.

"It seems best [says Brown], to reserve the title of measurement for those scientific processes whereby number is assigned to the qualities of nature, and where the arithmetic theorem of addition holds for the number involved. Such processes have been called by Campbell fundamental measurement, and they have this great advantage over the mere assignation of number. The scales on which they are based may be constructed with an absolute zero point, and the equality of the units is assured. Such measurements are those of space, time, weight, and temperature on the absolute scale. . . . Where fundamental measurement is not possible, simple assignation of number to the qualities of nature has advantages, but such assignation ought not to be honored with the title of measurement. For psychology such a service is performed by rating scales, Intelligence Quotients, and the common psychophysical methods."¹³

¹³ Brown, Psychology and the Social Order, pp. 469-70, condensed, by permission of McGraw-Hill, publishers.

The classification of an absolute temperature scale, along with the measurement of length, time, and weight, is perhaps an oversight. One can hardly prevent oneself from wondering, however, if it is not due to the belief that an absolute zero point somehow guarantees fundamental measurement. Or perhaps the word "absolute" is itself the source of confusion, since it implies something perfect and complete, and thus tends to lead one to the conclusion that the conditions of addition have been satisfied. But the term absolute has no reference to the axioms of addition; rather it refers to a theoretical zero point, and we, of course, do not need a zero point in order to add a property. Whatever be the source of confusion, however, temperature, even on the absolute scale, is only a case of derived measurement, rather than a fundamental measurement.

In his contention that where fundamental measurement is not possible, our only recourse is to the "mere assignation of numbers," Brown overlooks the fact that numbers are not merely assigned in the case, for example, of the measurement of intelligence. They are assigned in accordance with certain principles that have been experimentally confirmed. That is, the conditions of order have been satisfied, within certain limits, by physical operations. As was seen in the preceding pages, this fact removes a certain degree of arbitrariness from the assigning of numbers, just as is done in the case of determining density by the method of flotation. It is therefore misleading to refer to the measurement of intelligence or psychophysical measurements as the mere assignation of numbers. It is, of course, highly desirable that we have fundamental measurement in psychology and education. But if we cannot attain it, we can still escape the arbitrary use of numbers in the measurement of qualities, provided we satisfy the conditions of order. After all, most of our physical properties can be made to satisfy only the conditions of a non-additive series.

Let us turn now to the methodological argument that before we can have measurement, some law must be known concerning the property which we desire to measure. Brown argues that the principle of the lever, for example, preceded the measurement of mass, and that the relation between space and velocity was recognized before the determination of time by clocks. If all that he means is that the quality must be well defined before we can measure it, and this is what he seems to mean in some of his discussion,¹⁴ there is nothing more to be said. We must, of course, identify the property, and the more knowledge we have of its relations to other properties, the better we can identify it. A knowledge of the law of the lever, as such, however, is not essential to the process of measuring mass, any more than a knowledge of the theory of reading must precede the reading process.
Measurement, as we have noted, consists of experimental operations by which cer14 Brown, "A Methodological Consideration of the Problem of Psychometrics."

L o g i c a l Foundations of Measurement

79

tain axiomatic conditions are confirmed with respect to some particular quality. A n d the knowledge of a law in which the particular quality is involved, while it might be of assistance and in some cases actually indispensable, cannot be reasonably held as always essential. Those cases in which such knowledge might be essential cannot be determined in advance of experimental work. Brown himself points out that the measurement of length was not dependent upon the discovery of some prior law. Another case in point is the measurement of temperature. In the early water thermometers, said to have been contrived by Galileo, atmospheric pressure was not taken into account. T h e relation between V and Ρ in the numerical law Τ = V P was not at first recognized, but "open-tube" thermometers were nevertheless constructed and used. T o return to the main discussion, the classification of qualities into two groups, those which can be ordered and added and those which can be ordered only, must ultimately rest upon experimental evidence. There is no way of guaranteeing that qualities which are limited to order now may not be shown to be additive tomorrow. But certain qualities, such as density, have been sufficiently subjected to experimentation as to establish beyond serious question a class of non-additive qualities. Educational measurement has advanced no further than the conditions of order, and, while the problem of providing instruments of fundamental measurement has provoked much study in educational circles, we are still a long way from a solution of the problem. It may be that we shall be forced to content ourselves with as rigorous confirmation of the conditions of order arid equality as possible, for it may prove that the structure of our subject matter will not conform to addition. SUMMARY

In this chapter an attempt has been made to show that exact measurement is possible because certain properties can be made to conform to certain axioms of mathematics. Measuring instruments are an indication that the conformity has been empirically established; and every time an instrument is used, the kind of structure which it affirms is put to test. The conditions of measurement must depend on the use to which we put the results. The results of measurement are utilized in certain mathematical operations. We therefore set forth the axioms of these operations as the conditions essential to the quantitative evaluation of any property. These conditions are order and addition. The assertion that a quality has been measured in an exact sense can be validated, therefore, only by an exhibition of verified experimental evidence that the axioms of order and equality at least have been confirmed; and, for complete or fundamental measurement, a satisfactory physical interpretation of the axioms of addition must also be shown. Practically no attempt has been made in this chapter to discuss educational measurement critically, it being thought best to delay such treatment until the meaning of measurement was fully developed. Accordingly, we shall now turn to a critical consideration of measurement, as manifested in achievement scales.
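The distinction summarized above, between qualities that satisfy only the conditions of order and those that also satisfy addition, may be illustrated by a minimal sketch. It is not part of the original argument. The sample bodies, the function names, and the reading of "addition" as "the measure of two bodies combined equals the sum of their separate measures" are assumptions made for illustration. Mass and density are used because the text names density as a quality that is ordered but not additive.

```python
from itertools import permutations

def is_asymmetric(items, exceeds):
    """No pair may exceed each other in both directions."""
    return all(not (exceeds(a, b) and exceeds(b, a))
               for a, b in permutations(items, 2))

def is_transitive(items, exceeds):
    """If a exceeds b and b exceeds c, then a must exceed c."""
    return all(exceeds(a, c)
               for a, b, c in permutations(items, 3)
               if exceeds(a, b) and exceeds(b, c))

def is_additive(items, measure, combine, tol=1e-9):
    """The measure of a combined pair must equal the sum of the separate measures."""
    return all(abs(measure(combine(a, b)) - (measure(a) + measure(b))) <= tol
               for a, b in permutations(items, 2))

# Hypothetical bodies, each given as (mass, volume).
bodies = [(2.0, 1.0), (3.0, 3.0), (8.0, 2.0)]

def mass(body):
    return body[0]

def density(body):
    return body[0] / body[1]

def combine(a, b):
    # Physical combination: masses and volumes both add.
    return (a[0] + b[0], a[1] + b[1])

def denser(a, b):
    return density(a) > density(b)

order_holds = is_asymmetric(bodies, denser) and is_transitive(bodies, denser)
mass_is_additive = is_additive(bodies, mass, combine)
density_is_additive = is_additive(bodies, density, combine)
```

On these sample bodies density yields a transitive asymmetrical order but fails the additive check, while mass passes it. A check of this kind only tests the conditions on the cases supplied. It does not establish them in general, which, as the chapter argues, remains a matter of experimental evidence.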

V

Logical Aspects of Validity

IT WAS pointed out in the last chapter that order in itself does not constitute measurement, for houses on a street or letters in the alphabet are ordered, but they are not thereby measured. A transitive asymmetrical order does not constitute measurement, unless it embraces some continuous quality. That is, measurement implies that the order is exhibited in terms of some quality. The procedure of establishing the presence of the quality is what is called the determination of validity. In the present chapter we shall examine the logic of the procedure by which we attempt to establish the validity of achievement tests. In this examination we shall seek to answer four questions: First, what is the logic of the procedure? Second, is the logic such that validity is unquestionably guaranteed? Third, if validity is not guaranteed, upon what conditions may it be sufficiently certain? Fourth, are these conditions satisfied in the validation of achievement tests?

PROCEDURE OF ESTABLISHING VALIDITY

In undertaking an analysis of validity, it is well to call attention to the complexity of the phenomena with which educational measurement deals. In physical measurement the problem of ascertaining the validity of instruments is seldom as perplexing a matter as in educational measurement. For, in the first place, the properties treated in physics have been within the range of human interests and activities longer than have most of the properties with which educational measurement deals. In the second place, performance of organisms depends more upon a multiplicity of interdependent variables, thus rendering it complex, unstable, and fleeting in character. Qualities in this field do not come to us sharp, concrete, and individualized, as they ordinarily do in the phenomena of physics. Human performance consists of a variety of interrelated qualities, so that the relations between different acts may be ambiguously denoted by a particular variable. This condition makes it exceedingly difficult to define an aspect or character of performance. This resistance of performance to qualitative analysis gives rise to the question so often asked: "What is it that is measured?"

The primary problem of educational measurement is therefore that of defining the character of that behavior which we ordinarily call learning. It is generally agreed that we refer to some property when we say that a person has learned. Whatever it is, the meaning of it is definite enough for us to understand one another in ordinary conversation, so that when a teacher tells a parent that his child has learned first-year algebra, for example, the parent has some idea of the mental status of his child. That is, he assumes that his child can do certain mathematical problems. In making such a statement to a parent the teacher may have little, if any, doubt of the correctness of his statement, especially if he accepts the work of the pupil as a true index of his learning. And as a rule, the teacher does accept what an individual can do as an index of what he has learned. This is the common-sense view of learning and of its evaluation.

Now the measurement of achievement begins with this common-sense view and then, by experimental operations and statistical analysis, attempts to describe learning in a more rigorous and thoroughgoing manner. The instrument-maker begins with the ordinary evaluation of learning and uses it as a criterion against which to validate his instrument.
That is, he attempts to construct an instrument that will distinguish among individuals on the same basis as does the common-sense procedure. He aims, however, to construct the instrument so that it will make finer and more consistent distinctions than those which are ordinarily made. The problem of validating instruments, therefore, becomes one of finding some set of operations by which individuals can be ranked in the same sense in which they are already ranked by some independent measure of learning, such as, for example, the estimations of teachers.

The operations by which instruments are validated are varied, and the problem of determining the best set of operations is still an open one. Nevertheless, there is more or less general agreement that the task of validating instruments can be successfully carried out by finding some satisfactory way of correlating the various test items with some objective and independent criterion, such as academic marks or the scores of the test as a whole. It is more or less generally accepted that validity, operationally defined, means the extent to which an item discriminates among individuals in respect to achievement as determined by an independent criterion. As was pointed out in the preceding chapter, an item has discriminative capacity if the pupils who respond correctly to it are on the average higher in achievement, as determined by an independent measure, than those who respond incorrectly to it. To express the same idea in terms of physical measurement, if an instrument designed to measure height did not, when applied, give values that distinguish individuals of four years of age from those of eight, it would not be a dependable instrument, and one would be impelled to conclude that it did not measure height. A series of items can be established by selecting the ones which discriminate among individuals at different points on the gradient of learning. Each item of the series is thus assumed to measure the same quality. That is to say, the character of behavior which is ordinarily designated as learning is continuous throughout the series. Reduced to its essentials, the procedure of establishing the validity of an instrument consists of a back-and-forth movement from experimental and statistical operations, on the one hand, to the independent criterion of learning on the other. In a word, since it is assumed that the independent criterion embraces learning, the items correlated with it are valid.

The foregoing analysis in itself is incomplete, for the reason that experimental operations do not come ready-made. They must be formulated, and the formulation requires some guiding principle. It is relevant, therefore, to demand the fundamental principle upon which the experimental operations are based. This principle must be some conception of learning (otherwise it would be valueless as a guiding principle), and an exposure of it will therefore reveal the theoretical character of the procedure of validating instruments. The principle rests upon a quantitative theory of learning. Developmental instruments in almost all cases are based upon the notion that learning is quantitative, and that, when we speak of a proportion of a person's learning or of his total learning as an algebraic summation, the expressions are not figurative

but possible statements of fact. In short, if we could determine all the elements of a subject-matter field, give these elements their proper weighting, and discover the number of them to which an individual could respond correctly, then we should have the summed total achievement of that individual in the particular field of subject matter.1 In measuring the achievement of an individual, therefore, it is unnecessary to expose him to all elements of a subject-matter field or even to all elements which he has supposedly learned. This is true because a group of elements will bear a proportional relation to the total learning of the individual. If, therefore, the test items have a proportional relation to his total learning, his position in a transitive asymmetrical order established by them will be the same as if they represented his total learning. If instruments be constructed so as to distinguish among individuals whose ratings are poor, fair, average, good, or excellent, for example, it is assumed that one's location on this gradient would represent some ratio of his total achievement, even though his total learning is unknown. This conception is at the bottom of the belief that an individual who makes a higher score than another person has a greater achievement. An item of a test, therefore, comes to have an importance, not because of its educational value, but because it is assumed to be an index of the individual's total learning. This is the basic principle which lies behind the operations by which instruments are constructed and validated.2

1 Hawkes, Lindquist, and Mann (eds.), The Construction and Use of Achievement Examinations, pp. 23-50.

2 As we are merely examining the logic of validity in this chapter, no effort will be made to criticize this principle as such. In the next chapter we shall discuss the meaning of an item, in terms of one's entire experience.

To sum up the procedure of validating instruments: it has been pointed out that educational measurement deals with very complex phenomena and that its primary problem is that of defining learning in terms of experimental operations. In attempting to solve this problem, the instrument-maker begins with the ordinary evaluation of learning as a criterion. From this point he validates his instrument by finding some satisfactory way of correlating its items with an objective and independent measure. Basic to the whole procedure, however, is the intellectual formulation of a theory of learning, in terms of which the experimental operations are contrived. We thus arrive at the conclusion that in the final analysis the validity of an instrument rests upon a theoretical foundation. This is true, in spite of the fact that recourse is had to an independent measure of learning in attempting to establish validity.
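The operational definition of item validity described in this section, namely that pupils who answer an item correctly should on the average stand higher on an independent criterion than those who answer it incorrectly, can be sketched briefly. What follows is an illustration only, not a procedure taken from the text. The function name, the use of a simple difference of means as the index, and all of the data are hypothetical.

```python
from statistics import mean

def item_discrimination(responses, criterion):
    """Mean criterion score of pupils passing the item minus that of pupils failing it.

    responses: 1 for a correct response, 0 for an incorrect one, one entry per pupil.
    criterion: the independent measure of achievement for the same pupils.
    Returns None when every pupil answers alike, since no comparison is possible.
    """
    passed = [c for r, c in zip(responses, criterion) if r == 1]
    failed = [c for r, c in zip(responses, criterion) if r == 0]
    if not passed or not failed:
        return None
    return mean(passed) - mean(failed)

# Hypothetical independent criterion: teachers' marks for five pupils.
marks = [55, 60, 70, 80, 90]

# Hypothetical responses of the same five pupils to three items.
item_a = [0, 0, 1, 1, 1]   # passed only by the higher-ranked pupils
item_b = [1, 0, 1, 0, 1]   # passed about equally at all levels
item_c = [1, 1, 1, 1, 1]   # passed by everyone

index_a = item_discrimination(item_a, marks)
index_b = item_discrimination(item_b, marks)
index_c = item_discrimination(item_c, marks)
```

On these figures the first item separates the pupils sharply and the second hardly at all, so only the first would be retained under the procedure described. The sketch also makes the chapter's point visible: the index is meaningful only to the extent that the marks themselves are accepted as embracing learning.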

THE POLAR NATURE OF VALIDITY

Is the guiding principle described above the only one in terms of which experimental operations can be performed? If not, why should it be accepted, in preference to other principles, as a basis for the formulation of such operations? This is a question worthy of study, and we shall now turn to a consideration of it.

In the preceding discussion it is clearly evident that the problem of validity can be solved only by making assumptions in respect to the nature of learning. The only way in which the scientific worker can extend his powers of observation and discrimination into regions beyond his senses is by recourse to physical manipulations. It is the charity of nature as well as the good fortune of man that events which are not immediately accessible can be dealt with through certain physical operations, devised on the basis of intellectual formulations. Science is possible just because intellectual excursions can be made into hidden relations of qualities immediately apprehended. The fabrication of these adventures, however, is not susceptible of direct and immediate substantiation. As guides to physical operations they are unverifiable hypotheses. This is true of any assumption of the nature of learning.

We are therefore confronted with the fact that the logic of validity is polar in character. That is, we set out with the principle that learning is quantitative and that responses to the items of a test bear a proportional relation to a person's total achievement. Then we devise instruments on the basis of this principle, and thereafter justify the principle by the consequences which issue from acting upon it. By recourse to the principle, we construct instruments that free our observations of error; and then we turn about and justify the principle, because it is in agreement with observations made by the instruments.

The fact that validating an instrument involves a polar argument does not necessarily condemn it, as some people have been led to believe. As a matter of fact, all scientific thinking is polar; there seems to be no escape from polarity in a world in which we have no antecedent principles so self-evident that they cannot be questioned. Science does not begin with truth nor with refined principles and observations, for that is the goal toward which it works. On the contrary, it begins with ill-formed principles and inaccurate observations. The principles are used as guides to observations and then the observations are used to reconstruct the principles, which in turn become more adequate guides to further observation. The method of science is thus a kind of intellectual shuttle between principles and observations.3 This is really the basis of the instrument-maker's contention that, while his instruments are not perfect and in many cases are simply short cuts to what may be concluded anyway, they nevertheless are justified because they begin the long march toward the goal of science.

3 Cohen and Nagel, An Introduction to Logic and the Scientific Method, pp. 391 ff.

If the procedure of validating instruments is polar, is it not possible to deny its guiding principle and, by making other assumptions, to set up another procedure in opposition to the one we have? In other words, if the argument is polar, the selection of guiding principles would appear to be an arbitrary matter. We would thus be able to shift from one guiding principle to another, depending upon the preferences of the test-constructor. The validity of achievement tests is therefore not guaranteed on some absolute, logical ground. Apparently there is a choice of ways by which an independent criterion of learning may be confirmed.

It may well be asked, if we can shift from one assumption to another, how shall we know to which principle we should adhere? Or one may suggest, if the present guiding principle is surrendered, where shall we stop? The winds of doctrine do not always blow from the same direction, and what shall we say when they have shifted, bringing us new ideas and new assumptions? Shall we hold arbitrarily to that which we already have, or shall we just as arbitrarily select something new and untried? Surely, knowledge that withstands the shifting of doctrine must be fixed more firmly.

Complete skepticism may be avoided if we keep in mind that any guiding principle may depend for support upon connections beyond its own context. A particular principle would be of little value to science without transitive aspects. If we cannot directly and objectively establish proof of any guiding principle, at least its correctness may be explored in two indirect ways: first, by an examination of its interconnections with other principles whose values have already been accepted; and, second, by taking account of the direct consequences which flow from its application. We cannot offer the assumption that the world is round, for example, as proof that it is round, but inasmuch as such an assumption enables us to explain many other phenomena and to direct our actions in such a way that valuable consequences result, these explanations and consequences may be taken as proof that the assumption is essentially correct. This character of scientific proof may be repulsive to those persons who hold to the idea that science discloses antecedent and ultimate truth, but such is the logic of science.4

4 Dewey, The Quest for Certainty, pp. 127 ff.

While we may therefore be unable to place the validity of our instruments beyond the possibility of question, it may be possible to make them sufficiently valid to serve the purposes of science. This can be done if we bear in mind two facts: first, that experimental operations must be such that the property will permit their application; and, second, that the conclusions and relationships which measurement enables us to establish must be satisfactory guides to action.

VALIDITY AND ITS FIRST CONDITION

In considering the validity of instruments from the first of these two standpoints, that is, that the properties be amenable to experimental procedure, it is well to bear in mind that no set of operations is merely physical activity, for its formulation and direction are conditioned by the aims and values accepted at a given time. Nevertheless, the realization of aims and values is conditioned by the kinds of operations to which the indicated property will submit. We cannot desire just anything and attain it by some set of operations; otherwise the magician's art would be converted into an established science. Where a given property will permit different intellectual formulations and physical manipulations, however, it becomes a matter of choice as to which formulation we shall ultimately accept.

We could, in validating instruments of educational measurement, set up plausible alternatives to the generally accepted principle previously set forth. At the present time, we do not know what limitations the properties dealt with in education place upon experimental manipulations. We do not know but that alternative principles would be just as easy to apply as the one we now utilize. It is well to remember that the mechanical and quantitative assumption upon which we now validate developmental instruments was formulated almost a quarter of a century ago, when the systematic exploration of the educative process was just getting under way. Since that time a number of alternative theories of learning have arisen, some of which indicate new operational principles for the validation of instruments. We shall at this time do no more than point to the recognized fact that principles, involving an integrated and emergent theory of learning, are arising out of recent developments in biology and psychology. There is no a priori reason to suppose that human behavior will not admit of manipulations based upon these new formulations. This fact, together with the widespread dissatisfaction with our present instruments, indicates the need of a healthy skepticism toward the fundamental principle on which their validity is grounded.

Physics, as someone remarked, is permitted to have at least one new conception of the atom each year, and physics is an old and well-established science. It seems that in a young science such as education we should be continually on our guard lest we overlook some newer and more adequate conception of learning than the one we now act upon. It sometimes appears as though the tenacity with which we hold on to guiding principles were evidence of lack of insight into the logic of our method and the nature of our subject matter. One of the things that educational measurement stands in need of is a sincere willingness to make excursions into its subject matter from other points on the intellectual compass. Particularly in a young science the practice of putting too much faith in one standpoint may lead to dogmatic objectivism. It must be said in fairness that some instrument-makers have accepted the educational implications of recent developments in biology and psychology, and that they are now attempting to work out tests based upon theories that appear to be more adequate descriptions of learning than the old ones. This movement is not widespread, however, and it has not become very articulate.

VALIDITY AND ITS SECOND CONDITION

Let us now consider the idea that the validity of instruments is borne out by the results which issue from their utilization. This second condition of validity has been frequently overlooked. As a result, the tendency has been to defend the validity of instruments on the ground that they measure whatever they measure. This is of course a valid defense under certain circumstances. If, for example, we ask a physicist what a meter stick measures, he will say length. If we then ask what length is, he will properly reply that it is what the meter stick measures, that is, it means certain operations.5 It should be noted, however, that the meter stick, as a measure of something called length, has been justified by the consequences issuing from the use of the numerical measures. When the consequences have justified the validity of the measure, we can then define the measured quality as that which is measured by the particular instrument. In a word, we can then define the quality in terms of certain operations.

5 Bridgman, Logic of Modern Physics, Chapter I.

Educational measurement has tended to ignore this character of operational definitions. As a result, we have justified the consequences of our measures by the instrument, rather than the instrument by the consequences. This is seen in the fact that when our instruments are seriously questioned, we defend them on the ground that they must measure something and that therefore this something is learning, thus abandoning the instrumental character of our logic. It is well to bear in mind that the operational definition of a property is applicable only after the measurement of the property has been established in terms of consequences.

If the consequence of measurement is satisfactory, it is substantial evidence in support of the validity of instruments. We are naturally brought face to face, therefore, with the problem of determining the results of educational study based upon the use of developmental instruments. In the last quarter of a century, these instruments have been used in literally hundreds of experimental studies. What has been the outcome? In the last twenty-five years, we have explored problem after problem by controlling conditions, varying experimental factors, measuring results, and mathematically manipulating data. The findings have been conflicting and have led to conclusions on isolated points whose relations to other points were seldom known. They have thus too often been fragmentary and disconnected, resulting in the accumulation of data, but leading to no set of consistent and substantial general principles.

A science cannot be founded upon fragmentary studies and facts which cannot be made to support general principles. The collection of objective data is not the aim of science. It is merely one of the means by which science progresses. The goal of science is the formulation of principles, by means of which a variety of phenomena may be explained and brought under control. Now the findings resulting from the application of our instruments fall surprisingly short of the capacity to support general principles. They have not enabled us to advance a set of principles for the effective reconstruction of our educational practices. They have not enabled us to discover general guides for so increasing the transitory value of the school's influence that what is taught will be more effective in the activities of life. They have not enabled us to discover means by which to reconstruct the school in any fundamental sense, so as to influence pupils more effectively.

It must be admitted, however, that developmental instruments have helped to analyze some of our honored educational practices and theories, such as, for example, methods of teaching, classification and grouping of pupils, doctrine of formal discipline, and rating of teachers. This result, however, has come more from the knowledge gained through making the analyses necessary to experimentation and measurement than from the outcome of measurement itself. Moreover, we are also able to establish certain relations. One is able to find the relation of one set of numerical scores, representing learning, for example, to another set, thus providing a more stable basis of prediction than we could otherwise have.
On the basis of intelligence-test scores, achievement-test scores, teachers' marks, and so forth, the academic success of individuals may be forecast within the limits of determinable error. But just what the implicated factors are we do not know. Our results may be a prognostication of the capacity of the individuals to carry about in their heads a large number of facts at a given time. Or they may indicate that the theory of learning held by the instrument-maker is similar to that held by the teachers themselves. We do know that they are predictions within a closed system, and, when relied upon for the rigorous selection of students, the instruments tend to preclude a reorganization of the educational program. In terms of the transitory value of the school's influence, we do not know what the relationships that we establish really mean. We have no adequate assurance that the learning, as defined by the instruments, will be functional in the kinds of activities which living demands, even if what is taught is essential in those activities. It is enlightening to note that one attempt at a thoroughgoing examination of fundamental school practices, in the light of the mass of data that has accumulated in recent years, ends with this conclusion:

If we wish to improve the effects of the school, we must increase the academic zeal of teachers and improve the staff with respect to personality. We cannot expect an improvement to result from a reconstruction of the program of the school in either its administrative or instructional aspects.6

This conclusion has the earmarks of a fundamental general principle, but a careful examination of the study will disclose to anyone that this is not the only conclusion that may reasonably be drawn from the parade of data, and that the study is not free from indications that it is a case of unwitting manipulation of data to fit a theory.

6 Stephens, The Influence of the School on the Individual, pp. 105-6. This conclusion is an implication of the hypothesis developed in the study, and, hence, will stand or fall with the hypothesis. Since the author claims that the hypothesis is not established, this conclusion should be taken merely to represent one possible interpretation of the findings of the investigations involved. See ibid., p. 104.
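The remark above that academic success "may be forecast within the limits of determinable error" can be made concrete with a small sketch. It is offered only as an illustration under stated assumptions: a straight-line least-squares fit is assumed as the method of forecast, the standard error of estimate is assumed as the "determinable error," and the scores and marks are invented for the purpose.

```python
from math import sqrt
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def standard_error_of_estimate(x, y, slope, intercept):
    """Typical size of the forecasting error, using n - 2 degrees of freedom."""
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return sqrt(sum(r * r for r in residuals) / (len(x) - 2))

# Hypothetical achievement-test scores and later teachers' marks for five pupils.
test_scores = [40, 50, 60, 70, 80]
later_marks = [58, 62, 71, 74, 85]

slope, intercept = fit_line(test_scores, later_marks)
error = standard_error_of_estimate(test_scores, later_marks, slope, intercept)

# Forecast for a hypothetical pupil scoring 65 on the test.
forecast = intercept + slope * 65
```

The computation relates one set of numbers to another and states how far the forecasts are likely to miss. It says nothing about which factors produce the relation, which is the sense in which the text calls such forecasts predictions within a closed system.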

It should be clear that the scientific study of educational problems is not in question. The scientific method is broader than any of its operational principles and physical manipulations. We are simply dealing with one small but significant aspect of the scientific study of education: the validity of developmental instruments. A complete demonstration of the lack of validity of these instruments would no more cast doubt on the method of science in education than the discovery of the inadequacy of some principle in physical science would cast doubt on its efficacy in that field. It cannot be reasonably maintained that no general principles have been evolved in educational study carried on by the methods of science. But experiments conducted by means of instruments such as those under discussion are disappointing, when we consider the paucity of consistent and undisputed principles. Witness, for example, the controversies that have occurred over experiments on class size, methods of teaching, and homogeneous grouping.

The usual defense evolved to explain the lack of substantial generalizations is that experimental study in education is only in its infancy, and that in due time undisputed general principles will be derived. It is, of course, true that educational experimentation is only a recent extension of the techniques of science, and we can only accept this fact as a possible explanation of its failure to produce general principles by means of developmental instruments. On the other hand, it should be remembered that in the development of other fields of science youth has not been a limiting factor. Thus, for example, Galileo's studies of falling bodies led immediately to a general principle. It is reasonable to suppose that where our findings continue to be disappointing, the validity of instruments may be justifiably questioned. We cannot overlook the fact that the paucity of generalizations is probably due to the invalidity of instruments, considered from a wider and more adequate interpretation of the learning process and its outcomes. There is clearly a need for a more thorough analysis of learning, with a view to discovering its fundamental character, for only thus shall we be able to formulate more satisfactory principles by which to validate instruments of measurement. It cannot be reasonably entertained that the quantitative theory of learning, on which instruments are now validated, has exclusive claim to scientific credence and finality.

SUMMARY

In the foregoing pages we have considered the procedure of validating instruments from a wider and more inclusive standpoint than is ordinarily done. In doing this, it was seen that the logic of validity must of necessity be polar. This fact raised the fundamental question of how we know whether or not instruments are really valid, inasmuch as there is apparently no place to ground the argument. To escape this difficulty, it was pointed out that complete arbitrariness is avoided by two facts: first, the properties place restrictions upon the kinds of manipulations which may be used; and, second, the validity of a guiding principle may be confirmed by the value of the relationships it permits us to establish. On the basis of these criteria, the validity of developmental instruments was examined. When judged by the consequences which issue from the use of these instruments, serious question is raised as to their validity. It was also shown that there is no reason why the properties with which we deal could not admit of operations other than those now utilized. These facts indicate both the need and the possibility of exploring learning by other intellectual analyses, so that we may base instruments upon more fruitful principles than those of mere quantity, rendering them valid in terms of a more adequate conception of learning.

VI

Performance and Validity

As has been shown previously, the validity of achievement tests rests upon the assumption that correct responses to items can be taken as an index of an individual's total learning. That is, when we enumerate the correct responses of persons to items of a test and express them numerically, it is assumed that these scores will arrange persons in order of their true total achievement. For such a conclusion to be sound, the correct responses elicited by the items must be near enough alike to be classed together as responses having, so to speak, a common denominator.¹ Putting the same thought in other words, we proceed on the ground that the responses are ultimately the same kind of thing. They are not merely reduced to the same class by subsuming them under a general class as is done, for example, when we enumerate 3 horses, 5 automobiles, 2 houses, and 4 apples, as 14 objects. This much is implicit in the very logic of measurement itself, for if they are not the same kind of thing, the order which is established will not embrace a continuous quality. And if the order is not established in terms of a continuous quality, it does not constitute measurement any more than does the order of books on a shelf or houses on a street.

¹ It is of course true that the theory of factor analysis in psychology is teaching us that performance, even when controlled by the most rigorous instruments at our command, is by no means homogeneous. This is true whether we consider the performance of one individual or a comparison of the performance of two or more individuals under the same external conditions. Applied to education, it indicates that measures of an individual in a school subject may involve two or more components, and the problem of finding the number of these components and determining their relative weights is rapidly claiming the attention of workers in the field of educational measurement. This is a significant development that will no doubt help to clarify the problem of validity. But factors such as perceptual speed, verbal ability, number facility, memory, and so on, which may be embedded in any single measure of an individual in respect to learning, even if analyzed and weighted, would still not relieve us of the necessity of formulating the internal events that accompany the performance designated as learning. There is some common feature to which the term learning refers. And it is to that feature that the present discussion is directed.

In the following pages it is our purpose to show that responses, considered in a fundamental sense, do not necessarily have, so to speak, a common denominator. We shall advance and defend the hypothesis that correct responses of persons to the same item can be taken as indices of quite different results, when the personal structure of individuals is taken into account. That is, responses which are apparently the same under the external conditions imposed by an instrument of measurement are not necessarily the same when considered from the standpoint of the events which underlie overt behavior. We shall therefore examine the personal aspect of behavior, with a view to discovering the relations between it and the outcomes of the learning process. In doing this, we shall deal with the following questions: Is behavior merely overt? If not, what is its hidden aspect? How are the outcomes of the learning process related to this aspect? What bearing do the answers to these questions have on the problem of validity?

THE OUTER AND INNER ASPECTS OF BEHAVIOR

In answering the above questions, we shall be forced to explore the inner dynamics of behavior. We are of course aware of the speculative character of our undertaking. As Köhler has so well pointed out, "between the two terms of the sensori-motor circuit there is more terra incognita than was on the map of Africa sixty years ago."² Köhler also said, in the same connection, "If behavior is to be understood as depending upon inner organization as well as outer conditions, we must try to imagine the modes and traits of these inner processes, which are either started from without by environmental stimulation or aroused intra-organically by inner dynamics."³ We are therefore entering a region in which we shall be forced to depend upon intellectual constructions for guidance. In short, we shall be speculative, bearing in mind that speculation is indefensible only when it becomes an end in itself.

² Köhler, Gestalt Psychology, p. 54, by permission of Liveright, publishers.
³ Gestalt Psychology, p. 54, by permission of the publishers.

Overt behavior, however, will still be the primary datum, but we shall frankly recognize it as an index of something present only by implication. In other words, there is something implicitly present in overt behavior, for otherwise it would be like the grin of the cat in Alice in Wonderland: it would be without a face. Overt behavior will thus be the point of departure in formulating conceptions of the underlying aspects of responses. We shall note first, however, that behavior can be considered from the standpoint of inner as well as outer dynamics.

We shall begin with overt behavior. A man is writing a letter to a friend. He makes certain rhythmical movements with the wrist and hand; a pen is grasped in a certain way; marks which we call writing are made on paper; certain glandular processes are quickened; the eyes move in a more or less rhythmic manner; the vocal apparatus moves imperceptibly; the neuromuscular system is keyed up to the task in hand. All these activities, together with others that a thorough analysis might reveal, when properly related and working together constitute a behavioristic description of what the man is doing.

This account tells what the man does in writing the letter, but it does not tell us how he knows that he is writing a letter. At this point radical behaviorism and mechanistic explanations break down. The man not only writes a letter, but he also knows that he is writing it. The fact that he knows what he is doing is not just a mere accompaniment of events usually called physiological.⁴ It is not just an epiphenomenon that can be ignored without loss of anything essential. On the contrary, writing a letter would be a different activity if this quality of behavior were not present. Color, for example, is a qualitative accompaniment of certain events that involve a sentient being and certain vibrations in the external world, but it is not thereby an extravagance of nature that might just as well be absent, for the world would be quite different without it.

⁴ Koffka, Principles of Gestalt Psychology, pp. 39 f.

In the case of the man writing a letter, we observe certain overt responses, and it is more or less established that other responses, such as movements of the vocal apparatus, could be detected with the aid of mechanical devices. We also observe that there are external consequences of these responses: a letter is written. It is equally true that these responses have an internal consequence; otherwise, the man could not know that he is writing a letter. We may say, then, that the first consequence represents the viewpoint of the observer and that both of the consequences denote that of the subject. To an observer, able to forget for the time being that he is himself an agent, the performance of the subject appears to be no more than a series of stimuli followed by responses. The subject, however, sees the performance more completely. To him it is not only what is actually done overtly, but it is also an experience: deliberation, weighing of alternatives, making of decisions, the expression of attitudes, and the fulfilling of desires.

But, it may well be asked, how does the inner consequence of behavior enable the man to know that he is writing a letter, or, for that matter, to know what he is writing? The answer to this question becomes clear when it is seen that one can know what one is doing only when the performance has a counterpart in one's experience. When the man says to himself that he must write a letter, what he does is to arouse in his experience the same attitudes of response that the expression would tend to provoke in other people. This emergence in experience is the way in which we know what we do; and it is an internal aspect of behavior.

If one pronounces and hears himself pronounce the word "table" [says Mead], he has aroused in himself the organized attitudes of his response to that object, in the same fashion as that in which he has aroused it in another. We commonly call such an aroused organized attitude an idea, and the ideas of what we are saying accompany all of our significant speech.⁵

We thus see that it is precisely this internal aspect of behavior that accounts for the fact that the man is not only able to write a letter, but also to know what he is doing.

There are, of course, some responses that do not rise to the level of experience. The prototype of this kind of response is found in the experiment on the salivary reflex by Pavlov. In this experiment, a flow of saliva is produced in the mouth of a dog by gustatory stimulation from food in the mouth. This is done on a number of occasions, and each instance of gustatory stimulation is accompanied by some auditory stimulation, such as the sound of a bell. After several simultaneous presentations of the auditory and gustatory stimuli, it becomes possible to evoke the flow of saliva simply by presenting the auditory stimulus alone. What has happened here is that one stimulus has been substituted for another. There were already certain primary responses, such as the flow of saliva when food was placed in the mouth, and the pricking up of the ears at the sound of a bell. There seems to be no functional relation among such responses, apart from some process of conditioning such as here described.
That is, by presenting the two stimuli simultaneously, the auditory stimulus comes to evoke not only the pricking up of the ears but also the flow of saliva. In this case the responses are below the level of consciousness. It can hardly be said that salivation takes place under conscious direction, and the experiment was conducted under conditions that tended to rule out whatever consciousness might have otherwise entered into the process. Thus we seem to have a change in behavior that takes place on the level of events which are ordinarily considered as physiological. There is no indication that the dog knew that he was behaving in this particular manner at the sound of the auditory stimulus.

⁵ Mead, The Philosophy of the Present, p. 189, by permission of the Open Court Publishing Company, publishers.

The above illustration is perhaps the purest instance of responses of this type. However, on the human level we find many instances that approach this apparently completely physiological type of behavior. The way in which children become afraid of certain objects is a classic illustration of the acquisition of this type of response. A child playing with a cat hears some strange and frightening noise, and thereafter the cat is sufficient to evoke fear. As further examples there are the more or less habitual courses one takes in walking home, the movements of the eyes in reading, the posture when one is speaking, and the unique characteristics of one's leg and arm movements in walking. It appears that such responses can be more or less adequately described in physiological terms. This type of response, however, plays a very small part, if any, in responses of individuals to the items of a measuring instrument, and is no doubt entirely limited to true-false items. We shall simply note in passing that this is one type of behavior which responses to items of a test might conceivably represent.

On the other hand, a very large portion of our behavior does become internalized, and this internalization makes possible the personal life of the individual. Responses that have a counterpart in experience are essential to personal development and growth. They not only give rise to a "realm of purely personal events that are always at the individual's command, and that are his exclusively as well as inexpensively for refuge, consolation and thrill,"⁶ but they also enable individuals to take an intelligent part in the social process. They enable persons to foresee consequences, to mold their conduct in the light of its probable effects upon others, and to examine and to evaluate themselves in the light of the way others react to them.

In these claims, however, we have gone beyond the conclusions permitted by the mere fact that behavior has an internal aspect. To find support for these claims, we shall have to attempt an intellectual construction of the events which underlie such complex forms of conduct. By pursuing this course, we shall be able to gain a more adequate idea of the importance which a response to an item may assume in an individual.

THE SELF AS A STRUCTURE OF ATTITUDES

In turning to a more fundamental consideration of behavior as a basis for examining the validity of instruments, it is encouraging to find that the foundation of such an examination has already been laid. This foundation is found in Mead's treatment of the phenomenon which he designates as the self.⁷ We shall follow Mead in describing this phenomenon and shall supplement his account from other sources as the need arises.

⁶ Dewey, Experience and Nature, p. 172, by permission of the Open Court Publishing Company, publishers.
⁷ Mead, Mind, Self and Society, passim.

In introducing the term "self" into a discussion of educational issues, one is almost persuaded to offer his apologies for digging up the dead. With the penetration of behavioristic psychology into educational thinking and the gradual acceptance of the mechanistic interpretation of what men do, educational thinkers became "tough-minded," to borrow an expression from James, and the phenomenon called the self was pushed aside along with introspective psychology. It is almost always true, however, that a thorough house-cleaning discards some valuable objects which later have to be replaced, especially when the objects serve some indispensable function. So today there is a tendency in both psychology and education again to recognize the phenomenon to which the term self is applicable, and to reclaim it from the domain of abnormal psychology, to which it was restricted when so-called scientific psychology refused to give it recognition.

The introduction of the term self does not necessitate a revival of the methods of introspective psychology. The writer has no hankering for such barren methods. His motive is simply that of recovering the significant factor of experience, which educational thought and measurement disregarded when they surrendered the introspective approach. No one who has observed the advancement of psychological and educational study in recent years can seriously entertain the idea of abandoning the behavioristic approach to the study of those phenomena usually classified as mental. Nevertheless, it may be asked, is it inevitable that a reference to the phenomenon which we choose to call the self can be made only by surrendering the behavioristic standpoint? A positive response seems to be the answer given by many educational psychologists in America. This answer is neither necessary nor fruitful, for the phenomenon which we call the self is at the heart of the problem of validating instruments of measurement. The view which we seek to present is objective enough to embrace behaviorism and broad enough to include the facts of one's experience.

Mead's fundamental assumption is that the social process antedates the conduct of the individual.⁸ The social act is the primary datum of psychology. The social process is carried on by means of gestures,⁹ the gesture of one individual in a social process being the stimulus for which the gesture of the other is the response. He illustrates this with a description of a dog fight, in which the gesture of one dog is the stimulus to the other dog for his response. In the dog fight, however, there is no internalization of behavior.¹⁰ If the gesture of one dog had called up in himself an attitude in experience, as distinguished from the attitudes assumed by his body in the fleeting and mobile positions taken in the fight, and if this same gesture had also evoked in the other dog a comparable attitude, the dogs would have been acting as human beings. That is, they would have been capable of carrying on a conversation, engaging in reflective processes, and distinguishing between good and evil.

⁸ Mead, Mind, Self and Society, pp. 6 ff.
⁹ Mead, Mind, Self and Society, pp. 13 ff., 42 ff.
¹⁰ Mead, Mind, Self and Society, pp. 42-43.

Owing, perhaps, to a more complex physiological mechanism, we find acts on a higher gradient of the social process, in which, according to Mead, internalization of performance arises and personal experience comes into existence. These acts also take place by means of gestures, but they carry the added feature of experience called up by the gestures. The gestures which have the capacity to call out in the agent experiences similar to those called out by the recipient of the gestures, Mead calls significant symbols.¹¹ He finds the ideal significant symbol in language, for vocal gestures can be readily received by both the agent and the observer, whereas such gestures as facial expression are not accessible to the agent himself.¹²

¹¹ Mead, Mind, Self and Society, pp. 45-46, 61 ff.
¹² Dewey seems to hold a similar position. Compare his discussion in Experience and Nature, Chapter V. Tolman disagrees with Mead as to the origin of ideas, holding that ideas precede speech in point of origin. Nevertheless, he seems to agree that speech is the chief instrument of carrying on conversation with oneself. See Tolman, Purposive Behavior in Animals and Men, Chapter XV.

The phenomenon called the self thus arises out of experiences evoked by vocal gestures. Since a vocal gesture enables an individual to call up in himself an experience similar to that which it evokes in another, he is able to take the rôle of the other person and to respond to his own ideas as the other person might respond. Thus by being himself and another person at one and the same time, he becomes an object to himself.¹³

¹³ Mead, Mind, Self and Society, p. 138.

Mead distinguishes two stages in the development of the self.¹⁴ The first is that which is found in play. The child is able to take the rôle of different persons and to act toward himself as these persons might act. He takes, for example, the rôle of teacher, fireman, engineer, or carpenter. By so doing he plays the part of each of these characters. Play, however, is simply taking the attitude of single individuals. The rôle of a number of persons is not taken at one time, so that there is no organization of attitudes. This would imply a more developed state of the self, and Mead finds this second stage in activities such as games. In games the individual assumes the attitude of a number of persons at the same time. One cannot play as a half back on a football team, for example, unless he assumes the rôle of a number of other individuals. He can direct his own movements successfully only by implicitly assuming the attitudes of a number of individuals at the same time. The individual in the game thus comes to have an organized or generalized attitude in respect to particular behaviors of other players. This fact distinguishes the more mature self from that which is found in play.

¹⁴ Mead, Mind, Self and Society, pp. 152 ff.

These generalized attitudes afford the explanation of social actions and relations. The individual incorporates in his own action the organized attitudes of others. That is, he is able to take the attitude of social groups or of the community toward his own contemplated conduct. In this way he comes to act as a social being, and as such he is acting as a self.

We thus see that the self is a social phenomenon,¹⁵ capable of a behavioristic description. In a behavioristic sense, the self is a character of behavior. That is, when an individual directs his conduct by incorporating in his experience the probable reactions of other persons to what he is going to do, his behavior has the quality which we call the self.

¹⁵ Koffka also emphasizes the social character of the self. "The social framework is of paramount importance for the development of the Ego. When we speak of a personality we think as a rule of the Ego within its culture," Principles of Gestalt Psychology, p. 676, by permission of Harcourt, Brace, publishers.

We can also think of the self as a "structure of attitudes,"¹⁶ without lapsing into a conception of the self as a rigid and fixed arrangement. "There is," says Dewey, "no one ready-made self behind activities."¹⁷ Then he goes on to say that "There are complex, unstable, opposing attitudes, habits, impulses which gradually come to terms with one another, and assume a certain consistency of configuration, even though only by means of a distribution of inconsistencies which keeps them in water-tight compartments, giving them separate turns or tricks in action."¹⁸

¹⁶ Mead, Mind, Self and Society, p. 163.
¹⁷ Dewey, Human Nature and Conduct, p. 138, by permission of Henry Holt and Company, publishers.
¹⁸ Ibid., by permission of the publishers. Italics mine.

As a structure of attitudes, the self is a "system or complex of systems,"¹⁹ to borrow an expression from Lewin. The attitudes are not just here and there, then and now, confused, helter-skelter. They are, of course, highly diversified, but amid the diversity there is always a tendency toward unity. This unity gives the individual his personal stability, his persistence through time as a person and not just as an organism, his abiding and persistent aims,²⁰ and his most cherished values. But complete unity is seldom attained. There are certain systems of attitudes that tend to dominate our actions, but at one time or another all of us feel cleavages in ourselves. Other sets of attitudes from time to time flicker in our consciousness, and sometimes they even find release in overt action.

¹⁹ Lewin, Dynamic Theory of Personality, p. 56.
²⁰ Koffka says, "needs are . . . states of tension which persist until they are relieved. Our most general aims are therefore permanent, tensions which last through great parts of our lives. These needs being our needs, they belong, of course, to the Ego system." Principles of Gestalt Psychology, pp. 329-30, by permission of Harcourt, Brace, publishers.

A somewhat similar construction is advanced by Koffka in his description of the ego system. He distinguishes between the surface or peripheral aspects of the system and the core, that is, "the Self, whose tensions are much greater than those of the other sub-systems, representing real needs as opposed to the quasi-needs of our superficial intentions."²¹

²¹ Koffka, Principles of Gestalt Psychology, p. 342, by permission of the publishers.

Speaking physiologically, the organism is something different from the parts which make it up; speaking psychologically, the self is an emergent on a still higher plane. It is a new quality, a new psychological phenomenon. But it is not a fixed and unchanging object, circumscribed by inflexible boundaries. On the contrary, it is continually changing and growing, but maintaining its identity and stability amid constant changes.²² Some of the things which insulted us in our youth no longer get under our skins, and the successes and triumphs that used to give us such elation are now beneath the dignity of the more developed self.

²² Koffka, Principles of Gestalt Psychology, p. 331.

This phenomenon which we call the self is not a luxury, without which we could get along just as well. Its presence is a significant factor in the guidance of conduct. Intellectual development, as is true of any development, is not just a matter of chance. Chance of course is an important factor, but the general orientation of our behavior is due to the self in which we find our profound and abiding interests, hopes, and aspirations. In mature people, behavior is not wholly determined by momentary needs and desires, and in the attainment of satisfaction not every stimulus is allowed to evoke behavior. When apparent needs are inconsistent with the tendencies of the self, they are likely to suffer alteration or to be rewarded with inaction.²³

²³ Sherif, The Psychology of Social Norms, p. 171.

It is the self that feels proud, elated, vain, depressed, degraded, humble, insulted. These are feelings that carry with them a sense of me-ness. An insult, for example, need not be evoked by a peril that endangers life, for it is more likely to be aroused by acts that have no reference to bodily harm. It is an affront to "me" and not to the body. In cases of personal depression, it is the me that is depressed and not some particular reaction, though inability to execute reactions may be the beginning of the depression that ultimately brings down the whole self.

Moreover, if immediate value be excluded from consideration, it is due to the phenomenon of the self that we have values.

With the child's collaboration with others in games and other more serious social situations, norms arising momentarily as well as those well established become an integral part of himself. In this process commands and norms are interiorized in him. With the intériorisation of social norms or values, we have the transition from heteronomous conduct to autonomous morality. In the latter the behavior is regulated from within as well as by the sheer force from without.²⁴

²⁴ Sherif, The Psychology of Social Norms, p. 182, by permission of Harper and Brothers, publishers. Italics not mine.

These internalized consequences come to have value, depending upon the extent to which they fit into the movement toward unity. In so far as this whole, that is, the self, has worth in virtue of its unity, each internalization that is "possessed" by it rises to the level of value.

The purpose of this discussion is to give a brief description of the phenomenon of the self and to show that it is a factor in behavior.²⁵ It should now be evident that the self must be taken into account in psychological explanations of learning on the level of personal experience.

Now what we want to know in connection with the problem of validity is whether or not there is a particular kind of learning outcome which involves the self, and whether it covers all the results of the educative process. If there are significantly different kinds of learning outcomes, and instruments of measurement make no distinction among them, the validity of such instruments is open to serious question. As we shall attempt to show, this is the actual status of our achievement tests. We are thus faced, first, with the problem of setting forth the characteristics of learning outcomes that involve the self; and, second, with the task of showing that if there are other outcomes which do not involve the self and which thus have significantly different consequences in life, the validity of instruments is in doubt.

THE SELF AND OUTCOMES OF LEARNING

W h a t kind of behavior involves the self? The answer to this question was foreshadowed in the discussion of the preceding section. It will be remembered that the action of the organism tends to arouse in itself those responses which its action tends to evoke in others. That is, some behavior has internal consequences which emerge in experience and make possible the development of the self. Thus in communication a vocal act tends to arouse in the individual who makes it the same attitude that is evoked in the recipient of the act. A s Mead expresses it, "a person who is saying something is saying to himself what he says to others ; otherwise he does not 25 For a more complete discussion of the self, one should refer to the works of Mead, Lewin, K o f f k a , Köhler, and Piaget, to mention only a few of the outstanding leaders. T h e works of the psychoanalytic school headed by Freud is of course a rich source of information.

ι io

Performance and V a l i d i t y

know what he is talking about." 26 This arousal in oneself of the attitudes which his act tends to evoke in others enables him to adjust what he is going to say and do in terms of these imported attitudes. That is, he is able to take the rôle of others in deciding what his acts shall be. When an individual acts in this way he is behaving as a self, for a self is "an individual who organizes his own response by the tendencies on the part of others to respond to his act." 27

26 Mead, Mind, Self and Society, p. 147, by permission of the University of Chicago Press, publishers.
27 Mead, The Philosophy of the Present, p. 184, by permission of the Open Court Publishing Company, publishers.

This view of behavior affords the basis for understanding how learnings, that is, outcomes of learning, become a part of the self. In considering this problem, it is best to discuss specific classes of outcomes of learning rather than outcomes in general, and we shall therefore digress long enough to present some general types of outcomes.

There are in general four segments into which the qualitative world may be cut, as determined by the kinds of adjustments which human beings make. These segments are the physical, the social, the affective and aesthetic, and the manipulatory. The outcomes of learning which correspond to them we have chosen to call respectively attitudes of physical understanding, attitudes of social understanding, attitudes of appreciation, and skills.

An attitude of physical understanding is simply a disposition which one takes toward an aspect of the physical world. This disposition or attitude is built up in the same manner as any element of the self, that is, by reacting to nature in terms of nature's tendency to react to what we do to it. As ordinarily stated in textbooks, density is defined as "mass per unit of volume." The principle of density does not become a "possession" of the self, however, until an individual has made certain reactions to the responses of nature, directly or indirectly, so as to take on the rôle of nature and to internalize his performance in respect to it. 28 When he has made sufficient reactions to nature's behavior, he comes to have a change in disposition, a new attitude, a change within the self which we call an attitude of understanding. That is, he can speak to himself as nature would speak to him, and thereby mold his actions in terms of nature's tendency to react to what he is going to do to it. It is this ability to commune with nature that distinguishes a person who has an attitude of understanding toward a principle of science from one who can merely work out problems and answer questions.

28 Mead, Mind, Self and Society, pp. 184 f.

What has been said of attitudes of physical understanding is true also of attitudes of social understanding, except that in this case it is the social world that is involved, rather than merely the physical. Here the individual also builds changes in his self through the process of internalization. The principle that people tend to migrate in search of better economic conditions becomes a part of the self only by the individual's molding his responses in terms of the reactions which others would tend to make in the situation. He is able to ascertain these reactions because the self enables him to be others and himself at one and the same time. 29 When the internal disposition which this principle describes has been established by repeated internalizations, the self undergoes change and we speak of this change as the acquisition of a new attitude of social understanding.

29 For a complete development of this idea, see Mead, Mind, Self and Society, pp. 144-64.

With attitudes of appreciation the principle of acquisition is very much the same, but the subject matter is considerably different. Appreciations involve more of what is called the affective side of ourselves. This side is not segregated from the intellectual aspect, for there is always something affective connected even with the most rigorous intellectual analysis. There are behavior acts, however, which we can think of as being almost completely affective just as we can think of others that are almost completely intellectual. The classification is thus a matter of emphasis and not of differentiation.

It is convenient to recognize that appreciations involve values and that these values may be thought of as ethical and aesthetic. We come to accept a value, as distinguished from merely recognizing it, by the process of internalization. To refer again to Mead, the artist tries to find the sort of expression that will arouse in others what is going on in himself. The lyric poet has an experience of beauty with an emotional thrill to it, and as an artist using words he is seeking for those words which will answer to his emotional attitude, and which will call out in others the attitude he himself has. He can only test his results in himself by seeing whether these words do call out in him the response he wants to call out in others. 30 In other words, the artist expresses his own feelings in terms of an audience sense, but the test of whether or not his audience is reached is found in his own inner certitude, for if his work is able to recreate in himself the feelings which he intended to express, together with all their shades, tones, and nuances, then he has reached that degree of perfection which the self of the artist demands. It is perhaps this internal test of value that bears up and gives hope to artists like Van Gogh, whose works are not recognized in their lifetime. They feel that if only they express themselves, others will come to know them. It is the recognition of this profound psychological fact that has led Rugg to object to external standards of evaluation in the field of the arts. 31

30 Mead, Mind, Self and Society, pp. 147-48, by permission of The University of Chicago Press, publishers.
31 Rugg, Culture and Education in America, pp. 375-77.

It should now be clear that when an individual has built into his self an attitude or set of responses which an artist, for example, intended to evoke, he has a genuine attitude of appreciation which is truly a change in the self.

By skills we mean those outcomes of learning which involve less of the affective and rational aspects of ourselves than do appreciations and understandings. Among such outcomes are reading, typing, handwriting, elementary number combinations, spelling, and mere information. These outcomes seem to be attained through repetitive responses. However, they are not simply neuromuscular, for they involve a conscious element. If, for example, one lost his hands and were consequently no longer able to typewrite, he could still say to himself, "I know how to type but I cannot do it." But in ordinary circumstances one does not have to remember how to type in order to do it. 32 As a matter of fact, the practice of recalling typing as a past experience will interfere with the proper exercise of the skill. On this point we are also in substantial agreement with Morrison, when he says that these outcomes of learning, that is, skills, are not content in memory, but elements of the "fabric of personality," by which he seems to mean something comparable to the phenomenon we call the self. 33

32 Koffka, Principles of Gestalt Psychology, p. 506.
33 Morrison, Basic Principles in Education, Chapter VIII.

Up to this point we have proceeded as though all outcomes of learning had equally "honored" places in the self. This is, of course, not true. Their "positions" may vary from the core to the surface. Some outcomes of learning no doubt penetrate the self to its core. They are those on which great value is placed because of their particular relation to the more or less permanent purposes, desires, needs, and activities of the individual. Other outcomes may have varying degrees of relations, extending from the core outward. In any case, the interiorization of behavior will vary with the tensions that arise in the self.

The relations of outcomes of learning to the self are not fixed. When one's fundamental purposes and activities change, resulting in a reconstruction of the self, the outcomes of learning will undergo fundamental alterations in their relations to the core of the self. In other words, the part which outcomes play in the self is contingent upon the more or less persistent needs and life purposes of the person. We, of course, have fancies and momentary needs that arouse us to action. A person is writing and the door bell rings. He goes to the door, carries on a brief conversation, and returns to his writing as though nothing had occurred. We also have more or less persistent needs and purposes. The school attempts to transcend the momentary and transient needs of individuals. Instruction attempts to dig into the deeper recesses of the person and to touch his more abiding needs. The emphasis now being placed on the designing of an educational program adapted to life needs and purposes of individuals, therefore, becomes even more defensible when it is considered in the light of the phenomenon called the self.

There are also outcomes of learning which float on the surface of the self and which have very little, if any, connection with it. These outcomes enable us to respond to items of a test in the same manner as do those which belong to the self. They are outcomes, however, that have quite a different status in the individual. They play little, if any, part in his activities, and within a short while they are no longer available, even in the form of responses to test items.

In considering these outcomes of learning, it is encouraging to find in the works of an outstanding authority suggestions of just such a distinction. In considering the question of whether or not there are certain experiences which in themselves are related to the self, Koffka comes to the conclusion that thought can be experienced outside of the self. In the course of his argument he gives the following suggestive, although unconvincing, illustration.
Many people have had dreams like or similar to the following: they are taking an oral examination with a group of colleagues; the examiner asks them a question which they are unable to answer, whereupon the examiner turns to the next candidate, who promptly supplies the correct answer. In this dream occur two thoughts and both in minds which are not the dreamer's Ego, although they are in his dream. The question is asked by the examiner, and the right answer is given by a student, while the Ego of the dreamer was unable to produce it. The answer then occurs in the dreamer's field, but not in that part of it which makes up his Ego. 34

34 Koffka, Principles of Gestalt Psychology, p. 327, by permission of Harcourt, Brace, publishers.

Children are not ordinarily supposed to be dreaming in school. Nevertheless, Koffka's illustration does vividly present the distinction, even though it must now be located in the waking state. Fortunately, we do not have to depend upon the phenomena of dreams for the distinction which we seek to present. Very similar kinds of behavior are found in the conscious processes; and, if we are to believe the judgment of some people, they dominate the class-room spirit of the school.

Let us consider a case in which an individual is forced by the pressure of circumstances to do something which he does not really want to do and in which he can see no real value or significance to himself. Many of the outcomes of learning which individuals acquire in school are no doubt acquired under just such circumstances. In such a case there is little interest, if any, in the work, and the individual goes about it in a more or less humdrum fashion. In such a case the individual acquires, let us say, certain understandings or bits of information. He does this by listening to explanations, reading, or performing other exercises. But always the ideas belong to someone else, the thoughts are not his thoughts, and the understandings are not his understandings. He understands, so to speak, but he does not have the attitude of understanding. These outcomes of learning thus do not affect the structure of the self. They stand apart so that they are immune from the effects of other outcomes, and the individual finds that he cannot carry on a conversation with himself in terms of them. He can think upon them, but he cannot interiorize them so that they enter into his thinking.

Many years ago Bergson noted this kind of learning product and contrasted it to the products which are assimilated by the self. His clear and impressive discussion warrants an extended quotation:

The beliefs to which we most strongly adhere are those of which we should find it most difficult to give an account, and the reasons by which we justify them are seldom those which have led us to adopt them. In a certain sense we have adopted them without any reason, for what makes them valuable in our eyes is that they match the color of all our other ideas, and that from the very first we have seen in them something of ourselves. Hence they do not take in our minds that common-looking form which they will assume as soon as we try to give expression to them in words; and, although they bear the same name in other minds, they are by no means the same thing. The fact is that each of them has the same kind of life as a cell in an organism: everything which affects the general state of the self affects it also. But while the cell occupies a definite point in the organism, an idea which is truly ours fills the whole of ourself. Not all our ideas, however, are thus incorporated in the fluid mass of our conscious states. Many float on the surface like dead leaves on the water of a pond: the mind, when it thinks them over and over again, finds them ever the same, as if they were external to it. Among these are the ideas which we receive ready made, and which remain in us without ever being properly assimilated, or again the ideas which we have omitted to cherish and which have withered in neglect. 35

Outcomes of learning that float on the surface of the self are perhaps always acquired under conditions of low intensity.
In a process of low intensity, the internalization of behavior is crippled so that the individual does not internalize to the extent of taking on the outcomes of learning that are, so to speak, enveloped by the self, or at most, they are not focused toward the self. Under conditions of low intensity, the individual may resort to memorization, or, what is even more likely, to an unnamed form of learning in which the individual can do certain tasks without having the attitudes which his performance implies. Any experienced teacher of high-school algebra, for example, can recall students who acquire the mathematical symbols and a facility in manipulating them in accordance with the rules of the game. Confronted, however, by mathematical problems, they solve them more by a sense of conformity to rules than by an understanding of the operations. In such a case, students merely do what is to be done. The outcomes of learning are foreign to the tensions within the self and they lie outside of that structural object. Such students understand, but they do not have the attitude of understanding.

35 Bergson, Time and Freewill, pp. 135-36, by permission of The Macmillan Company, publishers. Cf. Whitehead, The Aims of Education and Other Essays, pp. 1-2; and Kilpatrick, Remaking the Curriculum, pp. 27 f.

This distinction between outcomes of learning that inhere in the self and those which lie outside of it is in many respects similar to the distinction which Morrison makes between what he calls adaptive change and adaptive response. 36 An adaptive response, as he sees it, is an adjustment which the individual makes, even though he completely lacks the change in personality of which the response is indicative. On the other hand, an adaptive change has reference to accretions to the self. In this connection, Morrison has attempted to show that most of the outcomes of learning acquired in school are merely adaptive responses. 37 Morrison says, however, that the "self cannot be understood as inhering in the organism in either its physical or its psychical aspect, nor can personal learning nor personality." Then he continues, "These are realities, in fact they constitute the fundamental reality of existence, but they are not material." 38 In taking this position, he seems to have strayed further from the pathways of naturalism than the facts necessitate.

36 Morrison, Basic Principles in Education, pp. 107 f.
37 Morrison, Practice of Teaching in the Secondary School, Chapter IV. "So far as my own studies and those of my students go, I think that the proportion of pupils who consistently exhibit true learning of school subjects in the form of personal adaptations probably runs at not much above ten per cent, and that is not very far from chance expectation." Basic Principles in Education, p. 110, by permission of Houghton Mifflin, publishers. Morrison does not present the evidence in support of this conclusion, but it can hardly be supposed that many outcomes of learning in school are really adaptive changes, to use his terminology.

It may be argued that the distinction which has been made between outcomes of learning is after all merely theoretical, and is in direct conflict with established facts. For has it not been shown experimentally that there is a high correlation between what is called learning information and the so-called higher mental processes such as problem solving? Now, of course, the distinction which we have made among outcomes of learning does not lie along the line of cleavage traditionally drawn between the acquisition of mere information and the higher processes. Rather, the distinction admits that either mere information or the outcomes of learning that involve the higher processes may be acquired as an element of the self, or also as performance based upon underlying events that are outside of that structure. Inasmuch as this challenge from recognized experiments lurks in the foreground, however, we shall now see what there is to these established facts.

Not all of the pertinent studies can be reviewed here. We shall therefore select one of the most recent studies of the leading exponent of the theory that all outcomes of learning are basically the same. In a recent study of Thorndike's upon which he reports in his Human Learning, he undertakes to determine the relationship between what he calls associative learning and ability to deal with abstract qualities and relations.
According to his interpretation, there is a high relationship between these two processes, and he concludes that "different though they may seem, [they] are intimately related in mental dynamics and presumably depend upon a common cause." 39 Even a cursory examination of the materials of the tests used by Thorndike will reveal that responses to the items in the information test are not, for the most part, a result of simple recall, but of reasoning. Let us consider, for example, two items from the associative learning test. "One pound is how many ounces?" is one of the items, and "How much is 125% of 16?" is another item. Now, a correct response to the first item may be, and no doubt almost always is, a matter of recall. But a correct reply to the second question will seldom, if ever, involve merely a process of simple recollection. Out of the total items reported by Thorndike for a part of his associative learning test there are six that require simple recall. The remaining twenty-two items involve the same kind of processes as do the items of his reasoning test. Whatever his experiment may prove, it does not prove that all outcomes of learning are fundamentally the same. It apparently shows that there is a high correlation between two successive efforts at problem solving, since the content of both of his tests involves this process. In striking contrast to the experiment of Thorndike is the work of Tyler on the relation between recall and the higher mental processes. 40 Tyler's study shows that there is a low relationship between facility in mere recollection and facility in the processes of applying principles and drawing inferences.

38 Morrison, Basic Principles in Education, p. 154, by permission of the publishers.
39 Thorndike, Human Learning, p. 177, by permission of D. Appleton-Century, publishers.
40 Judd, Education as Cultivation of the Higher Mental Processes, Chapter II.

IMPLICATIONS FOR VALIDITY OF INSTRUMENTS

We have now distinguished among three different events that may underlie the same overt performance. The first of these are the events ordinarily considered as physiological. The second are those which lie within the self. And the third are those which lie within the range of consciousness but are outside of the self, or at most on its surface. It is perhaps true, as has already been observed, that the first of these events functions very little, if any, in responses to items of an instrument. The second and third, however, underlie almost all responses. Two students may make identical overt responses to items of an instrument in the field of history, for example, and yet have quite different outcomes of learning, when considered in terms of their internal status. And the internal status is as significant as the overt performance. For that status is not only a factor in the personal stability of the individual, but also in the continuous availability and adaptability of the performance to new activities and situations.

With these types of outcomes of learning before us, we can now state explicitly what has been more or less implicit in our whole discussion. The validity of instruments, as we have observed, rests upon the assumption that overt responses, as manifested under the conditions of control imposed by an instrument, are fundamentally the same. Now, if the position advanced in this chapter is correct, the complexity of the human organism makes it possible for overt responses apparently alike to be correlated with disparate underlying events. Even a cursory examination of instruments of measurement, especially in content subjects, will reveal that they do not discriminate among performances in terms of underlying processes. Educational measurement has explored the local and easily accessible haunts, venturing not into the deeper recesses of its territory. It has thus failed to make allowance for qualitative distinctions not explicitly present in performance. We thus reach the conclusion that the validity of instruments of measurement is not established, when performance is considered in its deeper and more fundamental nature.
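
The case of the two students with identical overt responses can be illustrated with a minimal sketch in Python. Everything in it is hypothetical and supplied only for illustration: the class name Response, the function overt_score, the three status labels, and the sample responses are assumptions and are not drawn from any actual instrument. The sketch shows only that a score computed from correct responses carries no information about the underlying events.

```python
from dataclasses import dataclass

# Hypothetical labels for the three classes of underlying events
# distinguished in this chapter.
WITHIN_SELF = "within the self"
ON_SURFACE = "on the surface of or outside the self"
PHYSIOLOGICAL = "physiological"


@dataclass(frozen=True)
class Response:
    item: str
    correct: bool
    underlying: str  # internal status; the instrument never records this


def overt_score(responses):
    """Score as an instrument does: count the correct responses only."""
    return sum(1 for r in responses if r.correct)


# Two hypothetical students with identical overt responses to three items.
student_a = [
    Response("item 1", True, WITHIN_SELF),
    Response("item 2", True, WITHIN_SELF),
    Response("item 3", False, WITHIN_SELF),
]
student_b = [
    Response("item 1", True, ON_SURFACE),
    Response("item 2", True, ON_SURFACE),
    Response("item 3", False, ON_SURFACE),
]

same_score = overt_score(student_a) == overt_score(student_b)
same_status = ([r.underlying for r in student_a]
               == [r.underlying for r in student_b])

print("identical overt scores:", same_score)
print("identical internal status:", same_status)
```

Under these assumptions the two scores agree while the internal statuses differ, which is the situation the argument describes: the ordering operation has access to the score alone.
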
The status of educational measurement with respect to validity of its instruments, especially in the content fields, may be described as follows: Let it be admitted that a transitive asymmetrical order of individuals, in respect to some performance on an instrument, is established. Then, let A, B, C, D, . . . n represent an order of the individuals such that A is higher than B, B is higher than C, C is higher than D, and so on. Now, this order is entirely in terms of overt performance, for we do not know the underlying events, and the operations of measurement are not designed to discriminate among them. However, let us assume that the instrument is valid in terms of the underlying events that involve the self. This means that the responses of the individuals to items are backed up, so to speak, by events within that structure. These events we shall label P. Then, if the order which has been established is valid, it may be expressed in the following manner:

    A B C D E F G . . . n
    P P P P P P P P P P P P P P . . . n

where P represents the underlying events of the larger portion of the responses of the individuals, if not all of them.

But to repeat what has already been said, there is nothing in the manipulations by which the order is established to assure the presence of the underlying quality, P. On the contrary, the discussion of the foregoing section makes it quite plausible, if indeed not tenable, that for a given performance the underlying quality may not be consistently present. If we let those outcomes of learning that lie on the surface of or outside of the self be represented by R, and those that might conceivably be considered as merely physiological be labeled T, the actual status of the validity of the order may be represented as follows:

    A B C D E F G . . . n
    R P P R R R P P R T . . n

where R, P, and T represent the underlying events of the responses of the individuals.

The order of individuals in respect to performance on an instrument may thus have a spurious foundation, because a series of scores which is taken to represent increasing degrees of a continuous quality, that is, a particular kind of learning, does not in actuality do so. There is, of course, nothing here which precludes an instrument having a high degree of reliability, that is, consistency of repeated measures. Again, the instrument may be pertinent to a given subject matter, as determined by those competent to judge. It would still be without validity, however, because the responses to its items would not be constantly and uniquely related to specific events which it purported to represent. Furthermore, the selection of test items by the criterion of discriminative capacity does not recognize the distinction which is here drawn among inner events. On the contrary, such a selection is based upon a surface analysis of behavior. 41 On this basis, instruments of educational measurement are not dependable, for they fail to tell us which of the underlying events the performance indicates, or, for that matter, to distinguish among them at all. This conclusion seems to be especially true in respect to instruments designed for content subjects, but whether it will hold to the same extent for instruments designed to measure skills is doubtful.

By way of conclusion, it must be pointed out that educational measurement cannot remain content with a surface analysis of behavior. Such an analysis was justified in the early days of measurement, but now a more fundamental exploration is demanded. Our discussion leads us to conclude that the relation between performance and the objects of measurement must be experimentally explored. It will then be recognized that the data and subject matter of educational measurement are not necessarily identical.
From these observations, the whole problem of the techniques of control will be raised anew, and it will become necessary to fabricate more fruitful assumptions on which to base our instruments. Before the relations which we now establish with our precision operations can gain wide acceptance and respect, it is necessary that events beyond our immediate command be brought within reach by the discovery of the connections which they have with things which we can manipulate. But this cannot be done, except through a thorough-going analysis.

41 Lindquist's efforts to devise tests that rule out rote learning are a departure worthy of note. But even in this case there is no indication that the deeper nature of learning is recognized. See Hawkes, Lindquist, and Mann, The Construction and Use of Achievement Examinations, pp. 81 ff.

It is not a question of whether instruments can be validated in this more fundamental sense. There are no a priori reasons to suppose that they cannot be made dependable. What is needed is a more adequate formulation of the underlying events, upon which to base experimental operations. When we have advanced far enough in our analysis of behavior, it will be discovered that the underlying events are distinguishable in terms of manifestations now slurred over in our gross analysis. Only after we have gone beneath the level of the complex and directly observable events, now described in numerical scores and interpreted mathematically, to the underlying events, shall we be able to understand what goes on and to exercise control over it.

SUMMARY

Let us summarize the argument of this chapter. We began by noting that the conditions of measurement demand that the objects which are measured be ordered in terms of some continuous property. Then it was pointed out that unless correct responses to items of a test be fundamentally the same, this condition of measurement is not satisfied, for the qualitative continuity is broken. In this event the instruments would be invalid. With this point in mind, we explored behavior, and this exploration led us to conclude that responses to items are not necessarily of the same kind. We saw that under certain conditions outcomes of learning become incorporated into a configuration of attitudes, that is, the self. Some of these outcomes may be thought of as more centrally located in the configuration than others. That is, some lie closer to the more or less persistent life aims, purposes, needs, and activities of the individual. On the other hand, there are outcomes which are on the surface or outside of the self. These outcomes are more or less independent and isolated, being little affected by other outcomes. They are hence less permanent and tend to become less and less available in activities. We also noted in passing that some outcomes, such as those acquired purely through a process of conditioning, lie below the level of consciousness and, therefore, outside of the self. From the standpoint of these three classes of outcomes of learning, the validity of our instruments is seriously in question, because they do not now distinguish among them.

VII

Addition and Performance

THE task which now confronts us is to analyze educational measurement from the standpoint of whether or not the conditions of addition have been satisfied. In making this analysis, we shall consider both developmental instruments and such scales as those designed to measure merit of composition and merit of handwriting. For over twenty-five years, leading authorities in educational measurement have claimed that quantitative units have been established. It is the thesis of this chapter and the next one that such claim is without adequate foundation. It will be shown that instrument construction has proceeded on the assumption that it is unnecessary for the additive property of learning to be experimentally demonstrated, and it will be further shown that assumptions and statistical manipulations have been substituted for an experimental confirmation of the axioms of addition.

THE PATTERN OF ACHIEVEMENT INSTRUMENTS

In considering the problem of addition, it is necessary to understand clearly the fundamental character of educational measurement. Its nature may be seen more clearly if we recall that there are two types of measurement. In the first type, only the axioms of order are experimentally confirmed. The measurement of temperature and density are illustrations of this type. In the second type, the axioms both of order and addition are experimentally satisfied. This type is illustrated by the quantitative evaluation of thermal capacity, time, mass, and illumination.
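
The difference between the two types may be made concrete with a short sketch in Python, using mass and density, both of which are named above. The sketch is an idealization and rests on stated assumptions: the class Body, the functions density and combine, and the numerical values for the two bodies are all hypothetical, and combining two bodies is assumed simply to add their masses and their volumes. Under those assumptions mass behaves additively, while density can be ordered but does not add.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Body:
    mass: float    # grams (hypothetical values)
    volume: float  # cubic centimeters (hypothetical values)


def density(body):
    """Mass per unit of volume."""
    return body.mass / body.volume


def combine(a, b):
    """Idealized physical combination: masses add and volumes add."""
    return Body(a.mass + b.mass, a.volume + b.volume)


heavy = Body(mass=113.0, volume=10.0)
light = Body(mass=2.0, volume=10.0)
both = combine(heavy, light)

# Second type of measurement: the combined mass is the sum of the masses.
mass_is_additive = both.mass == heavy.mass + light.mass

# First type of measurement: densities can be ranked, but the density of
# the combination is not the sum of the separate densities.
density_is_ordered = density(light) < density(both) < density(heavy)
density_is_additive = density(both) == density(heavy) + density(light)

print("mass additive:", mass_is_additive)
print("density ordered:", density_is_ordered)
print("density additive:", density_is_additive)
```

The design choice here is only to show one property of each type side by side; it does not attempt to state the axioms of order or of addition themselves.
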

In the second type, a property is measured in terms of itself. The length of an object, for example, is measured in terms of a like property of some other object. That is, we place some standard object, such as a yardstick, upon the object to be measured and we read off a certain number of units in terms of which length is described. Each unit in the yardstick is, so to speak, a length in the same sense as the measured object is a length. However, in the first type of measurement, the property is measured in terms of some other property, which is related to it by a known law. The measurement of temperature, for example, is indirect. The temperature itself is not measured, but rather the linear expansion of a thread of mercury is measured. This evaluation of the thread of mercury is then taken as a measure of temperature, because it has been found that an increase of the temperature of bodies is correlated with the expansion of a confined thread of mercury, when the atmospheric pressure is held at a constant. In other words, in this instance we measure a function of the property which we call temperature. A n d a unit of the thread of mercury is not a unit of temperature per se, but a unit of a property related to it by a known law. T h e derivation of units from a function of a property, instead of directly from the property itself, makes it exceedingly difficult, if not impossible, to establish equal units in any sense save by definition. For the problem of determining the structure of a property from its function is an almost, if not entirely, insurmountable difficulty. The units of the function may be equal, but this is no guarantee that they represent equal units of the property itself. It is generally recognized, for example, that equal periods of time cannot be taken as equal units of growth, albeit growth and time are somehow related. 1 Again, the units on a temperature scale are equal in a linear sense. 
But in terms of temperature the difference between 99 and 100 on the scale is not equal to the difference between 50 and 51.

1 For a stimulating discussion of this point, see Du Noüy, Biological Time, pp. 147 ff.

The pattern of educational measurement, as we have pointed out above, is essentially the same as that described in the measurement of temperature. The outcome of the learning process is not measured directly. Rather, we take the products of performance as an index of the outcome, in much the same sense as the expansion of a thread of mercury is taken as an index of temperature. As was shown in the last chapter, however, the relation between performance and the events which underlie it is not so clearly defined as the relation between temperature and the thread of mercury. Nevertheless, we shall proceed in the present discussion as though the relation were well established.

Now the number of problems which an individual can solve correctly, the number of items to which he can respond correctly, and so on, are not amounts of the outcome of learning, but a certain quantity of observable products which are thought to be related to the outcome in some particular manner. The numbers used to represent these products, therefore, do not represent parts or units of the outcome per se. If they did, educational measurement would have the same pattern as the measurement of length. If so, it would be possible to measure the results of learning in history, for example, by selecting some product of performance and experimentally demonstrating that other products could be described in multiples of this product or some unit of it. This has not, of course, been done, although something like it has been attempted, as we shall see in the next chapter, in the construction of instruments designed on the psychophysical pattern. Failure to recognize that the products of learning are not now additive is a source of confusion in educational measurement.
If one person solves ten problems of an algebra test, and another person does only five of the problems, we cannot conclude that the first individual has five more unit quantities of the something called learning than the second person. This should be obvious to almost anyone, but we frequently find just such confusion as this in educational measurement. For example, in a recent discussion of the relative values of the new and old-type examinations, Ruch and Talbott ask,

When a pupil is asked to "Discuss fully the Monroe Doctrine" what fraction of his knowledge is elicited? Is it all, nine-tenths, three-fourths, one-half, one-third, or one-tenth elicited?2

In answer to their own question, the writers say, "On the average the essay question calls forth less than half of the pupil's knowledge; two-fifths being a closer estimate."3 This statement can have no definite meaning, apart from two assumptions: first, that the total of one's achievement in respect to the Monroe Doctrine can be regarded as an algebraic sum of elements; and second, that an element of subject matter also defines an element of the outcome of learning. There is no adequate experimental support for these assumptions. Ruch and Talbott simply sampled elements of the subject matter which the essay and objective tests embrace, and concluded from their observations that the former elicits less than half of a pupil's learning. There is, of course, no adequate definition of an element of subject matter, to say nothing of an element of the outcome of learning. "Columbus discovered America in 1492" may be considered an element. But Columbus is also an element, so is America, so is 1492. Even if the elements of the subject matter were well-defined and their relation to the outcome of learning were known, there is still no process of adding them. Until there is an experimental proof of the above assumptions and until the elements are made to comply with the axioms of addition, statements such as that of Ruch and Talbott can have only a figurative meaning.

2 Talbott and Ruch, "Minor Studies in Objective Examination Methods," p. 203.
3 Ibid., p. 206.

A more subtle form of the same idea is expressed in a definition of an instrument designed to measure general achievement. Lindquist and Anderson say, "A general achievement test is one designed to express, in terms of a single score, a pupil's relative achievement in a general field of subject matter."4 Later, in the same discourse, these investigators, discussing the multiplicity of elements in a subject-matter field, advance the following theoretical position: "If each of these elements could be given a weight proportional to its importance or value, then the pupil's total or 'general achievement' would be measured by the weighted sum of such elements that he has learned or understood."5 These two passages imply that a score is a sum of separate units, each unit being defined by a single element. This is true, of course, but from the standpoint of the learner's growth each element is assumed to be an isolated and distinct outcome. Unless the outcomes of learning that correspond to the elements stand apart and maintain their identity in the learner, the summed total of one's achievement cannot be obtained by computing responses to the elements.

In the absence of experimental proof that the outcomes are additive, let us consider the matter theoretically. Suppose we have two outcomes, A and B. Suppose also that by some means we discover that A = B. Suppose that some outcome X is added to A. Can the result be expressed as A + X > B? It does not seem likely, because the result may be qualitatively different from A and B. Indeed, recent psychology seems to indicate just such a conclusion. Speaking in chemical terms, it implies that outcomes are compounds, and not mixtures.

4 Lindquist and Anderson, "Achievement Tests in the Social Studies," p. 201.
5 Ibid., p. 210.
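The question raised above about outcomes A, B, and X can be put in symbols. What follows is only a sketch, assuming the form in which the axioms of addition are commonly stated in the literature on measurement; the symbol ⊕ for "combining two outcomes" and the extra outcome Y are notation introduced here for illustration and do not appear in the text.

```latex
\begin{align*}
&\text{Premise:} && A = B \\
&\text{Claim to be tested experimentally:} && A \oplus X > B \\
&\text{Commutative axiom:} && A \oplus X = X \oplus A \\
&\text{Associative axiom:} && (A \oplus X) \oplus Y = A \oplus (X \oplus Y)
\end{align*}
```

Each relation presupposes that the combined outcome is still the same kind of thing as A and B, so that it can be compared with them. If the combination behaves like a compound rather than a mixture, that presupposition is exactly what is in doubt.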


Every new adjustment [says Coghill] arises out of the total of the individual's experience. Or in terms of "Gestalt" the particular adjustment is related to the total experience as a quality upon a ground.6

Lashley points out that his experimental studies "have emphasized the unitary character of every habit."7 The same idea is implied in the Gestalt principle of learning through insight.8 An outcome of learning is something new, different from the factors that compose it. Moreover, outcomes of learning can hardly be considered as independent units. Although the conduct of an individual at any moment may be dominated by some particular outcome, it cannot be completely disconnected from other outcomes. The outcomes are interrelated so that the acquisition of a new one may modify other outcomes by changing the relations which exist among them. In a wider sense, the acquisition of an outcome may involve the entire organism in one way or another. Each new outcome, if it really enters into the self, in one way or another enters into and affects the organism in its relation to the past, the present, and the future. It becomes organized in experience so that it acquires meaning and significance in terms of those outcomes that make up the self. It is obvious that this view of learning denies the quantitative assumption upon which the present instruments are based. Clearly it is nonsense to think in terms of ratios and proportions, where each new outcome reconstructs to some extent that which went before. The commutative and associative axioms of addition would be seriously violated.

6 Coghill, Anatomy and the Problem of Behaviour, p. 108, condensed, by permission of The Macmillan Company.
7 Lashley, Brain Mechanism and Intelligence, p. 14, by permission of The University of Chicago Press, publishers. Italics mine.
8 Koffka, Growth of the Mind: "Learning is true development, and not a mechanical addition of performance" (p. 227, by permission of Harcourt, Brace, publishers).

Apparently one possible way of escaping the difficulties raised in the foregoing paragraphs lies in digging down to some neutral foundation, such as that suggested in Koffka's theory of traces, in terms of which all outcomes can be expressed. But the adoption of such a common denominator might necessitate an entirely new approach to the whole problem of educational measurement. An instrument for the detection and measurement of traces, for example, might be as different from our present instruments as the methods of measuring waves are removed from the ordinary process of distinguishing among colors. For the present we can escape much confusion if we keep in mind that addition is an experimental concept and that its axioms cannot be satisfied by merely assuming an additive structure. "Summed total achievement," "proportional achievement," "fraction of knowledge," and the like, can have well-defined meanings only when elements are vouched for by an adequate criterion and are shown by experimental operations to possess an additive structure.

THE QUEST FOR EQUAL UNITS

One of the most general sources of confusion in educational measurement is the failure to recognize that additive units can be derived only by a physical demonstration of addition. As has been shown before, a transitive asymmetrical order, generated in terms of performance, gives no ground for adding the products of performance or the numbers used to represent them. A transitive asymmetrical order simply yields a scale of relative position. And one of the big problems with which we have dealt in measurement is the transformation of scales of relative position into quantitative scales, graduated in units of amount. In attempting to establish a transitive asymmetrical order, we have followed experimental procedures. But in meeting the conditions of addition, we have abandoned experimental methods and consequently have confused units equal by definition with additive units experimentally ascertained.
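The difference between a scale of relative position and a scale of amounts can be illustrated with a minimal sketch. The names and raw scores below are hypothetical, and the square-root rescaling is only one arbitrary order-preserving transformation chosen for illustration. The sketch shows that such a rescaling leaves the order of individuals untouched while changing any statement of the form "twice as much."

```python
import math

# Hypothetical raw scores (number of items answered correctly).
scores = {"Ann": 5, "Ben": 10, "Cal": 20}


def rank_order(values):
    """Return the names sorted from lowest to highest value."""
    return sorted(values, key=values.get)


# An arbitrary order-preserving rescaling of the same scores.
rescaled = {name: math.sqrt(score) for name, score in scores.items()}

# The scale of relative position survives the rescaling ...
same_order = rank_order(scores) == rank_order(rescaled)

# ... but ratios of the numbers do not.
ratio_before = scores["Ben"] / scores["Ann"]
ratio_after = rescaled["Ben"] / rescaled["Ann"]

print(same_order, ratio_before, round(ratio_after, 3))
```

Since nothing in the ordering operations favors the raw numbers over the rescaled ones, the order alone gives no ground for treating either set as amounts.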


This departure from experimental procedure may be seen clearly if we call to mind the experimental character of addition, before describing the way in which equal units have been sought in educational measurement. Let us again consider the measurement of mass. It is not assumed that mass is additive; its additive character is experimentally demonstrated. It will be remembered that in the measurement of mass a transitive asymmetrical order is attained, by means of a balance constructed to conform to certain conditions. The order of objects thus attained is simply a description of their relative position in respect to weight. After the scale of relative position is established experimentally, however, operations are then devised to exhibit the additive character of mass. For example, two objects, each of which will just balance the other when placed in pans at the ends of the beam, are placed in one pan together, and some third object which will just balance these two objects is placed in the other pan. When this has been done, it is permissible to say that the third object is twice as heavy as either of the first two bodies. These operations, together with others, confirm the axioms of addition.

In educational measurement, however, the jump from a transitive asymmetrical order to a scale of equal units has not been effected by experimental procedures. There have been no experimental operations to confirm the axioms of addition as is done in such fundamental measurements as, for example, those of thermal capacity, electrical resistance, illumination, time, and so on. On the contrary, the whole procedure of reducing a scale of relative position to one of units of amount consists of assumptions and statistical manipulations. The assumptions underlying the search for quantitative units are that learning is normally distributed9 and that equal segments of the base line of a normal curve mark off equal units of learning. These assumptions, as was noted in an earlier chapter, were inherited from Galton, and have guided most of the efforts to derive quantitative units in both psychological and educational measurement. As we shall see, they do not guarantee equal units, and when used for that purpose, are fundamentally untenable and misleading.

9 It is not our purpose to examine the assumption of normality. It has already been extensively treated by Boring, Wechsler, and Kelley. The point of our argument is that even if the qualities involved in educational measurement are normally distributed, it still does not permit us to establish units that are truly equal. For the benefit of the reader who is interested in the assumption of normality, we have included a few comments taken from Boring and Wechsler. The assumption of normality has been treated at length by Boring. One of his main points is that since we do not know the units of mental qualities, it is impossible to establish the assumption of normality, because the shape of the curve is a function of the units involved. See Boring, "The Logic of the Normal Law of Error in Mental Measurement." More recently Wechsler has made the same point. He says, for example, that "in measuring the speed of perception, one finds that the distribution of scores is different when the subject's performance is measured in terms of total time required to complete a given task from what it is when its excellence is measured in terms of number of items perceived per unit of time." See Wechsler, The Range of Human Capacities, p. 35, by permission of Williams and Wilkins, publishers. Wechsler's extensive review of the studies of human capacities led him to conclude that mental qualities are not normally distributed. His conclusions deserve an extended quotation.
"Examination of actually fitted curves and frequency tables from which the general character of the distribution is definitely indicated shows that the distribution of human traits and abilities conform not to one but to several types of curves. In point of fact, the only human distributions which are truly Gaussian are those which pertain to the linear measurements of man, such as stature, lengths of extremities, the various diameters of the skull, and certain of their ratios like the cephalic index, etc. But even among these there is a considerable deviation from true symmetry. In the case of most other physical and physiological functions, this deviation from the 'normal' type, or skewness, is sufficiently great to call for another type of curve altogether. All that can be said is that occasionally one does come across a series of measurements of these functions which do roughly conform to or approximate a Gaussian distribution. But one must hasten to add that such approximation is practically never met with in the case of the distribution of mental abilities.
"The assertion that mental abilities do not distribute themselves according to the normal curve is contrary to the claim made for them in nearly all text-books of psychology where the question is discussed. The fact remains that one finds precious few instances in the literature where frequency curves were actually fitted to mental data, and in none of these, so far as I was able to discover, was it shown that the best fitting curve was in fact the Gaussian type. In a number of instances the simple frequency polygons given, do seem to indicate that if a curve were fitted, it would probably turn out to be Gaussian in form, but unfortunately these are precisely the cases where the method of measurement of the abilities involved is open to serious criticism. Such, for instance, is the case of many intelligence and educational 'scales' where the practice has been to 'weight' or re-evaluate the original test scores on the basis of their statistical frequencies. Naturally, in those instances, where the original test scores are redistributed, they cannot but help give a form of distribution which the statistical artifacts employed, themselves served to produce." The Range of Human Capacities, pp. 32-35, condensed, by permission of the publishers. For an opposing argument see Kelley, "The Principles and Techniques of Mental Measurement," pp. 408 ff.

Essentially two distinct methods are followed in deriving units based upon the assumptions stated in the foregoing paragraph. The first is the scaling of raw scores, derived from an instrument. The T-scale, formulated by McCall, is a good example of this method.10 Stripped of its technical aspects, the procedure simply consists of transforming raw scores, such as the number of items correctly answered, into measures of deviation from the average of the scores of a group. The deviation is expressed in terms of some statistical measure of variation from an average, usually sigma, the probable error, or, as in the case of the T-scale, a fraction of these units. Thus learning is no longer described in terms of the number of tasks an individual can do correctly, nor the number of problems he can solve, nor the number of items he can answer, but in terms of some derived measure, expressed as a degree of variability.

An illustration of the second method may be found in Woody's arithmetic scale,11 or in Trabue's language scales.12 In the construction of Woody's scale, for example, a transitive asymmetrical order of tasks in arithmetic was generated in terms of difficulty. Difficulty is operationally defined by the percentage of a group responding correctly to a task. Thus if a task is solved by 90 per cent of a group, it is less difficult than a task to which only 70 per cent respond correctly, and a task solved by only 50 per cent would be still more difficult, and so on. Using this definition of difficulty, it is possible to mold experimental operations so as to establish a series of tasks of increasing difficulty. But after the tasks have been ordered, the problem of separating them by equal amounts of difficulty arises. This problem is solved by recourse to statistical manipulations. Essentially these manipulations consist in locating the point on the base line of the normal curve at which a given task will fall, as determined by the percentage of correct solutions given. For example, let it be assumed that tasks A, B, C, D, E are selected from among those comprising the transitive asymmetrical order, and that the percentage of correct responses to each task is 2.28, 6.68, 15.87, 30.85, and 50 respectively. According to the criterion of difficulty, A is more difficult than B, B more than C, C more than D, and D more than E. The character of the normal curve makes it relatively easy for these percentages to be converted into units of the base line of the curve. By reference to statistical tables, it can be computed that these percentages correspond to 2, 1.5, 1, 0.5, and 0 sigmas respectively. Since the percentages of correct responses locate these tasks on the base line at intervals of 0.5 sigma, it is concluded that they are tasks separated from each other by unit amounts of difficulty. In the language of Garrett,

if we can assume a normal or approximately normal distribution in the capacity tested, the unit of measurement is taken as the σ or the P.E. By so doing we are able not only to arrange the test items in a simple order of difficulty, but to "set" or space them at definite points along a scale of difficulty, along the baseline of the normal curve. On such a scale the distance from one item to another, or from any given item to the selected zero point is known as definitely as the distance between two divisions on a foot rule.13

With these procedures of obtaining quantitative units before us, we shall now turn to a more critical consideration of them.

10 McCall, How to Experiment in Education, pp. 94 ff.
11 Woody, Measurements of Some Achievements in Arithmetic.
12 Trabue, Completion-Test Language Scales.
13 Garrett, Statistics in Psychology and Education, pp. 101-2, by permission of Longmans, Green, publishers.
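The conversion of percentages into sigma positions described above can be sketched with Python's standard-library `statistics.NormalDist`, using the five percentages given in the text. The function name `sigma_position` is introduced here for illustration only, and the sketch reproduces nothing but the statistical step: it assumes a standard normal curve, exactly as the procedure under discussion does, and says nothing about whether that assumption is warranted.

```python
from statistics import NormalDist

# Percentage of the group responding correctly to each task (from the text).
percent_correct = {"A": 2.28, "B": 6.68, "C": 15.87, "D": 30.85, "E": 50.0}


def sigma_position(percent):
    """Locate a task on the base line of a standard normal curve.

    The task is placed at the point below which the proportion of the
    group that failed it (1 - percent / 100) is assumed to lie.
    """
    return NormalDist().inv_cdf(1.0 - percent / 100.0)


positions = {task: round(sigma_position(p), 2)
             for task, p in percent_correct.items()}

for task, z in positions.items():
    print(task, z)
```

Rounded to two decimals, the positions agree with the values of 2, 1.5, 1, 0.5, and 0 sigmas read from statistical tables. The spacing of 0.5 sigma comes entirely from the assumed curve; the only observations entering the computation are the percentages.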

A CRITIQUE OF THE PROCEDURES OF OBTAINING UNITS

In opening our critical discussion, it is well to recur to the nature of mathematics. If the relation of mathematics to the objective world is clearly understood and kept in mind, many difficulties and intellectual absurdities may be avoided. Formal mathematics depends upon logical consistency; it deals with axioms and their necessary implications; it is not concerned with describing aspects of the objective world. If mathematics is applicable to objective things, it is because these have been shown experimentally to possess the systematic character necessary for mathematical treatment. The objective world cannot be forced by decree into a mathematical design, save at the risk of serious error and intellectual vagary. In short, mathematics, as someone has said, is a kind of hopper out of which comes no more than is put into it, but if the materials thrown into it are consonant to its processes, they come out in a much more usable form. Measurement, as we have observed, is one of the fundamental procedures by which mathematics is found to be applicable to aspects of nature. As discussed above, the construction of measuring instruments resolves itself into a problem of discovering whether or not certain properties have a structure which will permit the application of mathematical operations.

We are concerned here with certain aspects of the normal probability curve, since that is the form into which the data of educational measurement are cast. The probability curve is not dependent upon whether there is some particular property of the objective world to which it is applicable. But it depends upon whether certain conclusions necessarily follow from assumptions which are made. Hence the characteristics of the normal curve are known to be true of properties only by experimental procedures.

In the preceding section it was seen that our procedure of obtaining equal units is based upon the normal curve of probability. After the conditions of order are experimentally confirmed, the instrument-maker then attempts to obtain quantitative units, that is, units equal in amount, by recourse to certain characteristics of the normal curve. In doing this, the base line of the curve is taken to represent intensive qualities like learning. While it is not possible to graduate the base line directly, it is possible to do so indirectly, because the characteristics of the curve make it convenient to discover the percentages of frequencies, whose ordinates mark off equal units on the base line. Hence, so the argument runs, if we do not know what the units of a given quality are, we may obtain them if we know the frequencies against which the quality is plotted. It is obvious that the frequencies are experimentally derived. The jump from the frequencies to the units, however, is an inference and not an experimental operation. And we have no assurance that the quality embraced by the curve is compatible with the inference.

Of course, one of the characteristics of the curve is that one unit of the base line is equal to any other such unit. This much is geometrically true, regardless of whether qualities such as learning have an identical structure in this respect. This is a formal characteristic, in the sense that it is not uniquely true of any property of nature, and consequently it may be descriptive of a large variety of phenomena. But to demonstrate whether this is true of a particular quality depends ultimately upon experimental work and not upon assumptions and logical deductions.

The point may be clarified by recourse to an illustration. Let us consider a normal distribution of people with respect to height. It is true that equal units of the base line of the curve will mark off equal units of this quality.
For example, if the mean height is 60 inches, and sigma 1 is 5 inches, then it is true that the individuals whose frequencies place them at sigma 1 will be 65 inches or 55 inches high, depending upon whether the sigma is plus or minus. It is also true that the frequencies may be manipulated to obtain units of height, just as they are used to obtain units of learning. But quantitative units of height are not established by this procedure. Rather, sigma 1 has a quantitative meaning because height has been shown in quite another context to be additive. Apart from the fact that units of length had already been ascertained on operational grounds, there would be no reason to assert that the sigma marked off a unit of height in a fundamental and additive sense. It is the fact that height has been measured fundamentally, and independently of its career in a normal distribution, that gives quantitative meaning to a segment of the base line. And it is just the absence of such fundamental measurement of learning that renders fruitless the attempts to deduce quantitative units from the characteristics of the normal curve.

If closer attention had been given to the fact that two systems similar in part are not necessarily identical throughout, the idea of deriving quantitative units from the normal curve would not have commanded such general respect. It is not necessarily true that if a quality is distributed normally, it must also be consonant to all other characteristics of the normal probability curve. This is no more necessary than that a quality which satisfies the axioms of order and equality must also be identical with the relations expressed in the axioms of addition. We know that the latter is not true, because length and temperature each can be made to satisfy the conditions of order and equality, but only length will satisfy the conditions of addition.

THE TYPES OF UNITS

In the effort to reduce learning to a quantitative scale, much misunderstanding has arisen in regard to the meaning of units. This misunderstanding has resulted largely from the failure to recognize that units can be classified into two distinct types. It is fairly evident that in educational literature little or no distinction is made between units that are equal by definition and units that are equal by experimental demonstration. This confusion is found, for example, in Woody's14 suggestion that the units on his arithmetic scale are equal in the same sense as the units on a foot rule. It should be noted in passing that most of the efforts to obtain equal units of intelligence have been based upon the same lack of attention to the distinction in kinds of units.15

14 Woody, Measurements of Some Achievements in Arithmetic, p. 30.
15 Thorndike, in his Measurement of Intelligence, either ignored the distinction or failed to recognize that his method was simply a more rigorous definition of units.

There are two kinds of units: first, units that are equal by definition; and, second, units that are equal by virtue of the fact that the axioms of addition have been confirmed. The former is the kind that has been evolved in the field of education. Units equal by definition are those which have been determined for convenience, but which carry no experimental assurance that the quality which lies behind them is submissive to the same treatment as the numbers designating them. The units of an ordinary thermometer, for example, are equal only by definition, and the change from 20 to 30 degrees is not the same as a change from 10 to 20 degrees. This much is clearly evident in the procedure by which ordinary thermometers are constructed. A bit of mercury is secured in a closed tube, in which atmospheric pressure has been reduced to a minimum. The zero point, as on the centigrade scale, is fixed by marking the tube at the point to which the mercury contracts when placed in a mixture of water and ice about as cold as melting ice. The boiling point is fixed by marking the tube at the point to which the mercury expands when placed in boiling water. Of course these operations should be performed under standard atmospheric pressure. When these two points have been determined, the scale is divided into 100 equal parts, each part representing a degree. But the parts are equal only in a linear sense and not in the sense of equal portions of the quality called temperature. For there is nothing in these operations that makes temperature answerable to the axioms of addition. If the procedures of obtaining units in education, as well as psychology, are examined, they will be found similar to that followed in making a thermometer, in that they are only ways of more or less rigidly defining units.

Units of the second type are those capable of exhibiting the same kind of relationship as the numbers designating them. Units of length, mass, time, electrical resistance, volume, monochromatic illumination, and so on, are examples of this type. In these cases we know from experimental operations that the ratio of the units is the same as the ratio of the numbers representing them.16

16 See Campbell, Physics, the Elements.

The arbitrary character of units is another source of misunderstanding and confusion. The size of the units ultimately selected is an arbitrary matter, convenience being the chief determining factor. That is, whether an inch, a centimeter, or a yard is used as a unit of length is a matter of choice, and for convenience of calculation, and so on, one may be considered as preferable to the others. But this arbitrary aspect should not be taken to mean, as it sometimes has been, that all units are conventional. For the relationship expressed in units of electrical resistance and mass, for example, is a necessity that transcends convention, choice, and arbitrariness. When a unit of volume, for example, has been selected and a number assigned to it by proper experimental procedures, all other numbers in the scale have been determined by a necessity that cannot be called human. It is just the failure to satisfy the axioms of addition that makes units in educational and psychological measurement conventional. Correspondingly, it is the successful fulfillment of these axioms that places units such as those of volume, length, and time beyond convention.

It should not be understood, however, that conventional units are of no value. Units fixed by convention are of great value, for they permit us to do things which could not otherwise be done. If we agree conventionally to take units of an extensive quality, like length, as if they were units of some intensive quality, like learning, with which they correlate, we can compare individuals in respect to such and such a quality and establish relations among phenomena. This fact, however, should not be taken to mean, as it sometimes has been, that all units are conventional or that units quantitatively equal are unnecessary. Unless there were fundamental units, such as those of time and length, it would not be possible to derive the units upon which we agree conventionally. In the final analysis, all units whose equality depends upon convention go back to the units of some magnitude fundamentally measurable. Therefore educational measurement is not to be condemned because its units are conventional, so long as these units are used with this in mind.

Moreover, it should not be assumed that the foregoing discussion implies that educational measurement must derive units by the same procedure as that followed in the measurement of mass, length, electrical resistance, and so on. Every subject matter has its own peculiarities and limitations, and techniques and procedures must be adapted to its nature. Nevertheless, it is implied that if we are to obtain units really equal in educational measurement, there must be some system of physical operations by which the axioms of addition are satisfied. While we should not follow procedures developed in some other subject matter, it is felt that a careful study of the confirmation of the axioms of addition, in such qualities as illumination,17 would prove suggestive in our confused quest for equal units.

17 Campbell and Dudding, "Measurement of Light."

It may be that some of the qualities dealt with in educational measurement will turn out to be capable of expression in really equal units. If this occurs, no one can predict what the particular procedure of obtaining them will be. But it can be asserted that whatever it is, it will be experimental and will publicly demonstrate that such qualities have a structure consonant to that of the axioms of addition. For in the process of satisfying the conditions of addition, equal units are derived. Failure to recognize this point has led educational and psychological measurement astray in the quest for truly equal units.18 In these fields it has been assumed that addition is made possible by the derivation of equal units, whereas equal units are established by addition. As a result, units equal only by definition have been advanced as though they were equal in a quantitative sense. When we have changed our guiding question from "How to find equal units?" to "How can we satisfy experimentally the axioms of addition?" we shall have cleared up much of our confusion, and, what is of even more importance, opened the way for the formulation of experimental operations in keeping with the end we seek. Until such operations have been made a matter of history, the contention that truly equal units have been established in educational measurement must remain an empty affirmation.

SUMMARY

We have been concerned with the contention that the conditions of addition have been satisfied in educational measurement. This contention is implied in the claim that equal units have been established. In considering this contention, it was first pointed out that the responses of an individual are not necessarily additive, even though some of the conceptions and practices of educational measurement are based upon the theory that they are. They are numerative, not additive. We saw, however, that the more careful workers, aware of this fact, have sought to establish an additive structure by statistical manipulations, whereby equal units on the base line of the normal curve are taken as equal units of the quality embraced by the curve. We then saw that this practice leaves out of account the fact that addition is an experimental concept, and that equal units can be determined only through the process of addition itself. The only way in which we can know that units are equal is by an actual process of addition. Units derived merely by logical and mathematical analysis can never be more than conventional, irrespective of how neat and complete the analysis may be. The present search for equal units is therefore condemned at the outset, because its reliance on logical and mathematical analysis closes its findings to experimental verification. There is, therefore, no hope of discovering units that are really equal until we abandon the present approach and begin to search for the means of operationally interpreting the axioms of addition.

Since the search for an additive structure, however, can be successfully consummated only after conditions of order and equality have been satisfied, addition is not an urgent problem at the present stage of educational measurement. Just now the crucial problem, as was seen in the last chapter, is the operational interpretation of the conditions of order and equality, in terms of some more or less continuous kind of learning. When that problem has been solved, it will then be time to turn to the question of addition.

¹⁸ This is true in all of the graphic methods of obtaining equal units in psychological and educational measurement.

VIII

Addition and Quality Scales

IN THE last chapter it was suggested that an attempt had been made to compare products of performance and to express them in terms of multiples of some common quality. It is now our task to consider scales based upon a direct comparison of products and to examine the claim that these scales are graduated in units of equal amounts. In this chapter we shall therefore turn to a consideration of quality scales, such as, for example, those designed to measure merit of handwriting and merit of composition.

THE THEORY OF QUALITY SCALES

In the second chapter it was pointed out that quality scales were derived from psychophysical considerations that date back to the measurement of sensations by Weber and Fechner. However, the connection of these scales with the work of Weber and Fechner is not direct and unbroken, for they are based upon conclusions derived through a criticism of the work of these two men, rather than upon the principles which they formulated. In order to understand the principles upon which quality scales are based, it is necessary to recall certain developments in the field of psychophysics, especially the work of Fullerton and Cattell, On the Perception of Small Differences.

Weber and Fechner based their work on the measurement of sensations upon the assumption that a just-noticeable increase in a stimulus is a constant. For example, if some stimulus, such as light, be increased in intensity until the increase is just noticeable, that increase will appear the same to different observers. It will also appear the same to an observer at different times. It was also held that when the objective amount of the light is continuously increased, the successive differences of brightness intensities, which are just noticeable, are amounts equal to one another.

In time, however, it became evident that the just-noticeable difference was itself a variable, instead of a constant, so that a difference which is just recognizable may vary within the limits of complete certainty and complete doubt.¹ An observer, instead of being able to recognize a just-noticeable difference as a sharp and well-defined point, may recognize it in all degrees of gradation from a point of absolute doubt to a point of absolute certainty. If an increase in the objective source of stimulation is so great, however, that an observer always correctly perceives it with certainty, there would be no way of determining the degree of the difference, because there is no way to ascertain by how much or little the increase exceeds a just-recognizable difference. If, however, an increase in the stimulus is so small that state S1 is not always distinguishable from state S2 but that sometimes it appears greater, sometimes less, and at other times identical, we should have a condition wherein some judgments would be right and others wrong. This condition would permit a comparison of judgments and is essentially the method of right-and-wrong cases, out of which was developed the procedure used in the construction of quality scales.

Each of these methods assumes a parallelism between increases in sensation and increases in the objective source of stimulation. Hence the increase in sensation is measured by referring to the magnitude of the objective correlate. In other words, a subjective psychological quality called sensation is measured by estimating the amount of the stimulus by means of certain effects which the stimulus produces in our consciousness. If we specify that the effect must be just recognizable, we shall rely upon the method of just-noticeable difference. If, on the other hand, the effect must be one which cannot always be recognized correctly and with certainty, then we shall have converted the method of just-noticeable difference into the method of right-and-wrong cases.

The interpretation of these methods in terms of a parallelism was denied by Fullerton and Cattell. They held that the methods did not measure the quantity of sensation.² On the contrary, they believed that these methods simply determined the error of observation under varying situations,³ and that the measured magnitude was the objective source of stimulation and not the subjective sensation.⁴ In a word, the perceptions of an observer were simply judgments of the magnitude of an objective quality, and, like errors of observation in general, were normally distributed.

This interpretation of psychophysical measurement suggested a new way of measuring objective qualities. For qualities which had before not been amenable to measurement could now be measured in terms of estimates of their magnitudes, by treating the estimates as errors of observation. The parallelistic interpretation, as was seen in the preceding paragraph, led to the conclusion that sensations were themselves measured by estimating the magnitude of their objective correlate. This new interpretation simply held that the estimations of the magnitude of the objective correlate were errors of observation and not measures of sensations. According to this theory, we measure an objective quality by estimating it in terms of its effects on our own subjective processes, whereas before that we measured the subjective processes by estimating the objective quality in terms of the effects of that quality on our consciousness.

¹ Fullerton and Cattell, On the Perception of Small Differences, pp. 10 ff., 74 ff.

² Fullerton and Cattell, On the Perception of Small Differences, pp. 23 ff.

³ Fullerton and Cattell, On the Perception of Small Differences, pp. 23 ff.

⁴ Fullerton and Cattell, On the Perception of Small Differences, p. 153.

Let us consider the method of right-and-wrong cases, in the light of this new interpretation. Suppose we have two stimuli, MO and M'O', qualitatively identical, but just different enough in intensity so that they cannot always be correctly distinguished. If an observer perceives the stimuli a number of times, it is assumed that his judgments of their magnitude will vary. If one observation of MO, for example, be made, a certain error will occur. If another observation be made, another error will occur. If we consider the average of the errors of a large number of observations as a point of reference, some errors will be larger than the average, some smaller, but the majority of them will cluster about the average. In short, for the stimulus MO we may assume a normal dispersion of errors of observation. We may assume a similar distribution for the estimates of the stimulus M'O'. According to the theory, this condition permits us to determine the relative magnitudes of the two stimuli. This may be more clearly seen with the help of the following diagrammatic figure:

[Figure: normal probability curves of the errors of observation for the stimuli MO and M'O'.]

In some of our observations MO will be seen as Mp, and M'O' will appear as M'n'; other observations will show MO as My, and M'O' as M'x', and so on. If a very large number of observations be made, normal probability curves may be constructed for each of the stimuli, in terms of the errors of observation, as shown in the accompanying diagram. Now, let us assume that M'O' appears to be greater than MO in 75 per cent of the observations. If we do this, the amount of the difference may be determined theoretically in the following manner. The 75 observations out of each 100 in which M'O' appears larger than MO lie in the area p'y's't'. The 25 per cent of the cases in which MO appears to be larger than M'O' lie in the area p't'x'. This is just what we should expect, if the difference between the two stimuli is p'O'. For according to the characteristics of the curve, if the difference of the stimuli is p'O', the frequency with which M'O' appears larger than MO will be proportional to the number of times that it appears smaller, as the area of p'y's't' is to the area of p't'x', which is as 3:1. When one stimulus is observed to be larger than another in 75 per cent of the observations, therefore, it is concluded that the difference between them is one probable error unit. And if we add a third stimulus of the same kind, which is judged greater than M'O' 75 per cent of the time, it will be as much greater than M'O' as the latter is greater than MO.

At this point, we reach one of the implications of the method of right-and-wrong cases, which Thorndike stated as the theory that differences noted equally often are equal, unless they are always noticed, or never observed at all.⁵ Upon this version of the method of right-and-wrong cases, he built his handwriting scale, which has become the pattern of quality scales based upon psychophysical considerations. This method has been applied to the measurement of the merit of such products as handwriting, composition, drawing, sewing, and oriental rugs. Let us illustrate some of the procedures by which the method is applied in the measurement of such products.

⁵ Thurstone has shown that the theory is valid only when the discriminal dispersions are equal. This criticism makes the theory more rigorous; it does not condemn it. See Thurstone, "Equally Often Noticed Differences."

With much more care than can be shown here, a transitive asymmetrical arrangement of compositions, for example, is established by having a large number of judges rank the compositions in order of merit. At the present moment we shall raise no question as to the meaning of merit, but shall pass on to the procedure by which it is claimed that this transitive asymmetrical order is converted into a scale of units of equal amounts of merit. The percentage of cases in which one sample is judged better than another is calculated. From these calculations we are able to assign numbers to the compositions by specifying that the numbers must have the same order as that of the compositions, fixed in terms of the percentages. But the numbers ultimately assigned have no more significance than any other representation of the compositions. Indeed, the compositions might just as well be represented by A, B, C, D, and so on, so long as these symbols are assigned in the same order as the compositions themselves. In actual practice, however, no numbers are formally assigned to represent the compositions as members of a transitive asymmetrical order. In the quest for equal units of merit this operation is legitimately passed over. And an attempt is made to express the percentages of "better" judgments in terms of some common unit, for it is assumed that the percentages of "better" judgments do not directly measure the fineness of the differences among the compositions, since the judgments are supposed to be unequal errors of observation. Hence the percentages of "better" judgments are converted into some multiple of the base line of the normal probability curve, rendering them expressible in terms of a common unit. The units derived by this statistical operation are then thought to express the merit of compositions, in the same sense as inches express the length of a board.
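The statistical conversion just described, from percentages of "better" judgments to multiples of a base-line unit, can be sketched in a few lines of Python. This is only an illustration of the arithmetic of the convention, on the scale-makers' own assumption that the judgments follow a normal curve; the function names `pe_units` and `scale_values` and the sample percentages are hypothetical and are not taken from Thorndike or Hillegas. Nothing in the sketch shows that the resulting units are experimentally equal.

```python
from statistics import NormalDist

_STANDARD = NormalDist()  # normal curve with mean 0 and sigma 1

# One probable error is the base-line distance from the mean that leaves
# 75 per cent of the area of the curve below it (about 0.6745 sigma).
PROBABLE_ERROR = _STANDARD.inv_cdf(0.75)


def pe_units(share_better: float) -> float:
    """Convert a share of "better" judgments into probable-error units.

    Shares of 0 or 1 are rejected: differences always noticed, or never
    noticed, cannot be placed on the base line.
    """
    if not 0.0 < share_better < 1.0:
        raise ValueError("share must lie strictly between 0 and 1")
    return _STANDARD.inv_cdf(share_better) / PROBABLE_ERROR


def scale_values(adjacent_shares):
    """Assign values to ranked samples by summing adjacent differences.

    adjacent_shares[i] is the (hypothetical) share of judgments in which
    sample i + 1 was judged better than sample i. The lowest sample is
    placed at 0 by convention.
    """
    values = [0.0]
    for share in adjacent_shares:
        values.append(values[-1] + pe_units(share))
    return values


if __name__ == "__main__":
    # A 75 per cent share corresponds to one probable-error unit.
    print(round(pe_units(0.75), 4))
    # Hypothetical shares for four ranked compositions.
    print([round(v, 2) for v in scale_values([0.75, 0.60, 0.90])])
```

Every step after the ranking itself is a computation on the assumed curve, which is why the units so obtained are equal by convention rather than by experiment.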
As Hillegas says of his composition scale,

The values which are assigned to the various samples express their quality in the same sense, though not as accurately, as millimeters express the lengths of lines. Just as 183 mm. may be expressed by 18.3 cm. or 1.83 dm., so the 183 employed in this scale may be considered as 183 small units of quality, or as 18.3 units ten times as large, or as 1.83 units one hundred times as large. . . . A difference of two hundred in this scale is equal to twice a difference of one hundred taken in any part of the scale. Thus sample 94 with a value of 369 is a little more than twice as good an English composition as sample 595 with a value of 183.⁶

From the foregoing discussion it is obvious that the claim that the units of quality scales are equal in the same sense as those of a foot rule rests upon two assumptions: first, that the judgments of differences may be treated as errors of observation; second, that equal units of the base line of a normal probability curve necessarily represent equal units of the quality represented by the curve. These are just the postulates that we are going to examine.

AN ANALYSIS OF JUDGMENTS OF MERIT

As a matter of common practice, we judge one composition to be better than another, one specimen of handwriting to have more merit than another, and so on. But it is also true that such judgments will vary from one observation to another. At one time composition A, for example, may be thought better than C, while at some other time it may be judged inferior to C. These are facts of long standing, that antedate the development of quality scales. In the construction of these scales, however, such commonplace facts were elevated to the plane of science by cloaking them in technical terms and manipulating them statistically. It is a humorous note in the history of education that at the very time when subjective evaluation of learning was being assailed by students of measurement, this same method of evaluation should have been adopted by its critics.

Since the variations of judgment have been cloaked in technical terms by treating them as errors of observation, it is desirable to know whether or not they are worthy of the cloak. We have implied that they are not, and this position must now be supported. We have before us a fact: our judgments of the quality or merit of a product vary. The problem is how to explain the variations of judgment. Shall we assume that they are due to an infinite number of independent factors, and that each variation is no more than an error of observation, consisting of the sum of a large number of smaller errors? Shall we assume that the variations are qualitatively identical, each one differing from the preceding one only in amount? Well, these were indeed the explanations accepted in the construction of the scales.⁷ For, as we saw in the preceding section, the scales are based upon the idea that judgments can be treated as errors of observation, and this idea necessitates just such an explanation of the variation of judgment. Of course, one can say that in judging compositions with respect to merit, for example, one's judgments are just what they are, and because the judgments are distributed normally is no reason for calling them errors of observation. This position is true. But it is also true that if the judgments are not qualitatively identical, there is no possibility of measurement anyway, since measurement involves a qualitative continuum. When judgments differ only quantitatively, that is, when they are qualitatively alike, they are errors of observation.

If, before we pass on to an examination of the nature of the variations of judgment, we examine the character of the experiments upon which Fullerton and Cattell based their account of the method of right-and-wrong cases, it will assist us in seeing the defects of quality scales. They experimented with such simple phenomena as the extent and force of movement. In their experiment on the extent of movement, for example, there were two uprights, one fixed and the other movable, attached to a scale fastened along the edge of a table. The movable upright was placed at the 500 or 510 mm. mark on the scale, as the case might be. A subject whose vision of the apparatus was obstructed by a screen was asked to move his finger between the uprights. By doing this he obtained the standard movement against which to compare the succeeding movement. Then as the subject prepared to make a second movement, the free upright was shifted to the 510 or the 500 mm. mark, as the case might be. After the second movement was made, the subject was asked to judge whether it was longer or shorter than the first. In other words, he was asked to judge whether the uprights were closer together or farther apart in the second of each pair of movements.⁸

In such cases as these the quality under observation is homogeneous and well defined. The extent of movement or distance is a uniform quality, and under carefully controlled conditions variations of judgment in respect to its magnitude are no doubt errors of the same kind and involve a single quality. The theory that such variations are errors of observation is at least plausible.⁹ When the method is applied to the measurement of such qualities as the merit of compositions, this plausibility can no longer be granted. For in such cases the quality is heterogeneous and complex. Merit is not a homogeneous and well-defined quality. It is a composite of many factors, irrespective of whether the product possessing it is oriental rugs, handwriting, or composition.

⁶ Hillegas, A Scale for the Measurement of Quality in English Composition by Young People, p. 13. Thorndike says, of the units of his drawing scale: "When we say that the difference between the merit of drawing f and the merit of drawing g (8.6-7.8) is approximately equal to the difference in merit between drawing i and drawing j (12.6-11.8), we imply that the two differences in merit are approximately equally often judged correctly by competent persons, and nothing more." "The Measurement of Achievement in Drawing," p. 17. If this is all that is meant, then his units are equal by definition, and there is nothing more to be said. But he is elusive on the point, and in other connections he speaks as though the units were really equal. Cf. ibid., p. 6, for example.

⁷ Thorndike, "The Measurement of Achievement in Drawing," n., p. 22.
For example, as Hillegas points out, merit of composition may be thought of as comprising form and content. Each of these factors may then be analyzed into its respective elements. Form consists of such elements as punctuation, grammatical construction, emphasis, word choice, spelling, and so on. Content may consist of even a larger number of elements, each so variable as to defy expression in the form of rules or principles. Merit is therefore quite a complex quality, as compared to "extent of movement." This fact was recognized by the constructors of quality scales. They even observed that judgments of merit sometimes tended to emphasize some elements, to the exclusion of others. But the significance of these facts was overlooked. They indicate that variations of judgment are due not to errors of observation, but to qualitative changes of judgment.

The conclusion that the variations are due to qualitative changes of judgment becomes evident in the light of recent psychology. A response is not a reaction to a mere physical thing. It is not a simple S-R process. On the contrary, an individual makes responses to a particular organization of stimuli. The individual sees a group of stimuli, he relates or organizes them into some particular pattern, and then he responds to this organization. The individual reconstructs the objective source of stimulation. He is hence not a machine whose actions are determined entirely by external conditions; the events that occur within himself play a part in determining how he responds to these conditions. In a word, he himself helps to determine the character of the stimulus situation. When a stimulus situation maintains a more or less constant structure, it seems reasonable to assume that judgments of the intensity of any of its aspects may remain more or less qualitatively identical. Then it may be permissible to assume that variations of judgment are errors of the same kind and are due to the infinite physical, physiological, and psychological factors that accompany observation.

⁸ Fullerton and Cattell, On the Perception of Small Differences, pp. 59 ff.

⁹ It is interesting to note in this connection, however, that Fullerton was not convinced that these errors of variations were independent. See On the Perception of Small Differences, n., p. 26.
This seems to be precisely the condition that attends the measurement of extent of movement, force of movement, and the like, by the method of right-and-wrong cases. Here the factors of the stimulus situation are smaller in number than in the case of merit of composition, for example, and the structure of the situation is hence more stable. When the structure of the stimulus situation is unstable, the basis upon which judgments are made is variable. Then the variations of judgment are due to a qualitative change of the situation itself, rather than to the factors that ordinarily attend observations.

The quality called merit of composition, for example, will depend upon the relationship which its various elements have in the experience of the observer. Merit is a kind of emergent quality, whose character is a function of the way in which the elements are related or organized in the observer's experience. It is thus not a simple objective quality, but a construction. Different observers will relate its elements in various ways, depending upon their experiences, feelings, and the like. Some will emphasize elements that others neglect. One observer will see merit in one group of elements, and another will be equally convinced of the merit of a sample in terms of some different constellation of factors. Moreover, an individual will judge one composition as better than another at one time and as inferior at some subsequent time, because of a change in the significance and relatedness of the elements in his experience. The elements will be related differently in the first observation than they are in the second, thus giving rise to a different kind of merit. As changes in the relatedness of the elements occur, changes also occur in the quality called merit.

When seen in this light, the variations of judgment turn out to be results of changes in the quality itself, and the theory that they can be treated as errors of observation is then no longer tenable. If they were errors of observation, they would be variations of the same kind; the judgments would vary in terms of the magnitude or intensity of the quality.
But instead, the judgments vary because their objective basis tends to take on the character of a chameleon as the situation is organized and reorganized. It is, of course, true that the stimulus situation will not vary with every observation. There may be cases in which it will remain somewhat constant for a number of judgments. But as one composition is compared to another, the observer finds himself shifting the basis of his judgment.¹⁰ We thus obtain a succession of judgments of different kinds of merit, and not observations of differences of merit.

We are not denying that a transitive asymmetrical order of compositions, for example, is empirically established. We are simply interpreting the process by which they are ranked in a series, with a view to disclosing the validity of the claim that equal units of merit are obtained. If the preceding argument is essentially correct, the contention that "183 employed in this scale may be considered as 183 small units of quality, or as 18.3 units ten times as large, or as 1.83 units one hundred times as large" is without foundation. The reason that such a claim has been permitted to stand is to be found in our inclination to assume that things called by the same name are of necessity identical. The process of discriminating among samples of handwriting, composition, and the like, is labeled "judgment of merit." Then we let the variations of judgments represent amounts of merit, on the assumption that the judgments of merit are all the same quality because they have the same name. In this way, a quantitative meaning is read into a process that in most cases consists simply of passing from one qualitative judgment to another, somewhat in the same fashion as that in which our perceptual processes yield first one and then another possibility in the search for a face in a picture puzzle.

¹⁰ Ayres was able to avoid this problem by defining merit operationally. In the construction of his handwriting scale merit was defined in terms of the rate at which a specimen of handwriting could be read. See A Scale for Measuring the Quality of Handwriting of School Children.

UNEQUAL UNITS ON THE NORMAL CURVE

In turning to the second assumption underlying the claim of equal units, the reader will readily recognize it as the same as that upon which the units of developmental scales depend. That is, segments of the base line of a normal curve represent unit quantities of the quality represented by the curve. This assumption is readily seen in the procedure by which the units are derived. The only thing which is experimentally determined about the merit of specimens of handwriting or composition, for example, is that the specimens are ranked in terms of it. All other steps in this procedure of scale construction lie outside of the area of experimental work. They consist of assumptions and statistical manipulations. The mere ranking of specimens in order of merit, even if merit were uniform from specimen to specimen, would not establish equal units. This is, of course, generally recognized and admitted. But it is not recognized and admitted that the percentages of "better" judgments cannot be manipulated so as to yield equal units. Failure to recognize this fundamental fact leads to attempts to derive equal units by selecting some arbitrary percentage of "better" judgments, and then locating the point at which ordinates representing these percentages strike the base line of normal curves of judgments. The distance on the base line between the ordinate representing 50 per cent of the judgments and the one representing 75 per cent, for example, is taken as a unit quantity of merit equal to any other such unit quantity. In this way, the instrument-makers unwittingly evade the responsibility of devising experimental operations to interpret the axioms of addition.

There is no need to rehearse the argument developed in the last chapter against this procedure of obtaining equal units. It is sufficient to recall that the units obtained in this manner are merely conventional, since they lack experimental confirmation. The claim of Hillegas, for example, that the numerical values of his composition scale can be used in the same sense as we use millimeters to express the length of lines, rests upon no experimental interpretations of the axioms of addition. The numerical values assigned to compositions on the Hillegas scale are derived from quite different considerations than are those on the scale of length. In the latter, the ratio of the segments of the quality is known to be the same as the ratio of the numbers, because they have been shown to be identical by experimental operations. In the former case, however, there are no experimental operations save those used in establishing a transitive asymmetrical order, and they do not permit us to assert that the quality is subject to the same manipulations as the numbers. Such an assertion is permissible only after the conditions of addition have been experimentally satisfied.

If we are to escape confusion and misunderstanding, it is necessary that we bear in mind that in measurement, numbers are introduced under experimental conditions. If they are introduced apart from these conditions, their meaning can be no more than tags or labels, such as the numbers of houses or of prisoners. If we always ask, "What are the rules by which numbers are introduced?" and see that these rules are experimentally interpreted, we shall save ourselves much delusion. The numbers introduced by an experimental interpretation of the conditions of order can never be manipulated in such a way as to represent or to yield units of equal amounts of the quality they stand for. The belief that they can is the source of the confusion lying behind the quest for equal units in educational measurement.

SUMMARY

In this chapter we have dealt with instruments that purport to measure the merit of objective products in units of equal amount. It has been shown that these instruments in no way satisfy the axioms of addition. It was also shown that the equality of the units of these instruments rests upon two assumptions: first, that the variation of judgments of merit can be treated as errors of observation; and second, that segments of the base line of the normal curve represent unit quantities of merit, when the judgments of merit are normally distributed. These two assumptions were denied, because the variations of judgments are not necessarily errors of observation, and the derivation of units from the base line of the normal curve is merely a deduction devoid of experimental confirmation.
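The unit derivation criticized in this chapter can be made concrete with a short computation. The sketch below is an illustration only and is not the procedure of Hillegas or of any other scale maker: it assumes that judgments are normally distributed, takes the distance on the base line between the 50 per cent and 75 per cent ordinates as the unit, and uses hypothetical function names and percentages. It shows that the conversion is arithmetic performed on percentages; nothing in it tests whether the resulting units are equal amounts of merit.

```python
from statistics import NormalDist

# Standard normal curve. Treating judgments as normally distributed is the
# assumption under criticism here, not an experimental finding.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def base_line_position(proportion_better: float) -> float:
    """Point on the base line below which the given proportion of the area lies."""
    return STANDARD_NORMAL.inv_cdf(proportion_better)


# Conventional unit: distance between the 50 per cent and 75 per cent ordinates.
UNIT = base_line_position(0.75) - base_line_position(0.50)


def scale_difference(proportion_better: float) -> float:
    """Difference between two specimens in conventional units, given the
    proportion of judges who rate one better than the other (hypothetical helper)."""
    return base_line_position(proportion_better) / UNIT


if __name__ == "__main__":
    print(f"unit in standard deviations: {UNIT:.4f}")
    for p in (0.50, 0.60, 0.75, 0.90):  # illustrative percentages only
        print(f"{p:.0%} 'better' judgments -> {scale_difference(p):.2f} units")
```

Choosing a different percentage for the unit rescales every value on the instrument, which is one way of seeing that the unit is conventional rather than experimentally established.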

IX

The Outlook of Educational Measurement

Such an analysis as we have made naturally leads one to ask: What is the future of measurement in education? In what direction will it move? Will it become more precise and rigorous, satisfying more nearly the specifications of measurement? Or has it reached its maximum precision? Will it be forced to modify its basic procedures, so as to take account of the outcomes of learning broadly conceived? Or can it evaluate such outcomes with the techniques already evolved? No final answer can be given to these questions, but a few observations will be made and the most probable directions pointed out. The development of educational measurement depends as much upon the nature of the phenomena with which it is concerned as upon the capacity of men to devise experimental procedures. For, as Aristotle observed long ago, we should not expect a greater degree of exactitude than the subject matter will permit. We know the kind of subject matter which measurement demands, but we do not know, nor can we ascertain, apart from experimental work, whether a given subject matter has the systematic character required by measurement. We can gain some insight into the nature of a subject matter, however, from conceptual descriptions of it based upon experience and empirical observations, and we must turn to such descriptions as these in our efforts to answer the questions provoked by our discussion. These descriptions, so far as the subject matter of educational measurement is concerned, are found in certain current tendencies in psychology. While the nature of the subject matter treated in educational measurement has been a recurring theme in our discussion, we need to

note more specifically the psychological descriptions of it, as contained in these tendencies, and to point out their meaning for measurement.

TOWARD SIMPLICITY AND ABSTRACTNESS

Current psychological thought shows two distinct tendencies, and correspondingly two different sets of implications for educational measurement. There is first a tendency to think of the organism as a unity, that is, as responding and changing as a whole, at the same time that any part is responding and changing; to think of an element of the organism as having meaning only in terms of the whole of which it is a part; and, hence, to think of learning outcomes as emergents, that is, as qualitative changes. Opposed to this organismic view is a tendency to explain intelligent adjustments of the organism as a multiplicity of more or less independent factors, working together, but maintaining their identity in the process. In solving a mathematical problem, for example, factors such as number facility, inductive ability, memory, quickness of fatigue, and visualizing ability are involved in such a way that they can be identified, analyzed, and treated individually. This is not a new psychological concept, for the rudiments of it are found in the old belief in talents. Again, there have perhaps always been psychologists who believed that rational adjustments of the organism are mediated by many more or less independent factors or abilities; and Spearman's theory that an activity consists of a general factor, together with certain special abilities, dates back to the beginning of the present century. Since the early work of Spearman, the experimental and mathematical techniques of analyzing behavior have been gradually improved, and in the last few years much attention has been given to clarifying the theory of factors and to extricating the elements that compose intelligent activities. The description of behavior derived from the second of these tendencies, that is, factor analysis, is in line with the

requirements of measurement. For, as we have repeatedly pointed out, measurement requires a particular structure: the phenomenon must be capable of analysis into elements, and the elements must then be amenable to the specifications laid down by the calculus of science. The purpose of measurement, as noted before, is to describe properties accurately, so as to discover their relationships, thereby enabling us to control a thing by manipulating that with which it is correlated. Hence when any unusual observations appear in a property, control is upset, and the occasion for further analysis arises. Since one analysis only leads to another, the procedure of measurement inevitably tends to reduce a subject matter to simpler and simpler elements. If educational measurement is to satisfy the specifications which its axiomatic conditions lay down, it must discover elements that can be isolated and experimentally manipulated. By applying the techniques of factor analysis, it may be possible to dissect behavior so thoroughly that various elements can be isolated and handled with a rigorousness approaching that which the science of measurement necessitates. If there are certain elements which operate in human behavior and give it its character and direction, the isolation of these elements will perhaps afford the homogeneous and continuous properties required by measurement. For a long time we have based our instruments upon gross behavior, in much the same sense that the early physicists based thermometers upon the unanalyzed behavior of a column of liquid. The physicists finally discovered, however, that the rise and fall of the liquid was dependent upon two factors: temperature and atmospheric pressure. Before this discovery was made, the reading on a thermometer covered both of these factors, despite the fact that the instrument was used to measure temperature alone.
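The thermometer case can be sketched as a toy model. The linear form and the coefficients below are invented for illustration and are not drawn from the history of the instrument; the point is only that a single reading can conceal two factors until the second one is isolated and allowed for.

```python
def reading(temperature: float, pressure: float,
            k_temp: float = 1.0, k_pres: float = 0.5) -> float:
    """Toy instrument: the column responds to both factors.

    The linear form and both coefficients are assumptions made for this sketch.
    """
    return k_temp * temperature + k_pres * pressure


def corrected_temperature(observed: float, pressure: float,
                          k_temp: float = 1.0, k_pres: float = 0.5) -> float:
    """Recover temperature once the pressure factor has been isolated."""
    return (observed - k_pres * pressure) / k_temp


# Two different temperatures give the same reading when pressure differs.
first = reading(temperature=20.0, pressure=10.0)
second = reading(temperature=22.0, pressure=6.0)

if __name__ == "__main__":
    print(first, second)                       # identical readings
    print(corrected_temperature(first, 10.0))  # temperature for the first case
    print(corrected_temperature(second, 6.0))  # temperature for the second case
```

Until the second factor is identified, the two cases are indistinguishable on the instrument; once it is measured separately, the single reading can be resolved into its components.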
In a similar manner, it is now being recognized that a score on an achievement test masks a number of factors. And the future of precise educational measurement appears to be

involved in the isolation and quantification of these elements. There is no way of knowing, in advance of adequate experimental work, what the achievements of factor analysis will be. There is now considerable argument as to what are the various elements of behavior, as well as whether or not the elements are really independent when their status within the organism is taken into account.¹ If factor analysis is successful in reducing behavior to simple elements, however, it will then be possible to construct instruments of measurement consisting of items all of which tap the same element or property. This will go a long way toward establishing the validity of instruments and reducing the outcomes of learning to the specifications of measurement. In the meantime, it can only be said that factor analysis appears to be the most hopeful development in so far as the achievement of rigor and precision in educational measurement is concerned.

TOWARD INTEGRATION AND COMPLEXITY

Against the fact that measurement requires properties capable of isolation and rigorous definition, stands the unity of the organism. The concept of integration, though by no means new, is now occupying an increasingly large place in educational and psychological thought. The idea of integration does not admit discrete and independent elements in the mediation of intelligent adjustments of the organism, nor does it mean the simultaneous action of a number of independent and uncorrelated factors somehow directed toward a common end. It does imply that behavior is a unitary process, the elements of which owe their character to their function in the whole. Thus, according to this theory, the behavior of an individual must be described in terms of behavior patterns, rather than elements such as neuron connections and independent factors or abilities. The tendency to think of the behavior of an individual as a unitary process, rather than as an aggregate of elements, leads naturally to the depreciation of efforts to discover the simple units by which the more complex process can be explained and controlled. There is thus an irreconcilable conflict between the inevitable movement of measurement toward simplicity and abstractness on the one hand, and the complexity and unity of the organism on the other. We can hardly expect educational measurement to reach the rigor and precision that science demands of its implements, therefore, if the concepts of organismic psychology are permitted to dominate its techniques and procedures. But some students think that what it loses in precision and rigor, it may gain in more fruitful service to education and society.

¹ Chant, "Multiple Factor Analysis and Psychological Concepts."

Some methodological implications of the organismic view of learning must now be pointed out. According to this view, a learning outcome is a qualitative change in the organism. In other words, an outcome of learning, in chemical language, is a compound; it is something new, different from the elements that compose it. Hence when it has actually been acquired, the organism has a new character, a new quality. It may be acquired gradually or it may come suddenly, but in either case an outcome of learning is not achieved until the qualitative change has actually taken place, just as in a chemical process water, for example, is either produced or it is not. There is thus a critical point in experience, at which an outcome emerges and is built into the self. If this view of learning is correct, it has far-reaching implications for educational measurement. It will demand that instruments be built to discover whether or not learning has actually taken place, that is, whether an outcome has been built into the self. It will necessitate the establishing of critical points, or a range of points, on instruments to indicate the presence of learning. It will require the substitution of the concept of qualitative change for the concept of mere quantity.
These requirements will be nothing new in science, for physics is replete with critical points at which changes occur from one

property to another. For example, water changes from a liquid to a vapor at 100 degrees Centigrade and it freezes to a solid at 0 degrees. At a certain point in wave frequency color becomes a psychological phenomenon; above a certain point it ceases to exist. And everyone knows that fever is a qualitative change in the organism that is detected by means of a critical point on a clinical thermometer. Furthermore, this view of learning implies that it is more important to know that a person has a particular attitude of understanding, for example, than to know how much he differs from some other person in the amount of learning he possesses. As the full import of this view is felt, the techniques and procedures of educational measurement will undergo fundamental reconstruction. Individual items of a test will cease to be important on their own account, because their value will depend upon the light they throw upon the question: Has a given learning actually been acquired? The criterion of discriminative capacity by which items are now selected, based as it is upon the concept of quantity, will necessarily be modified to conform to the concept of qualitative evaluation. The connection between instrument construction and the objectives of education will no longer be overlooked, because the determination of qualitative changes in the learner will make it direct and imperative.

SUMMARY

In the light of the foregoing discussion, we can answer the questions stated at the beginning of this chapter by saying that educational measurement, paradoxical as it may seem, will move in two directions: first, it will seek for those independent and homogeneous elements with which it can deal most successfully; and, second, it will rest content with the degree of accuracy it has attained and will attempt to adapt its techniques and procedures to the requirements of qualitative evaluation. Indeed, there is already evidence that students of

educational measurement are dividing into camps corresponding to these two tendencies. Since science inevitably tends toward unity, however, it will not always tolerate this division. Either one side must prove itself more fruitful than the other, resulting in the rejection of the latter, or else the two will be merged into some larger and more inclusive theory.
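The contrast drawn in this chapter between quantitative comparison and qualitative evaluation can be illustrated with a small sketch. The pupils, scores, and critical point below are hypothetical, and nothing in the discussion above says how a critical point would actually be established; the sketch only shows that the same set of scores answers two different questions, depending on how it is read.

```python
from typing import Dict, List

# Hypothetical scores on some instrument; the values carry no empirical meaning.
SCORES: Dict[str, float] = {"pupil_a": 41.0, "pupil_b": 67.0, "pupil_c": 58.0}

# Assumed critical point at which the outcome is taken to have emerged.
CRITICAL_POINT = 55.0


def rank_order(results: Dict[str, float]) -> List[str]:
    """Quantitative reading: who stands higher or lower than whom."""
    return sorted(results, key=lambda name: results[name], reverse=True)


def has_outcome(score: float, critical_point: float = CRITICAL_POINT) -> bool:
    """Qualitative reading: has the change taken place or not?"""
    return score >= critical_point


if __name__ == "__main__":
    print(rank_order(SCORES))
    print({name: has_outcome(score) for name, score in SCORES.items()})
```

On the qualitative reading, the difference between the two pupils above the critical point is of no interest, just as a clinical thermometer is consulted for the presence of fever rather than for a ranking of patients.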

Bibliography

Ayres, Leonard P. A Scale for Measuring the Quality of Handwriting of School Children. "Publications on Measurements in Education," issued by the Division of Education. Russell Sage Foundation, Bulletin No. 113, New York, 1911.
Ayres, Leonard P. "History and Present Status of Educational Measurement." National Society for the Study of Education, Seventeenth Yearbook, Part 2, 1918.
Beard, C. A. The Rise of American Civilization. 2 vols., Macmillan, New York, 1930.
Bergson, Henri. Time and Freewill. English trans. Macmillan, New York, 1910.
Boring, E. G. A History of Experimental Psychology. Century, New York, 1929.
Boring, E. G. "The Logic of the Normal Law of Error in Mental Measurement." American Journal of Psychology, XXXI (1920), 1-33.
Bridgman, P. W. Logic of Modern Physics. Macmillan, New York, 1927.
Brown, J. F. "A Methodological Consideration of the Problem of Psychometrics." Erkenntnis, II (1934), 46-61.
Brown, J. F. Psychology and the Social Order. McGraw-Hill, New York, 1936.
Buckingham, B. R. Spelling Ability: Its Measurement and Distribution. "Teachers College Contributions to Education," No. 59. Teachers College, Bureau of Publications, New York, 1913.
Campbell, N. R. Measurement and Calculation. Longmans, Green, New York, 1929.
Campbell, N. R. Physics, The Elements. Cambridge University Press, London, 1920.
Campbell, N. R., and B. P. Dudding. "Measurement of Light." Philosophical Magazine, XLIV (6 Series, 1922), 577-90.
Cattell, J. McKeen. "Mental Tests and Measurements." Mind, XV (1890), 373-80.
Cattell, J. McKeen. "The Conceptions and Methods of Psychology." Popular Science Monthly, LXVI (1904), 176-86.
Cattell, J. McKeen, and L. Farrand. "Physical and Mental Measurements of the Students of Columbia University." Psychological Review, III (1896), 618-47.

Chant, S. N. F. "Multiple Factor Analysis and Psychological Concepts." Journal of Educational Psychology, XXVI, 263-72.
Coghill, G. E. Anatomy and the Problem of Behaviour. Cambridge University Press, London, 1929.
Cohen, Morris R. Reason and Nature. Harcourt, Brace, New York, 1931.
Cohen, Morris R., and Ernest Nagel. An Introduction to Logic and the Scientific Method. Harcourt, Brace, New York, 1934.
Courtis, S. A. "Measurement of Growth in Efficiency in Arithmetic." Elementary School Teacher, X, 58-74, 177-199; XI, 171-85, 360-70.
Darwin, Charles. The Expression of the Emotions in Man and Animals. D. Appleton, New York, 1913.
Darwin, Francis (ed.). Life and Letters of Charles Darwin. 2 vols., D. Appleton, New York, 1891.
Dewey, John. Experience and Nature. Open Court, Chicago, 1926.
Dewey, John. Human Nature and Conduct. Henry Holt, New York, 1922.
Dewey, John. The Quest for Certainty. Minton, Balch, New York, 1929.
Du Noüy, Lecomte. Biological Time. Macmillan, New York, 1937.
Fullerton, G. S., and J. McKeen Cattell. On the Perception of Small Differences. University of Pennsylvania. "Philosophical Studies," No. 2, 1892.
Galton, Francis. Hereditary Genius. Macmillan, London, 1914.
Galton, Francis. "Remarks," on an article by Cattell. Mind, XV (1890), 380-83.
Galton, Francis. "Scope of Biometrika." Editorial in Biometrika, I (1901), 1-3.
Garrett, H. E. Statistics in Psychology and Education. Longmans, Green, New York, 1929.
Guilford, J. P. Psychometric Methods. McGraw-Hill, New York, 1936.
Hawkes, H. E., E. F. Lindquist, and C. R. Mann. The Construction and Use of Achievement Examinations. Houghton Mifflin, Boston, 1936.
Hillegas, M. B. A Scale for the Measurement of Quality in English Composition by Young People. Teachers College, Columbia University, New York, 1913.
Horst, Paul. "Increasing the Efficiency of Selection Tests." The Personnel Journal, XII, 254-59.
Judd, C. H. Education as Cultivation of the Higher Mental Processes. Macmillan, New York, 1936.

Kelley, T. L. "The Principles and Techniques of Mental Measurement." American Journal of Psychology, XXXIV, 408-32.
Kilpatrick, William H. Remaking the Curriculum. Newson, New York, 1936.
Koffka, K. Growth of the Mind. Harcourt, Brace, New York, 1928.
Koffka, K. Principles of Gestalt Psychology. Harcourt, Brace, New York, 1935.
Köhler, W. Gestalt Psychology. Liveright, New York, 1929.
Lashley, K. S. Brain Mechanism and Intelligence. University of Chicago Press, Chicago, 1929.
Lee, J. M., and P. M. Symonds. "New Type or Objective Tests: A Summary of Investigations." Journal of Educational Psychology, XXV, 161-84.
Lewin, K. Dynamic Theory of Personality. McGraw-Hill, New York, 1935.
Lindquist, E. F., and H. R. Anderson. "Achievement Tests in the Social Studies." Educational Record, XIV, 198-256.
Lindquist, E. F., and W. W. Cook. "Experimental Techniques in Test Evaluation." Journal of Experimental Education, I, 163-85.
McCall, W. A. How to Experiment in Education. Macmillan, New York, 1923.
McCall, W. A. How to Measure in Education. Macmillan, New York, 1922.
Mead, G. H. Mind, Self and Society. University of Chicago Press, Chicago, 1935.
Mead, G. H. Movements of Thought in the Nineteenth Century. University of Chicago Press, Chicago, 1936.
Mead, G. H. The Philosophy of the Present. Open Court, Chicago, 1932.
Merz, J. T. History of European Thought in the Nineteenth Century. Vol. I, 1896; II, 1903; III, 1907.
Monroe, Paul (ed.). Conference on Examinations. Eastbourne, England. Teachers College, Bureau of Publications, New York, 1931.
Morrison, H. C. Basic Principles in Education. Houghton Mifflin, New York, 1934.
Morrison, H. C. The Practice of Teaching in the Secondary School. University of Chicago Press, Chicago, 1932.
Murphy, G. An Historical Introduction to Modern Psychology. Harcourt, Brace, New York, 1932.
Nagel, Ernest. On the Logic of Measurement. Privately printed, 1930.
Odell, C. W. Educational Tests for Use in High Schools. University of Illinois Press, 1929.

Rice, J. M. Scientific Management in Education. Hinds, Noble and Eldredge, New York, 1913.
Ritchie, A. D. Scientific Method. Harcourt, Brace, New York, 1923.
Rugg, Harold. Culture and Education in America. Harcourt, Brace, New York, 1931.
Russell, B. Principles of Mathematics. Cambridge University Press, London, 1903.
Sherif, M. The Psychology of Social Norms. Harpers, New York, 1935.
Smith, Max. The Relation between Item Validity and Test Validity. "Teachers College Contributions to Education," No. 621. Teachers College, Bureau of Publications, New York, 1934.
Spearman, C. "The Proof and Measurement of Association between Two Things." American Journal of Psychology, XV (1904), 72-101.
Stephens, J. M. The Influence of the School on the Individual. Edwards Brothers, Ann Arbor, 1933.
Stone, C. W. Arithmetical Abilities and Some Factors Determining Them. "Teachers College Contributions to Education," No. 19. Teachers College, Bureau of Publications, New York, 1908.
Talbott, E. O., and G. M. Ruch. "Minor Studies in Objective Examination Methods." Journal of Educational Research, XX, 199-206.
Thorndike, E. L. An Introduction to the Theory of Mental and Social Measurements. Teachers College, Columbia University, 1913.
Thorndike, E. L. "Darwin's Contribution to Psychology." University of California Chronicle, XII (No. 1, 1909), 65-80.
Thorndike, E. L. "Handwriting." Teachers College Record, XI (No. 2, 1910), 1-81.
Thorndike, E. L. Human Learning. Century, New York, 1931.
Thorndike, E. L. "The Nature, Purposes, and General Methods of Measurements of Educational Products." Seventeenth Yearbook of the National Society for the Study of Education, Part II, 1918, pp. 16-24.
Thorndike, E. L. "The Measurement of Achievement in Drawing." Teachers College Record, XIV (1913), 1-67.
Thorndike, E. L., and Others. Measurement of Intelligence. Teachers College, Bureau of Publications, New York, 1927.
Thurstone, L. L. "Attitudes Can Be Measured." American Journal of Sociology, XXXIII (1928), 529-54.
Thurstone, L. L. "Equally Often Noticed Differences." Journal of Educational Psychology, XVIII (1927), 289-93.

Tolman, E. C. Purposive Behavior in Animals and Men. Century, New York, 1932.
Trabue, M. R. Completion-Test Language Scales. "Teachers College Contributions to Education," No. 77. Teachers College, Bureau of Publications, New York, 1916.
Tyler, R. W. Constructing Achievement Tests. Ohio State University, Columbus, 1934.
Vincent, Leona. A Study of Intelligence Test Elements. "Teachers College Contributions to Education," No. 152. Teachers College, Bureau of Publications, New York, 1924.
Walker, Helen M. Studies in the History of Statistical Method. Williams and Wilkins, Baltimore, 1929.
Wechsler, David. The Range of Human Capacities. Williams and Wilkins, Baltimore, 1935.
Whitehead, A. N. An Introduction to Mathematics. Henry Holt, New York, 1911.
Whitehead, A. N. The Aims of Education and Other Essays. Macmillan, New York, 1929.
Whitehead, T. N. The Design and Use of Instruments and Accurate Mechanism. Macmillan, New York, 1934.
Woody, Clifford. Measurements of Some Achievements in Arithmetic. "Teachers College Contributions to Education," No. 80. Teachers College, Bureau of Publications, New York, 1916.
Wrightstone, J. W. Appraisal of Experimental High School Practices. Teachers College, Bureau of Publications, New York, 1936.


64 Measuring devices, medieval, 1 Measuring instruments, see Instruments Mechanics, expanded into physics, 1 4 ; methods and basic concepts extended, 15 ; and the inner life, 17, 20 Mechanistic psychology of learning, reliance upon, 53 Mental ability, normal distribution of, 30. 3 i


Mental discipline as main purpose of study, 44
Mental processes, interest in study of, by quantitative means, 38
Mental tests and measurement, Cattell's study of, 37
Merit, of composition, 149, 152, 154; analysis of judgments of, 150-55; nature of, 152, 154
Middle Ages, scientific thought founded upon Aristotelian science, 1
Mind, supernatural conceptions of, 11; subject to natural laws, 25; mechanics of, Herbart's conception, 28
Mohs's scale of hardness, 70
Moivre, Abraham de, 29
Moral qualities, normal distribution of, 30
Morphologists, study of ideal forms, 22
Morrison, H. C., 113; quoted, 117
Movement, extent and force of, 152, 153
Nagel, Ernest, quoted, 66
Natural forces, relation to transformation of species, 24
Naturalism, 27
Natural selection, 24 ff.; mechanical implications of theory, 23; relation of organism to environment, 24; differences among members the first condition of, 28
Nature, basic unity, 16; new concept, 21; Newton's static concept, 21; method of establishing structure in, 61; reactions to responses of, 110; mechanical forces, see Mechanics
Needs of the self, 106 n, 107
Newton, Sir Isaac, 14; static concept of nature, 21
Newtonian mechanics, and mental processes, 25; mathematics its chief instrument of conquest, 27
New York City, tests used in school survey, 46
Non-symmetrical relation, 62

Normal curve, 133 n, 136 ff.; unequal units on, 156-57
Normality, assumption of, 132
Normal law, 30
Numbers, use of, in educational measurement, 60, 71, 75, 78; introduced under experimental conditions, 157
Numerical scores as basis of prediction, 92

Objective qualities, measurement, 146
Observation, personal equation in, 29; errors of, 30, 146; variations of judgment treated as, 151 ff.
Observations and principles, relationship, 86, 87
On the Perception of Small Differences (Fullerton and Cattell), 20 n, 144, 146 n, 152 n
Operations by which instruments are validated, 83
Order, basis of, 61; forms, 61-63; meaning, 62; exhibited in terms of some quality, 81; axioms experimentally confirmed, 125; experimentally satisfied, 125 f.; see also Transitive asymmetrical order
Organic compounds, fabrications of, 16
Organic forms, dynamics of, 21-27
Organic world, 16; a physico-chemical world, 17
Organism, relation to its environment, 24, 25; as a unity and as a multiplicity of factors, 160, 162
Organismic view of learning, methodological implications, 163
Origin of Species (Darwin), 21
Outcomes of learning, see Learning outcomes

Pavlov, experiment on the salivary reflex, 100
Pearson, Karl, 32
Pendulum clock, 1

Performance, determination of rate of, 51; resistance to qualitative analysis, 82; validity and, 96-124; problem of discovering components, 96 n; outer and inner aspects of behavior, 97-102; the self as a structure of attitudes, 102-9; self and outcomes of learning, 109-19; implications for validity of instruments, 119-23; relation with objects of measurement must be experimentally explored, 122; addition and, 125-43; see also Addition
Performance tests, 50; development, 50; rate, 51
Personal equation, 29; varies with nature of stimulus, 34
Personal life made possible by internalization, 101
Physical understanding, attitude of, 110
Physics, expansion of, 14-21; chemistry appears as a branch of, 15; reached limits with psychophysical method of Fechner, 18, 20, 24
Physiological type of behavior, 100, 101
Play, a stage in development of self, 105
Polarity, of validating instruments, 86-88; of all science, 87
Precision psychology, see Psychology
Priestley, Joseph, 15
Principles and observations, relationship, 86, 87
Probability, mathematics of, 29; application of law to human traits and social phenomena, 30
Probability curve, normal, 136 ff.
Progressives, conflict with conservatives, 43; opposition to measurement, 52
Property, minimum conditions for measurement, 63; measured in terms of itself, 126; derivation of units from function of, 126
Psychological laboratory, established by Wundt, 33; experimental studies, 35; established by Cattell, 36
Psychology, experimental research in, 18; physics expanded into, 18, 20, 24; concerned with behavior, 26; study of reaction time one of chief problems, 33; study of personal variations of observations a problem of, 34; amassing of data, 34; established on scientific basis, 35, 38; influence upon techniques of measurement, 49; educational implications of developments in, 89, 90
  American, Cattell's influence in shaping, 35
  comparative, 26
  precision, rise of, 14, 33-37; chief contribution, 38
Psychometrics, criticism of, 76
Psychophysical measurement, 40, 46, 48, 50, 78, 144 ff.; Fechner's work in, 18, 20, 24
Qualities, classification into two groups, 75, 79; objective, measurement, 146
Quality, determination of validity the procedure of establishing the presence of, 81
Quality scales, 20, 48, 50; theory, 144-50; addition and, 144-58; an analysis of judgments of merit, 150-55; assumptions underlying claim that units are equal, 150, 156; unequal units on the normal curve, 156-57
Quetelet, L. A. J., 22; application of normal probability law to human traits and social phenomena, 30; statistical methods of, appropriated by Galton, 31
Rate tests, 51
Rating scales, not classed as measuring instruments, 9
Raw scores, scaling of, 133
Reaction, reflex theory of, 36
Reaction time, 29; study of, one of chief problems of psychology, 33


Reaction-time measurements, ultimate value of, 37
Reading tests, 51
Recall, relation between higher mental processes and, 119
Relations, the essence of mathematics, 60; meaning, 62
Relative position, reduction of scale of, 132
Research, fields of, basically physical, 15
Responses, correct, taken as indices of different results, 97; overt, correlated with underlying events, 98, 120; below the level of experience, 100; having counterpart in experience, 102; gestures as, 104; skills attained through repetitive, 113; adaptive, 117
Rice, J. M., pioneers scientific study of education, 41-45; Forum, articles, 42, 43; tests, techniques, 44; scientific spirit, 45
Right-and-wrong cases, 145, 147, 152
Rôle of the other person, 105, 110
Ruch, G. M., and Talbott, E. O., quoted, 128
Rugg, Harold, 112; quoted, 53
Salivary reflex experiment, 100
Scaled tests, see Developmental tests
Scales, pattern for establishing transitive asymmetrical order in constructing, 68; used in deriving quantitative units, 133-35; steps in procedure of construction, 156; see also Quality scales; also kinds of scales, e.g. Handwriting scales
Scatter diagram, 32
School systems, Rice's studies of, 42; New York City survey, 46; autocratic and industrial-like scheme of administration, 54; improving effects by improving staff, 93
School tests and scales, number of, 48
Schwann, Theodor, 17
Science, Aristotelian and modern, 1; mathematics an indispensable tool of, 27; connection between measurement and objectives of, 58; axioms of the calculus of, 61 ff., 75; method of, 87; polarity, 87; method not in question, 93
Scores, as basis of prediction, 92; scaling of raw, 133
Secondary-school tests and scales, number of, 48
Segments of qualitative world, 110
Self, the, as a structure of attitudes, 102-9; nature of, 105; stages in development, 105; social character, 106; change and growth, 107; outcomes of learning as part of, 109-19; defined, 110; affective side, 111; thought experienced outside of, 114
Sensations, relation to stimuli, 19; use of instruments of mathematics in study of, 28; fields of investigation, 33; measurement, 144 ff.
Sense discrimination in touch and sight, experiments on, 19
Series, structure of, 58
Significant symbols, 104
Skills, 110, 113
Social change, tendency to prevent adjustment of curriculum to, 54
Social process and individual conduct, 103
Social understanding, attitude of, 110, 111
Socio-economic status, scale for rating, 10
Spearman, C., 49, 160
Species, the unit in considering quality of organism, 23; mechanical transformation, 25; psychology's concern with behavior a deduction from idea of survival, 26
Speech, 104 n
Spelling scales, 44, 47, 49, 50
Spencer, Herbert, 21
Squares, method of least, 32
Standard of evaluation, measurement involves, 6
Statistical manipulations, 135

Statistics, application to study of living things, 14; mathematics of, extended into study of educational problems, 23; application and extension of mathematical, 27-33
Stellar transits, individual differences in recording times of, 29, 34
Stephens, J. M., 93 n
Stimulation, measurement, 144 ff., 153
Stimuli, discrimination between, 19; relation of sensations, 19
Stimulus, gesture as, 104
Stimulus-response hypothesis, 26
Stone, C. W., arithmetical tests, 45, 46
Structural identity between distribution of variables and of errors, 31
Structure, measurement as a search for, 57-61; meaning of term, 57; similarity of, 58; see also Addition; Order
Superintendence, Department of, Rice's tests challenged by, 44
Symbols, relation to mathematics, 60; significant, 104
Symmetrical order, 61; defined, 62
T-scale, 133
Talbott, E. O., and Ruch, G. M., quoted, 128
Teachers, improving effects of school, 93
Teachers College, Columbia University, Thorndike's courses in, 41
Techniques, selection of, 68 n
Temperature, measurement of, 76, 77, 79, 126, 139
Tests and scales, see Instruments; Scales; also kinds of tests, e.g. Arithmetic tests
Theodolite, 9
Thermometer, 79, 139
Thermoscope, 1
Thorndike, Edward Lee, handwriting scale, 20, 46, 47, 48, 50; drawing scales, 20; work on animal intelligence, 26; quoted, 27, 48, 55, 118, 148, 150 n; encouraged to extend study to intelligence of human beings, 37; work of, 40 f., 45; students, 45, 47
Thorndike-McCall Reading Scale, 51
Thought experienced outside of self, 114
Thurstone, L. L., 148 n
Time, early clocks for measuring, 1
Tolman, E. C., 104 n
Trabue, M. R., language scales, 47, 50, 134
Traces, theory of, 131
Transitive asymmetrical order, 63, 81, 131, 132; defined, 63; qualitative continuity necessary to measurement, 67; procedures by which established, 68 ff.; of compositions, 149-55
Transitive relation, 62
Transitive symmetrical relation, 63
True-false form of item, 49
Tyler, R. W., 119
Understanding, attitude of, 115
Uniformity, first quantitative, established in psychology, 19
United States, school systems, 42
Units, claim refuted, that quantitative units have been established, 125; difficulty of establishing equal, 126; quest for equal, 131-35; assumptions underlying search for quantitative, 132; methods followed in deriving, 133; critique of the procedures of obtaining, 136-38; types, 138-42; arbitrary character, 140; conventional, 140, 156; equal, established by addition, 142; of quality scales, 148, 155; assumptions underlying claim of equal, 150, 156; unequal, on the normal curve, 156-57
Unity, tendency of self toward, 106, 108; of the organism, 160, 162
Urea, prepared out of inorganic materials, 16


Values, accepted by process of internalization, 112
Van Wagenen American History Scale, 51
Variations, measurement of, 4; mistakes of nature, 22; quantitative theory implicit in Darwin's conception of, 23; significance of conception, 23; of organic forms, law of errors applicable to, 31; personal, see Personal equation
Vocal gestures, 104, 105, 109
Water, decomposition of, 16
Water clock, 1
Weber, E. H., 19, 33, 46, 144
Wechsler, David, 132 n

Weight, relation between height and, 31
Wheatstone bridge, 8
Wheel clock, 1
Whitehead, T. N., 8
Wöhler, Friedrich, 16
Wollaston, William Hyde, 16
Woody, Clifford, arithmetic scales, 47, 50, 134, 139
Writing, overt responses during, 98; as experience, 99
Wundt, Wilhelm, psychological laboratory, 33; founded psychology on sound scientific basis, 35
Zero point, 77

Vita

BUNNIE OTHANEL SMITH was born May 29, 1903, at Clarksville, Calhoun County, Florida. He attended the public schools of Florida and the University of Florida, receiving the degree of Bachelor of Science in Education from the last named institution in 1925. In 1932 he received the degree of Master of Arts from Teachers College, Columbia University. During the academic years 1933-34 and 1936-37 he was a graduate student at Teachers College.